Lunjun Zhang

I am a CS PhD candidate in the Machine Learning Group at the University of Toronto, advised by Jimmy Ba.

In the past, I spent time at Google DeepMind working on LLM reasoning and evaluation, and at Waabi studying under Raquel Urtasun.

I studied Engineering Science at the University of Toronto.

Research

I am broadly interested in building general-purpose agents in the digital and physical worlds, with a focus on recursive self-improvement.

I love thinking about self-improvement mechanisms from many angles: reward modeling, world modeling, RL, and automated AI research.

Selected Publications

Evolutionary System Prompt Learning for Reinforcement Learning in LLMs

Lunjun Zhang, Ryan Chen, Bradly C. Stadie

arXiv 2026

Combining RL and evolutionary algorithms for more effective LLM self-improvement.

EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL

Lunjun Zhang, Jimmy Ba

arXiv 2026

Top-k KL flexibly interpolates between exact and sampled KL, while remaining unbiased at any k.

Generative Verifiers: Reward Modeling as Next-Token Prediction

Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

International Conference on Learning Representations (ICLR), 2025

Reward models are better with next-token prediction and chain-of-thought reasoning, too.

Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

Lunjun Zhang, Yuwen Xiong, Ze Yang, Sergio Casas, Rui Hu, Raquel Urtasun

International Conference on Learning Representations (ICLR), 2024

A foundation world model for self-driving that explicitly reasons in both 3D space and time.

Towards Unsupervised Object Detection from LiDAR Point Clouds

Lunjun Zhang, Anqi Joyce Yang, Yuwen Xiong, Sergio Casas, Bin Yang, Mengye Ren, Raquel Urtasun

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Self-supervised, scalable object discovery in the wild.

World Model as a Graph: Learning Latent Landmarks for Planning

Lunjun Zhang, Ge Yang, Bradly C. Stadie

International Conference on Machine Learning (ICML), 2021 (Long Talk)

Unsupervised long-horizon planning via graph-structured world models.