Evolutionary System Prompt Learning for Reinforcement Learning in LLMs
arXiv 2026
Combining RL and Evolutionary Algorithms for more effective LLM Self-Improvement.
I am a CS PhD candidate in the Machine Learning Group at the University of Toronto, advised by Jimmy Ba.
In the past, I spent time at Google DeepMind working on LLM reasoning and evaluation, and at Waabi studying under Raquel Urtasun.
I studied Engineering Science at the University of Toronto.
I am broadly interested in building general-purpose agents in the digital and physical worlds, with a focus on recursive self-improvement.
I love thinking about self-improvement mechanisms from various perspectives, from reward modeling to world modeling to RL to automated AI research.
arXiv 2026
Top-k KL flexibly interpolates between exact and sampled KL, while remaining unbiased at any k.
International Conference on Learning Representations (ICLR), 2025
Reward models are better with next-token prediction and chain-of-thought reasoning, too.
International Conference on Learning Representations (ICLR), 2024
A foundation world model for self-driving that explicitly reasons in both 3D space and time.
Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Self-supervised, scalable object discovery in the wild.
International Conference on Machine Learning (ICML), 2021 (Long Talk)
Unsupervised long-horizon planning via graph-structured world models.