1st-year Ph.D. student at Fudan University & Shanghai AI Lab. My current research interests include Computer-Use, AI Agents, Reinforcement Learning, and Diffusion Large Language Models. Homepage · Google Scholar · [email protected]
WildClawBench: Hard, practical, end-to-end evaluation for AI agents — in the wild.
Project · Code
DARE: Diffusion Large Language Models Alignment and Reinforcement Executor.
Code
[NeurIPS 2025] RiOSWorld: Benchmarking the risk of multimodal computer-use agents.
Project · Code
[ICLR 2026] Your agent may misevolve: Emergent risks in self-evolving LLM agents.
Code
