🤓 Greetings!

I am Zijie Meng (孟子杰), a second-year Master’s student at Peking University (PKU). I am currently a Research Intern on the Kuaishou Kling Team, working with Jiwen Liu and Pengfei Wan on the next generation of video foundation models.

I obtained double Bachelor’s degrees in Artificial Intelligence and Finance (from the Wang Yanan Institute for Studies in Economics, WISE) from Xiamen University in 2024. My research journey in Generative AI began at the MAC Lab under the supervision of Prof. Rongrong Ji. Since then, I have been fortunate to hone my skills through research internships at ByteDance (Seed Team), 360 AI Research, and Shanghai AI Lab (OpenMMLab).

I am incredibly honored to have received the Peking University May Fourth Scholarship (the highest individual honor at PKU) and the Xiamen University Golden Kapok Medal (one of the highest honors at XMU, awarded to the top 0.01% school-wide).

🚀 Research Vision: Towards the World Model

Since I first encountered Diffusion Models in 2023, I have been captivated by their immense generative power. I believe that generative capability is the ultimate key to achieving AGI. In my view, achieving “controllable generation at will” is the definitive path to constructing a true World Model: to fully control the generation of a world, a model must inherently possess a deep and comprehensive understanding of that world’s physics, semantics, and dynamics.

My representative projects include OmniDirector (Camera Control, built on Kling Omni), Kling 3.0 (Subject-ID & Motion Control), ARGUS (ID Control), Orpaint (Visual Mamba-based Inpainting), and Make-a-Game (Game Video Generation).

🔍 Research Interests

I am currently focused on building the next generation of visual intelligence:

1️⃣ AIGC & Video Generation: Focusing on Controllable Video Synthesis, Multi-modal Generation, and integrating 3D Spatial Priors into generative paradigms to build robust World Models.
2️⃣ Agentic World Models: Exploring how AI Agents can assist in constructing and navigating simulated environments.
3️⃣ Vision-Language Models (VLM): Enhancing Multi-modal Alignment, Instruction Following, and Human-AI Interaction capabilities.

📧 I am always open to academic collaborations or discussions on Video Foundation Models and AGI. If you are interested in my work or looking for research synergy, please feel free to reach me at ymlf@stu.pku.edu.cn.