📝 Publications

A full publication list is available on my Google Scholar page.

(*: Equal contribution; †: Corresponding authors.)

🎬 Video Generation, World Model & Multimodal Model

arXiv 2026

[arXiv 2026 Kling Team] OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data
Jiwen Liu, Shuo Li, Zhang Fang, Xiang Li, Yitong Zhou, Zijie Meng, et al.

We propose OmniDirector, a general framework for multi-shot camera cloning that requires no cross-paired data, advancing camera control in video synthesis.

arXiv 2026

[arXiv 2026] ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation
Zijie Meng, Jiwen Liu, Yu Liu, Chen Tong, Xiao Liu, Yingya Zhang, Yong Xu, Pengfei Wan.
[Code] (Internal/Coming soon)

We propose ARGUS, a novel framework for subject-preserving video generation using stacked multi-view identity mosaic injection, ensuring high fidelity and temporal consistency.
This work was conducted during my internship at Kuaishou Kling, focusing on controllable identity injection in foundation video models.

ICASSP 2026

[ICASSP 2026 oral] Make a Game: A Novel Paradigm for Interactive Game Rendering
Zijie Meng, Jian Che, Bo Wei, Xuesong Cao.
[Award] First Prize in PKU Challenge Cup

We introduce a novel paradigm for interactive game rendering using unified tokens and lightweight plugins, enhancing controllability in video generation.
The approach generalizes to complex interactive game scenarios, bridging generative AI and real-time game engines.

NeurIPS 2026

We introduce 3D-RAD, the most comprehensive 3D radiology dataset for Medical VQA, supporting multi-temporal analysis and diverse clinical diagnostic tasks.

SCIS (CCF-A)

[Science China Info. Sci. 2025] Orpaint: A Zero-Shot Inpainting Model for Oracle Bone Inscription Rubbings with Visual Mamba Block
Zijie Meng, Yuer Zeng, Xiang Chang, Tianyang Xu, Fei Chao, Xuesong Cao, Chun Chen, Qiang Shen.
[Journal] JCR-Q1, CCF-A

We propose Orpaint, the first zero-shot inpainting model specifically designed for Oracle Bone Inscription (甲骨文) restoration.
By integrating the Visual Mamba Block into the diffusion denoising network, Orpaint achieves significantly faster inference and better structural restoration of damaged ancient rubbings.

ACM MM 2025

We address the challenging task of sand-dust image restoration by leveraging uncertainty-aware SAM (Segment Anything Model) priors and prompt learning.
My contribution focused on fine-tuning Llama 3 to generate refined perceptual instructions.

ICME 2026

We propose a multi-scale two-stream vision-language alignment framework that decouples semantic understanding from distortion perception for robust AI-generated image quality assessment.
My contribution focused on the overall framework design and vision-language alignment strategy.

MICCAI 2025

[MICCAI 2025] SynPo: Boosting Training-Free Few-Shot Medical Segmentation via High-Quality Negative Prompts
Y. Liu, H. Xiao, J. Chai, Y. Zhang, R. Wang, Zijie Meng, Z. Luo.
[Conference] CCF-B; top-tier medical AI conference

We propose SynPo, which improves training-free few-shot medical image segmentation by using high-quality negative prompts to refine segmentation boundaries.