Kai Liu
Zhejiang University
38 Zheda Road, Hangzhou
Email: kail@zju.edu.cn
I am a Ph.D. student at Zhejiang University (Sept. 2020 to Dec. 2025), under the supervision of Prof. Fan Zhou and Prof. Yaowu Chen. I was a research intern at Apsara Lab, Alibaba Cloud, from May 2022 to Sept. 2024, under the supervision of Prof. Jieping Ye. I also visited the NExT++ research center at the National University of Singapore as a joint Ph.D. student (supported by the CSC program) from Sept. 2024 to Apr. 2025, under the supervision of Prof. Tat-Seng Chua and Dr. Hao Fei.
My research interests include multimodal large language models, video/audio generation, and unified understanding and generation. Here is my Google Scholar. I am always open to academic and industry collaboration. If you would like to chat, feel free to drop me an email.
Listed below are my first-author papers accepted at top conferences and journals. Here is the full list of publications; the repositories will come soon. I look forward to continuing to contribute to the multimodal community.
news
| Date | News |
|---|---|
| Mar 5, 2026 | Three papers have been accepted to ICLR’26! Code and checkpoints are released! |
| Feb 3, 2026 | We have built the JavisVerse project for Joint Audio-Video Intelligence Symphony! |
| Sep 18, 2025 | One paper has been accepted to NeurIPS’25 as a spotlight! Code, model, and data are coming soon! |
| May 15, 2025 | One paper has been accepted to the ACL’25 main conference! Code and model are released! |
| Apr 5, 2025 | A cool joint audio-video generation model has been released! Feel free to give it a try! |