Kai Liu

Zhejiang University

38 Zheda Road, Hangzhou

Email: kail@zju.edu.cn

I am a Ph.D. student at Zhejiang University (Sept. 2020 to Dec. 2025), supervised by Prof. Fan Zhou and Prof. Yaowu Chen. I was a research intern at Apsara Lab, Alibaba Cloud, from May 2022 to Sept. 2024, under the supervision of Prof. Jieping Ye. I also visited the NExT++ research center at the National University of Singapore as a joint Ph.D. student (supported by the CSC program) from Sept. 2024 to Apr. 2025, under the supervision of Prof. Tat-Seng Chua and Dr. Hao Fei.

My research interests include multimodal large language models, video/audio generation, and unified understanding and generation. Here is my Google Scholar. I am always open to academic and business collaborations. If you would like to chat, feel free to drop me an email.

Listed below are papers accepted at top conferences and journals on which I worked as the first author. Here is the full list of publications; the repositories will come soon. I look forward to continuing to make valuable contributions to the multimodal community.

news

Mar 5, 2026	Three papers are accepted by ICLR’26! Code and checkpoints are released!
Feb 3, 2026	We build the JavisVerse project for Joint Audio-Video Intelligence Symphony!
Sep 18, 2025	One paper is accepted by NeurIPS’25 as a spotlight! Code, model, and data are coming soon!
May 15, 2025	One paper is accepted by ACL’25 main! Code and model are released!
Apr 5, 2025	A cool joint audio-video generation model is released! Feel free to give it a try!

selected publications

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Kai Liu, Jungang Li, Yuchong Sun, Shengqiong Wu, Jianzhang Gao, Daoan Zhang, Wei Zhang, Sheng Jin, Sicheng Yu, Geng Zhan, Jiayi Ji, Fan Zhou, Liang Zheng, Shuicheng Yan, Hao Fei, and Tat-Seng Chua

In Conference on Neural Information Processing Systems [Spotlight], Nov 2025

@inproceedings{liu2025javisgpt,
  title = {JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation},
  author = {Liu, Kai and Li, Jungang and Sun, Yuchong and Wu, Shengqiong and Gao, Jianzhang and Zhang, Daoan and Zhang, Wei and Jin, Sheng and Yu, Sicheng and Zhan, Geng and Ji, Jiayi and Zhou, Fan and Zheng, Liang and Yan, Shuicheng and Fei, Hao and Chua, Tat-Seng},
  booktitle = {Conference on Neural Information Processing Systems},
  note = {Spotlight},
  month = nov,
  year = {2025},
}

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

Kai Liu, Yanhao Zheng, Kai Wang, Shengqiong Wu, Rongjunchen Zhang, Jiebo Luo, Dimitrios Hatzinakos, Ziwei Liu, Hao Fei, and Tat-Seng Chua

In The Fourteenth International Conference on Learning Representations, Apr 2026

@inproceedings{liu2025javisdit++,
  title = {JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation},
  author = {Liu, Kai and Zheng, Yanhao and Wang, Kai and Wu, Shengqiong and Zhang, Rongjunchen and Luo, Jiebo and Hatzinakos, Dimitrios and Liu, Ziwei and Fei, Hao and Chua, Tat-Seng},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  month = apr,
  year = {2026},
}

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Rongxin Jiang, Jiebo Luo, Hao Fei, and Tat-Seng Chua

In The Fourteenth International Conference on Learning Representations, Apr 2026

@inproceedings{liu2025javisdit,
  title = {JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization},
  author = {Liu, Kai and Li, Wei and Chen, Lai and Wu, Shengqiong and Zheng, Yanhao and Ji, Jiayi and Zhou, Fan and Jiang, Rongxin and Luo, Jiebo and Fei, Hao and Chua, Tat-Seng},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  month = apr,
  year = {2026},
}

Structure-aware Domain Knowledge Injection for Large Language Models

Kai Liu, Ze Chen, Zhihang Fu, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, and Jieping Ye

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Jul 2025

@inproceedings{liu2025structure,
  title = {Structure-aware Domain Knowledge Injection for Large Language Models},
  author = {Liu, Kai and Chen, Ze and Fu, Zhihang and Jiang, Rongxin and Zhou, Fan and Chen, Yaowu and Wu, Yue and Ye, Jieping},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
  month = jul,
  year = {2025},
}

Enhancing LLM’s Cognition via Structurization

Kai Liu, Zhihang Fu, Chao Chen, Wei Zhang, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, and Jieping Ye

In Conference on Neural Information Processing Systems, Nov 2024

@inproceedings{liu2024enhancing,
  title = {Enhancing LLM's Cognition via Structurization},
  author = {Liu, Kai and Fu, Zhihang and Chen, Chao and Zhang, Wei and Jiang, Rongxin and Zhou, Fan and Chen, Yaowu and Wu, Yue and Ye, Jieping},
  booktitle = {Conference on Neural Information Processing Systems},
  month = nov,
  year = {2024},
}

INSIDE: LLMs’ Internal States Retain the Power of Hallucination Detection

Chao Chen, Kai Liu, Ze Chen, Yi Gu, Mingyuan Tao, Zhihang Fu, and Jieping Ye

In International Conference on Learning Representations, May 2024

@inproceedings{chen2024inside,
  title = {INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection},
  author = {Chen, Chao and Liu, Kai and Chen, Ze and Gu, Yi and Tao, Mingyuan and Fu, Zhihang and Ye, Jieping},
  booktitle = {International Conference on Learning Representations},
  month = may,
  year = {2024},
}