Kai Liu


Zhejiang University

38 Zheda Road, Hangzhou.

Email: kail@zju.edu.cn

I have been a Ph.D. student at Zhejiang University since 2020, under the supervision of Prof. Fan Zhou and Prof. Yaowu Chen. From May 2022 to Sept. 2024, I was a research intern at Apsara Lab, Alibaba Cloud, under the supervision of Prof. Jieping Ye. From Sept. 2024 to Apr. 2025, I visited the NExT++ Research Center at the National University of Singapore as a joint Ph.D. student (supported by the CSC program), under the supervision of Prof. Tat-Seng Chua and Dr. Hao Fei.

My research interests include multimodal large language models, video/audio generation, and unified understanding and generation. Here is my Google Scholar profile. I am seeking job opportunities in the 2026 job market. If you are interested in chatting with me, feel free to drop me an email.

Listed below are selected publications, most of them first-authored papers at top conferences. Here is the full list of publications; the code repositories will be released soon. I look forward to continuing to make valuable contributions to the multimodal community.

news

Sep 18, 2025 One paper is accepted by NeurIPS’25 as a spotlight! Code, model, and data are coming soon!
May 15, 2025 One paper is accepted by the ACL’25 main conference! Code and model are released!
Apr 5, 2025 A cool joint audio-video generation model is released! Feel free to try it out!
Nov 6, 2024 One paper is accepted by IEEE TIP! Code and model are released!
Oct 31, 2024 Three papers are accepted by NeurIPS’24! Code and model are released!

selected publications

  1. JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
    Kai Liu, Jungang Li, Yuchong Sun, Shengqiong Wu, Jianzhang Gao, Daoan Zhang, Wei Zhang, Sheng Jin, Sicheng Yu, Geng Zhan, Jiayi Ji, Fan Zhou, Liang Zheng, Shuicheng Yan, Hao Fei, and Tat-Seng Chua
    In Conference on Neural Information Processing Systems [Spotlight], Nov 2025
  2. JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
    Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Rongxin Jiang, Jiebo Luo, Hao Fei, and Tat-Seng Chua
    arXiv preprint arXiv:2503.23377, Mar 2025
  3. Structure-aware Domain Knowledge Injection for Large Language Models
    Kai Liu, Ze Chen, Zhihang Fu, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, and Jieping Ye
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Jul 2025
  4. Enhancing LLM’s Cognition via Structurization
    Kai Liu, Zhihang Fu, Chao Chen, Wei Zhang, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, and Jieping Ye
    In Conference on Neural Information Processing Systems, Nov 2024
  5. INSIDE: LLMs’ Internal States Retain the Power of Hallucination Detection
    Chao Chen, Kai Liu, Ze Chen, Yi Gu, Mingyuan Tao, Zhihang Fu, and Jieping Ye
    In International Conference on Learning Representations, May 2024