Kimi-Audio Technical Report

25 April 2025
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Songxiang Liu
Tong Liu
Zeyu Shang
Kai Shen
Wei Song
Xu Tan
Heyi Tang
Zhengtao Wang
Chu Wei
Yifei Xin
Xinran Xu
Jianwei Yu
Yutao Zhang
Xinyu Zhou
Y. Charles
Jun Chen
Yanru Chen
Yulun Du
Weiran He
Zhenxing Hu
Guokun Lai
Qingcheng Li
Yangyang Liu
Weidong Sun
Jianzhou Wang
Yuzhi Wang
Yuefeng Wu
Yuxin Wu
Dongchao Yang
Hao Yang
Ying Yang
Zhilin Yang
Aoxiong Yin
Ruibin Yuan
Yutong Zhang
Zaida Zhou
Abstract

We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input and discrete tokens as output, and develop a chunk-wise streaming detokenizer based on flow matching. We curate a pre-training dataset of more than 13 million hours of audio data covering a wide range of modalities including speech, sound, and music, and build a pipeline to construct high-quality and diverse post-training data. Initialized from a pre-trained LLM, Kimi-Audio is continually pre-trained on both audio and text data with several carefully designed tasks, and then fine-tuned to support diverse audio-related tasks. Extensive evaluation shows that Kimi-Audio achieves state-of-the-art performance on a range of audio benchmarks including speech recognition, audio understanding, audio question answering, and speech conversation. We release the code, model checkpoints, and evaluation toolkits at this https URL.
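
The architecture described above pairs continuous audio features on the input side of an LLM backbone with discrete audio tokens on the output side. The following is a minimal PyTorch sketch of that interface only; the module names, dimensions, layer count, and vocabulary size are illustrative assumptions, not the actual Kimi-Audio implementation.

# Minimal sketch of the hybrid audio-LLM interface from the abstract:
# continuous 12.5 Hz audio features in, discrete audio token logits out.
# All sizes and module choices below are assumptions for illustration only.
import torch
import torch.nn as nn

class AudioLLMSketch(nn.Module):
    def __init__(self, feat_dim=1024, d_model=2048, vocab_size=16384, n_layers=4):
        super().__init__()
        # Project continuous audio features into the LLM hidden space.
        self.audio_proj = nn.Linear(feat_dim, d_model)
        # Stand-in for the pre-trained LLM backbone (Transformer layers).
        layer = nn.TransformerEncoderLayer(d_model, nhead=16, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Output head predicts discrete audio tokens for a separate detokenizer.
        self.audio_head = nn.Linear(d_model, vocab_size)

    def forward(self, audio_feats):          # (batch, frames, feat_dim)
        h = self.audio_proj(audio_feats)
        h = self.backbone(h)
        return self.audio_head(h)            # (batch, frames, vocab_size) logits

# At a 12.5 Hz frame rate, one second of audio is roughly 12-13 frames.
model = AudioLLMSketch()
logits = model(torch.randn(1, 13, 1024))
print(logits.shape)                          # torch.Size([1, 13, 16384])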

View on arXiv
@article{kimiteam2025_2504.18425,
  title={Kimi-Audio Technical Report},
  author={KimiTeam and Ding Ding and Zeqian Ju and Yichong Leng and Songxiang Liu and Tong Liu and Zeyu Shang and Kai Shen and Wei Song and Xu Tan and Heyi Tang and Zhengtao Wang and Chu Wei and Yifei Xin and Xinran Xu and Jianwei Yu and Yutao Zhang and Xinyu Zhou and Y. Charles and Jun Chen and Yanru Chen and Yulun Du and Weiran He and Zhenxing Hu and Guokun Lai and Qingcheng Li and Yangyang Liu and Weidong Sun and Jianzhou Wang and Yuzhi Wang and Yuefeng Wu and Yuxin Wu and Dongchao Yang and Hao Yang and Ying Yang and Zhilin Yang and Aoxiong Yin and Ruibin Yuan and Yutong Zhang and Zaida Zhou},
  journal={arXiv preprint arXiv:2504.18425},
  year={2025}
}