arXiv: 2501.06282
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

10 January 2025
Qian Chen
Yafeng Chen
Yanni Chen
Mengzhe Chen
Yuxiao Chen
Chong Deng
Zhihao Du
Ruize Gao
Changfeng Gao
Zhifu Gao
Yabin Li
Xiang Lv
Jiaqing Liu
Haoneng Luo
B. Ma
Chongjia Ni
Xian Shi
Jialong Tang
Hui Wang
Hao Wang
Wen Wang
Yansen Wang
Yunlan Xu
Fan Yu
Zhijie Yan
Yexin Yang
Baosong Yang
Xian Yang
Guanrou Yang
Tianyu Zhao
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Pei Zhang
Chuxu Zhang
Jinren Zhou
AuLLM MLLM

Papers citing "MinMo: A Multimodal Large Language Model for Seamless Voice Interaction"

19 / 19 papers shown
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment
Chao-Hong Tan
Qian Chen
Wen Wang
Chong Deng
Qinglin Zhang
...
Yukun Ma
Yafeng Chen
Hui Wang
Jiaqing Liu
Jieping Ye
AuLLM
84
0
0
11 Jun 2025
Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
Sathvik Udupa
Shinji Watanabe
Petr Schwarz
Jan "Honza" Černocký
18
0
0
08 Jun 2025
Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
Haibin Wu
Yuxuan Hu
Ruchao Fan
Xiaofei Wang
K. Kumatani
...
J. Yu
Heng Lu
Lijuan Wang
Y. Qian
Jinyu Li
AuLLM
60
0
0
04 Jun 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
167
1
0
23 May 2025
VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
AuLLM
78
0
0
21 May 2025
Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Ke Hu
Ehsan Hosseini-Asl
Chen Chen
Edresson Casanova
Subhankar Ghosh
Piotr Żelasko
Zhiwen Chen
Jia-Nan Li
Jagadeesh Balam
Boris Ginsburg
AuLLM
134
0
0
21 May 2025
SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
85
0
0
17 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yongqian Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM ALM
125
0
0
14 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
80
2
0
06 May 2025
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Qingkai Fang
Yan Zhou
Shoutao Guo
Shaolei Zhang
Yang Feng
AuLLM
101
4
0
05 May 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM VLM
185
13
0
25 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
158
14
0
11 Apr 2025
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
Yuhao Wang
Heyang Liu
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
458
3
0
05 Apr 2025
Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models
Heeseung Kim
Che Hyun Lee
Sangkwon Park
Jiheum Yeom
Nohil Park
Sangwon Yu
Sungroh Yoon
130
1
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM VLM
220
4
0
26 Feb 2025
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
244
2
0
26 Feb 2025
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
Borui Liao
Yulong Xu
Jiao Ou
Kaiyuan Yang
Weihua Jian
Pengfei Wan
Di Zhang
AuLLM
145
0
0
19 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM SyDa VLM
170
1
0
18 Feb 2025
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Ailin Huang
Boyong Wu
Bruce Wang
Chao Yan
Chen Hu
...
Tianyu Wang
Wenjin Deng
Wuxun Xie
Weipeng Ming
Wenqing He
AuLLM
128
17
0
17 Feb 2025