arXiv:2409.06666
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
International Conference on Learning Representations (ICLR), 2025
10 September 2024
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
Papers citing "LLaMA-Omni: Seamless Speech Interaction with Large Language Models"
50 of 86 papers shown
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
Boyu Zhu
Xiaofei Wen
Wenjie Mo
Tinghui Zhu
Yanan Xie
Peng Qi
Muhao Chen
48
0
0
02 Dec 2025
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin
Yuxin Cao
Junjie Su
Minhui Xue
Jie Hao
Ke Xu
Jin Song Dong
Derui Wang
AAML
102
0
0
30 Oct 2025
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Zhuoran Jin
Hongbang Yuan
Kejian Zhu
Jiachun Li
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
113
0
0
27 Oct 2025
UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Wenming Tu
Guanrou Yang
Ruiqi Yan
Wenxi Chen
Ziyang Ma
Yipeng Kang
Kai Yu
Xie Chen
Zilong Zheng
104
0
0
26 Oct 2025
Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding
Xin Zhang
Lin Li
Xiangni Lu
Jianquan Liu
Kong Aik Lee
80
0
0
23 Oct 2025
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
Chih-Kai Yang
Yen-Ting Piao
Tzu-wen Hsu
Szu-Wei Fu
Zhehuai Chen
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
KELM
AuLLM
149
0
0
19 Oct 2025
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Run Luo
Xiaobo Xia
Lu Wang
Longze Chen
Renke Shan
Jing Luo
Min Yang
Tat-Seng Chua
VGen
228
4
0
15 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM
AuLLM
VGen
VLM
400
4
0
15 Oct 2025
Evolution of Meta's LLaMA models and parameter-efficient fine-tuning of large language models: a survey
Abdulhady Abas Abdullah
Arkaitz Zubiaga
Seyedali Mirjalili
Amir Gandomi
Fatemeh Daneshfar
Mohammadsadra Amini
Alan Salam Mohammed
Hadi Veisi
ALM
156
0
0
14 Oct 2025
Latent Speech-Text Transformer
Yen-Ju Lu
Yashesh Gaur
Wei Zhou
Benjamin Muller
Jesus Villalba
...
Luke Zettlemoyer
Gargi Ghosh
Mike Lewis
Srinivasan Iyer
Duc Le
VLM
100
0
0
07 Oct 2025
Local MAP Sampling for Diffusion Models
Shaorong Zhang
Rob Brekelmans
Greg Ver Steeg
108
1
0
07 Oct 2025
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
Umberto Cappellazzo
Minsu Kim
Pingchuan Ma
Honglie Chen
Xubo Liu
Stavros Petridis
Maja Pantic
MoE
140
0
0
05 Oct 2025
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Chengyao Wang
Zhisheng Zhong
Bohao Peng
Senqiao Yang
Yuqi Liu
Haokun Gui
Bin Xia
Jingyao Li
Bei Yu
Jiaya Jia
MLLM
AuLLM
VLM
151
1
0
29 Sep 2025
Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis
Chao Wang
Rui Zheng
Yang Ai
Zhen-Hua Ling
72
0
0
28 Sep 2025
Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations
Y. Wu
Tianrui Wang
Yizhou Peng
Yi-Wen Chao
Xuyi Zhuang
Xinsheng Wang
Shunshun Yin
Ziyang Ma
112
0
0
27 Sep 2025
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Yuhan Song
Linhao Zhang
Chuhan Wu
Aiwei Liu
Wei Jia
Houfeng Wang
Xiao-bin Zhou
121
0
0
26 Sep 2025
KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
So Kuroki
Yotaro Kubo
Takuya Akiba
Yujin Tang
RALM
AuLLM
84
0
0
26 Sep 2025
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
Zhen Xiong
Yujun Cai
Zhecheng Li
Junsong Yuan
Yiwei Wang
AuLLM
LRM
235
1
0
26 Sep 2025
Acoustic-based Gender Differentiation in Speech-aware Language Models
Junhyuk Choi
Jihwan Seol
Nayeon Kim
Chanhee Cho
EunBin Cho
Bugeun Kim
AuLLM
136
1
0
25 Sep 2025
Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
Shree Harsha Bokkahalli Satish
G. Henter
Éva Székely
124
1
0
24 Sep 2025
Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
Weijie Wu
Wenhao Guan
Kaidi Wang
Peijie Chen
Zhuanling Zha
Junbo Li
Jun Fang
Lin Li
Q. Hong
VLM
181
0
0
24 Sep 2025
FireRedChat: A Pluggable, Full-Duplex Voice Interaction System with Cascaded and Semi-Cascaded Implementations
Junjie Chen
Yao Hu
Junjie Li
K. Li
Kun Liu
...
Manzhen Wei
Yichen Wu
Fenglong Xie
K. Xu
Kun Xie
175
3
0
08 Sep 2025
Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models
Haoyu Wang
Guangyan Zhang
Jiale Chen
Jingyu Li
Yuehai Wang
Yiwen Guo
AuLLM
163
0
0
26 Aug 2025
EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
Jingwen Liu
Kan Jen Cheng
Jiachen Lian
Akshay Anand
Rishi Jain
...
Robin Netzorg
Huang-Cheng Chou
Tingle Li
Guan-Ting Lin
Gopala Krishna Anumanchipalli
70
3
0
25 Aug 2025
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
Yuancheng Wang
Dekun Chen
Xueyao Zhang
Junan Zhang
Jiaqi Li
Zhizheng Wu
216
4
0
22 Aug 2025
Mini-Omni-Reasoner: Token-Level Thinking-in-Speaking in Large Speech Models
Zhifei Xie
Ziyang Ma
Zihang Liu
Kaiyu Pang
Hongyu Li
J. Zhang
Yue Liao
Deheng Ye
Chunyan Miao
Shuicheng Yan
AuLLM
LRM
232
7
0
18 Aug 2025
Dual Information Speech Language Models for Emotional Conversations
Chun Wang
Chenyang Liu
Wenze Xu
Weihong Deng
AuLLM
72
0
0
11 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM
AI4TS
VLM
330
12
0
06 Aug 2025
Training-Free Multimodal Large Language Model Orchestration
Tianyu Xie
Yuhang Wu
Yongdong Luo
Jinfa Huang
Xiawu Zheng
120
0
0
06 Aug 2025
SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents
C. Jiang
Jiajun Sun
Yifei Cao
Jiabao Zhuang
Hui Li
Xiaoran Fan
Ming-bo Wen
Junjie Ye
Jiajun Sun
251
0
0
04 Aug 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Kele Shao
Keda Tao
Kejia Zhang
Sicheng Feng
Mu Cai
Yuzhang Shang
Haoxuan You
Can Qin
Yang Sui
Huan Wang
481
10
0
27 Jul 2025
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
Design Automation Conference (DAC), 2025
Linye Wei
Shuzhang Zhong
Songqiang Xu
Runsheng Wang
Ru Huang
Meng Li
198
0
0
24 Jul 2025
GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
Hongjie Chen
Zehan Li
Yaodong Song
Wenming Deng
Yitong Yao
...
Chao Wang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AuLLM
VLM
211
2
0
24 Jul 2025
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data
Qibing Bai
Sho Inoue
Shuai Wang
Zhongjie Jiang
Yannan Wang
Haizhou Li
122
1
0
23 Jul 2025
Step-Audio 2 Technical Report
Boyong Wu
Chao Yan
Chen Hu
Cheng Yi
Chengli Feng
...
Yuanwei Lu
Yuchu Luo
Yuhe Yin
Yumeng Zhan
Y. Zhang
AuLLM
231
0
0
22 Jul 2025
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
Cheng-Han Chiang
Xiaofei Wang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
S. Liu
Zhendong Wang
Zhengyuan Yang
Hung-yi Lee
Lijuan Wang
ReLM
LRM
104
10
0
21 Jul 2025
FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing
Shoutao Guo
Shaolei Zhang
Qingkai Fang
Zhengrui Ma
Min Zhang
Yang Feng
AuLLM
206
1
0
20 Jul 2025
Personalized Socially Assistive Robots With End-to-End Speech-Language Models For Well-Being Support
Mengxue Fu
Zhonghao Shi
Minyu Huang
Siqi Liu
Mina Kian
Yirui Song
Maja J. Matarić
62
0
0
18 Jul 2025
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Chen Wang
Tianyu Peng
Wen Yang
Yinan Bai
Guangfu Wang
...
Lanpeng Jia
Lingxiang Wu
Jinqiao Wang
Chengqing Zong
Jiajun Zhang
AuLLM
VLM
124
3
0
07 Jul 2025
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi
Haoyu Li
Xiaoyu Gu
Yidi Jiang
Kai Yu
306
2
0
01 Jul 2025
DeepOmni: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE
Hang Shao
Heting Gao
Yunhang Shen
Jiawei Chen
Zuwei Long
Dong Yang
Ke Li
Xing Sun
AuLLM
MoE
163
0
0
27 Jun 2025
WildSpeech-Bench: Benchmarking End-to-End SpeechLLMs in the Wild
Jian Zhang
Linhao Zhang
Bokai Lei
Chuhan Wu
Aiwei Liu
Wei Jia
Xiao-bin Zhou
AuLLM
LM&MA
185
2
0
27 Jun 2025
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
Shufan Li
Aditya Grover
222
2
0
18 Jun 2025
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Shaolei Zhang
Shoutao Guo
Qingkai Fang
Yan Zhou
Yang Feng
MLLM
AuLLM
VLM
234
8
0
16 Jun 2025
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
Fengjin Li
Jie Wang
Yadong Niu
Yongqing Wang
Meng Meng
Jian Luan
Zhiyong Wu
164
0
0
03 Jun 2025
SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
Yixuan Hou
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
AuLLM
208
4
0
03 Jun 2025
Towards a Japanese Full-duplex Spoken Dialogue System
Atsumoto Ohashi
Shinya Iizuka
Jingjing Jiang
Ryuichiro Higashinaka
AuLLM
145
2
0
03 Jun 2025
TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment
Taesoo Kim
Jong Hwan Ko
AuLLM
121
0
0
01 Jun 2025
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems
Siddhant Arora
Jinchuan Tian
Hayato Futami
Jee-weon Jung
Jiatong Shi
Yosuke Kashiwagi
E. Tsunoo
Shinji Watanabe
LRM
91
5
0
31 May 2025
StressTest: Can YOUR Speech LM Handle the Stress?
Iddo Yosha
Gallil Maimon
Yossi Adi
173
3
0
28 May 2025