ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Audio Large Language Models

AuLLM

Exploring the development and application of large language models specifically tailored for audio data processing and understanding.

All papers

50 / 561 papers shown
Spoken Conversational Agents with Large Language Models
Chao-Han Huck Yang
Andreas Stolcke
Larry Heck
AuLLM
220
0
0
02 Dec 2025
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
Yuezhang Peng
Chonghao Cai
Ziang Liu
Shuai Fan
Sheng Jiang
...
Kele Xu
Yao Li
Sheng Wang
Libo Qin
Xie Chen
AuLLM
40
0
0
01 Dec 2025
Cross-Lingual Interleaving for Speech Language Models
Adel Moumen
Guangzhi Sun
Philip C. Woodland
AuLLM
0
0
0
01 Dec 2025
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Le Thien Phuc Nguyen
Zhuoran Yu
Samuel Low Yu Hang
Subin An
Jeongik Lee
...
SeungEun Chung
Thanh-Huy Nguyen
JuWan Maeng
Soochahn Lee
Yong Jae Lee
AuLLM, VLM
16
0
0
01 Dec 2025
Developing an Open Conversational Speech Corpus for the Isan Language
Adisai Na-Thalang
Chanakan Wittayasakpan
Kritsadha Phatcharoen
Supakit Buakaw
AuLLM
144
0
0
26 Nov 2025
Towards Audio Token Compression in Large Audio Language Models
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
124
0
0
26 Nov 2025
Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
Mingyue Huo
Wei-Cheng Tseng
Yiwen Shao
Hao Zhang
Dong Yu
AuLLM
234
0
0
19 Nov 2025
Step-Audio-R1 Technical Report
Fei Tian
Xiangyu Zhang
Y. Zhang
Haoyang Zhang
Yuxin Li
...
Eng Siong Chng
Xuerui Yang
Xiangyu Zhang
Daxin Jiang
Gang Yu
AuLLM, LRM
179
0
0
19 Nov 2025
FoleyBench: A Benchmark For Video-to-Audio Models
Satvik Dixit
Koichi Saito
Zhi-Wei Zhong
Yuki Mitsufuji
Chris Donahue
VGen, AuLLM
271
0
0
17 Nov 2025
Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs
Zhe Sun
Yujun Cai
Jiayu Yao
Yiwei Wang
AuLLM, LRM
232
0
0
17 Nov 2025
DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition
HongYu Liu
J. Li
Changxi Guo
Hao Chen
Yaqian Huang
Yifu Guo
Huan Yang
Lihua Cai
AuLLM
154
0
0
14 Nov 2025
TimeAudio: Bridging Temporal Gaps in Large Audio-Language Models
Hualei Wang
Yiming Li
Shuo Ma
Hong Liu
Xiangdong Wang
AuLLM
62
0
0
14 Nov 2025
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
He Zhang
Wenqian Cui
Haoning Xu
Xiaohui Li
Lei Zhu
Shaohua Ma
Irwin King
AuLLM
75
0
0
13 Nov 2025
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
Yudong Yang
Xuezhen Zhang
Zhifeng Han
S. Wang
Jimin Zhuang
Zengrui Jin
Jing Shao
Guangzhi Sun
C. Zhang
AAML, AuLLM
205
0
0
13 Nov 2025
Music Flamingo: Scaling Music Understanding in Audio Language Models
Sreyan Ghosh
Arushi Goel
Lasha Koroshinadze
Sang-gil Lee
Zhifeng Kong
...
R. Duraiswami
Dinesh Manocha
Wei Ping
Mohammad Shoeybi
Bryan Catanzaro
MLLM, AuLLM, VLM, LRM
186
0
0
13 Nov 2025
End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering
Jiliang Hu
Zuchao Li
Baoyuan Qi
Liu Guoming
Ping Wang
RALM, AuLLM
188
0
0
12 Nov 2025
StyleBreak: Revealing Alignment Vulnerabilities in Large Audio-Language Models via Style-Aware Audio Jailbreak
Hongyi Li
Chengxuan Zhou
Chu Wang
Sicheng Liang
Yanting Chen
Qinlin Xie
Jiawei Ye
Jie Wu
AuLLM, AAML
233
0
0
12 Nov 2025
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
Xueyao Zhang
C. Wang
Huan Liao
Z. Li
Yuancheng Wang
...
Dongya Jia
Yuanzhe Chen
X. Li
Z. Chen
Z. Wu
EGVM, AuLLM
260
0
0
11 Nov 2025
VocalBench-zh: Decomposing and Benchmarking the Speech Conversational Abilities in Mandarin Context
Heyang Liu
Ziyang Cheng
Yuhao Wang
Hongcheng Liu
Y. Li
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
AuLLM
74
0
0
11 Nov 2025
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
Umberto Cappellazzo
Xubo Liu
Pingchuan Ma
Stavros Petridis
Maja Pantic
AuLLM
191
0
0
10 Nov 2025
SAR-LM: Symbolic Audio Reasoning with Large Language Models
Termeh Taheri
Yinghao Ma
Emmanouil Benetos
AuLLM, LRM
98
0
0
09 Nov 2025
MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
Hardik B. Sailor
Aw Ai Ti
Chen Fang Yih Nancy
Chiu Ying Lay
Ding Yang
...
Wong Heng Meng Jeremy
Wu Jinyang
Zhang Huayun
Zhang Longyin
Zou Xunlong
AuLLM
252
0
0
07 Nov 2025
An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM
Jiawei Liu
Enis Berk Çoban
Zarina Schevchenko
Hao Tang
Zhigang Zhu
Michael I. Mandel
Johanna Devaney
AuLLM, LRM
184
0
0
04 Nov 2025
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
Chaoqun Liu
Mahani Aljunied
Guizhen Chen
Hou Pong Chan
Weiwen Xu
Yu Rong
Wenxuan Zhang
AuLLM
214
1
0
03 Nov 2025
Expressive Range Characterization of Open Text-to-Audio Models
Jonathan Morse
Azadeh Naderi
Swen E. Gaudl
Mark Cartwright
Amy K. Hoover
M. Nelson
AuLLM, VGen
321
0
0
31 Oct 2025
SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level
Hitomi Jin Ling Tee
Chaoren Wang
Zijie Zhang
Zhizheng Wu
AuLLM, ELM
296
0
0
30 Oct 2025
Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech
Pedro Corrêa
João Lima
Victor Moreno
Lucas Ueda
Paula D. P. Costa
AuLLM
296
0
0
29 Oct 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM, LRM
295
2
0
28 Oct 2025
Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
Kang Zhang
T. Pham
Suyeon Lee
Axi Niu
Arda Senocak
Joon Son Chung
AuLLM, VGen
155
0
0
28 Oct 2025
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
Bohan Li
Wenbin Huang
Yuhang Qiu
Yiwei Guo
Hankun Wang
Zhihan Li
Jing Peng
Ziyang Ma
Xie Chen
Kai Yu
AuLLM
129
0
0
27 Oct 2025
SAO-Instruct: Free-form Audio Editing using Natural Language Instructions
Michael Ungersböck
Florian Grötschla
Luca A. Lanzendörfer
June Young Yi
Changho Choi
Roger Wattenhofer
AuLLM
93
0
0
26 Oct 2025
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
213
2
0
26 Oct 2025
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
Li Zhou
Lutong Yu
You Lyu
Yihang Lin
Zefeng Zhao
Junyi Ao
Yuhao Zhang
Benyou Wang
Haizhou Li
AuLLM
109
0
0
26 Oct 2025
Are These Even Words? Quantifying the Gibberishness of Generative Speech Models
Danilo de Oliveira
Tal Peer
Jonas Rochdi
Timo Gerkmann
AuLLM
101
0
0
24 Oct 2025
Which Evaluation for Which Model? A Taxonomy for Speech Model Assessment
Maureen de Seyssel
Eeshan Gunesh Dhekane
AuLLM, ELM
80
0
0
22 Oct 2025
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
Yejin Kwon
Taewoo Kang
Hyunsoo Yoon
Changouk Kim
AuLLM, ELM, LRM
141
0
0
22 Oct 2025
Can large audio language models understand child stuttering speech? speech summarization, and source separation
Chibuzor Okocha
Maya Bakri
Christan Grant
AuLLM
120
0
0
21 Oct 2025
The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMS
Brandon James Carone
Iran R. Roman
Pablo Ripollés
AuLLM, LRM
110
1
0
21 Oct 2025
End-to-end Listen, Look, Speak and Act
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
Jun Zhang
Lu Lu
C. Zhang
AuLLM
136
0
0
19 Oct 2025
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
Chih-Kai Yang
Yen-Ting Piao
Tzu-wen Hsu
Szu-Wei Fu
Zhehuai Chen
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
KELM, AuLLM
125
0
0
19 Oct 2025
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
Yuatyong Chaichana
Pittawat Taveekitworachai
Warit Sirichotedumrong
Potsawee Manakul
Kunat Pipatanakul
AuLLM
96
0
0
17 Oct 2025
VocalBench-DF: A Benchmark for Evaluating Speech LLM Robustness to Disfluency
Hongcheng Liu
Yixuan Hou
Heyang Liu
Yuhao Wang
Yanfeng Wang
Y Samuel Wang
AuLLM
95
0
0
17 Oct 2025
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
Xiaohan Zhao
Hongyu Xiang
Shengze Ye
Song Li
Zhengkun Tian
Guanyu Chen
Ke Ding
Guanglu Wan
AuLLM
124
1
0
17 Oct 2025
AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning
Yueqian Lin
Zhengmian Hu
Jayakumar Subramanian
Qinsi Wang
N. Vlassis
Hai Helen Li
Yiran Chen
LLMAG, AuLLM, LRM
141
1
0
17 Oct 2025
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
Hui Wang
J. Zhao
Yifan Yang
Shujie Liu
Junyang Chen
...
Jinyu Li
Jiaming Zhou
Haoqin Sun
Yan Lu
Yong Qin
AuLLM, ELM
150
1
0
16 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
340
3
0
15 Oct 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Zhenyu Liu
Yunxin Li
Xuanyu Zhang
Qixun Teng
Shenyuan Jiang
...
Mingjun Zhao
Yu-Syuan Xu
Yancheng He
Baotian Hu
Min Zhang
AuLLM, MoE
174
0
0
15 Oct 2025
Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module
Ruitao Feng
Bixi Zhang
Sheng Liang
Zheng Yuan
AuLLM, MoE, LLMSV
155
0
0
15 Oct 2025
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
Kuan-Yi Lee
Tsung-En Lin
Hung-yi Lee
AuLLM, LRM
103
0
0
13 Oct 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Jinchuan Tian
Sang-gil Lee
Zhifeng Kong
Sreyan Ghosh
Arushi Goel
...
Shinji Watanabe
Mohammad Shoeybi
Bryan Catanzaro
Rafael Valle
Wei Ping
AuLLM, LRM
193
1
0
13 Oct 2025