2311.07919
Cited By
v1
v2 (latest)
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
ArXiv (abs)
PDF
HTML
HuggingFace (10 upvotes)
Papers citing
"Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"
50 / 276 papers shown
Title
Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
195
0
0
26 May 2025
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan
Hung-yi Lee
AuLLM
290
0
0
26 May 2025
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling
Haiyang Sun
Shujie Hu
Shujie Liu
L. Meng
Hui Wang
...
Yifan Yang
Yanqing Liu
Sheng Zhao
Yan Lu
Y. Qian
276
4
0
26 May 2025
Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study
Guanyu Hou
Jiaming He
Yinhang Zhou
Ji Guo
Yitong Qiao
Rui Zhang
Wenbo Jiang
AAML
249
1
0
26 May 2025
Efficient Speech Translation through Model Compression and Knowledge Distillation
International Workshop on Spoken Language Translation (IWSLT), 2025
Yasmin Moslem
195
1
0
26 May 2025
BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
Xun Gong
Anqi Lv
Zhiming Wang
Huijia Zhu
Y. Qian
164
5
0
25 May 2025
SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs
Firoj Alam
Md. Arid Hasan
Shammur A. Chowdhury
222
3
0
25 May 2025
Flex-Judge: Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators
Jongwoo Ko
S. Kim
Sungwoo Cho
Se-Young Yun
ELM
LRM
534
0
0
24 May 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu
Wenkai Guo
Chubin Zhang
Yuheng Zhou
Haonan Jiang
Zifeng Gao
Yansong Tang
Ziwei Wang
OffRL
372
59
0
24 May 2025
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
Ziwei Zhou
Rui Wang
Zuxuan Wu
AuLLM
VGen
184
20
0
23 May 2025
LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junlong Tong
Jinlan Fu
Zixuan Lin
Yingqi Fan
Anhao Zhao
Hui Su
Xiaoyu Shen
361
2
0
22 May 2025
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
Junbo Zhang
Heinrich Dinkel
Yadong Niu
Chenyu Liu
Si Cheng
Anbei Zhao
Jian Luan
358
4
0
22 May 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You
Shen Nie
Xiaolu Zhang
Jun Hu
Jun Zhou
Zhiwu Lu
J. Wen
Chongxuan Li
MLLM
VLM
412
61
0
22 May 2025
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Ke Hu
Ehsan Hosseini-Asl
Chen Chen
Edresson Casanova
Subhankar Ghosh
Piotr Żelasko
Zhiwen Chen
Jia-Nan Li
Jagadeesh Balam
Boris Ginsburg
AuLLM
583
0
0
21 May 2025
Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junlin Li
Guodong DU
Jing Li
Sim Kuan Goh
Wenya Wang
...
Fangming Liu
Jing Li
Saleh Alharbi
Daojing He
Min Zhang
MoMe
CLL
338
1
0
21 May 2025
RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
Subrata Biswas
Mohammad Nur Hossain Khan
Bashima Islam
366
2
0
21 May 2025
In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties
Nathan Roll
C. Graham
Yuka Tatsumi
Kim Tien Nguyen
Meghan Sumner
Dan Jurafsky
331
5
0
20 May 2025
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English
Haoyang Zhang
Hexin Liu
Xiangyu Zhang
Qiquan Zhang
Yuchen Hu
Junqi Zhao
Fei Tian
Xuerui Yang
Eng Siong Chng
414
0
0
20 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
354
1
0
20 May 2025
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
Chun-Yi Kuan
Hung-yi Lee
266
3
0
20 May 2025
Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation
Qiongqiong Wang
Hardik B. Sailor
Tianchi Liu
Ai Ti Aw
276
4
0
19 May 2025
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
Chih-Kai Yang
Neo Ho
Yen-Ting Piao
Hung-yi Lee
AuLLM
LRM
498
18
0
19 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yongqian Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM
ALM
357
3
0
14 May 2025
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko
Saurabhchand Bhati
Edson Araujo
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
VLM
307
23
0
14 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
248
2
0
05 May 2025
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
ACM Conference on Health, Inference, and Learning (CHIL), 2025
Tsai-Ning Wang
Lin-Lin Chen
Neil Zeghidour
Aaqib Saeed
AuLLM
LM&MA
780
1
0
02 May 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Roger Zimmermann
Lijun Sun
Jinhua Zhao
AI4CE
942
2
0
15 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Prabhat Pandey
Rupak Vignesh Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
920
8
0
12 Apr 2025
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Liang-Hsuan Tseng
Yi-Chang Chen
Kuan-Yi Lee
Da-shan Shiu
Hung-yi Lee
AuLLM
432
12
0
09 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
440
4
0
03 Apr 2025
Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks
Yongyi Zang
Sean O'Brien
Taylor Berg-Kirkpatrick
Julian McAuley
Cheng-i Wang
AuLLM
353
11
0
01 Apr 2025
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
Minghan Wang
Ye Bai
Yanjie Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
306
0
0
31 Mar 2025
RLDBF: Enhancing LLMs Via Reinforcement Learning With DataBase FeedBack
Weichen Dai
Zijie Dai
Zhijie Huang
Yixuan Pan
Xinhe Li
Xi Li
Yi Zhou
Ji Qi
Wu Jiang
179
0
0
28 Mar 2025
OmniVox: Zero-Shot Emotion Recognition with Omni-LLMs
John Murzaku
Owen Rambow
AuLLM
215
2
0
27 Mar 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
Jing Zhang
Lu Lu
Yu Tsao
Junichi Yamagishi
Longji Xu
Chao Zhang
AuLLM
461
11
0
26 Mar 2025
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications
Yun Feng
Haohang Li
Yangyang Yu
Shashidhar Reddy Javaji
Yueru He
...
Xiao-Yang Liu
K. P. Subbalakshmi
Yijia Zhao
Sophia Ananiadou
J. Nie
AuLLM
558
5
0
26 Mar 2025
Large Language Models Meet Contrastive Learning: Zero-Shot Emotion Recognition Across Languages
Heqing Zou
Fengmao Lv
Desheng Zheng
Eng Siong Chng
D. Rajan
302
2
0
25 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Longji Xu
Shengqiong Wu
Yujiao Shi
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
589
108
0
16 Mar 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
374
11
0
14 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
263
16
0
11 Mar 2025
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images
Xiang Lan
Feng Wu
Kai He
Qinghao Zhao
Zhiqin Jiang
Mengling Feng
AI4TS
292
16
0
08 Mar 2025
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information
Feng Jiang
Zhiyu Lin
Fan Bu
Yuhao Du
Benyou Wang
Haoyang Li
AuLLM
ELM
273
9
0
07 Mar 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh
Zhifeng Kong
Sonal Kumar
S. Sakshi
Jaehyeon Kim
Ming-Yu Liu
Rafael Valle
Dinesh Manocha
Bryan Catanzaro
MLLM
AuLLM
LRM
304
73
0
06 Mar 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Ziqiang Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
389
65
0
04 Mar 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Dingdong Wang
Jin Xu
Ruihang Chu
Zhifang Guo
Xinyu Wang
Jincenzi Wu
Dongchao Yang
Shengpeng Ji
Junyang Lin
AuLLM
277
8
0
04 Mar 2025
As Good as It KAN Get: High-Fidelity Audio Representation
Patryk Marszałek
Maciej Rut
Piotr Kawa
Przemysław Spurek
P. Syga
497
2
0
04 Mar 2025
MindBridge: Scalable and Cross-Model Knowledge Editing via Memory-Augmented Modality
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shuaike Li
Kai Zhang
Qiang Liu
Tong Xu
KELM
265
4
0
04 Mar 2025
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
International Conference on Learning Representations (ICLR), 2025
Siddhant Arora
Zhiyun Lu
Chung-Cheng Chiu
Ruoming Pang
Shinji Watanabe
291
19
0
03 Mar 2025
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
259
2
0
24 Feb 2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xilin Jiang
Sukru Samet Dindar
Vishal B. Choudhari
Stephan Bickel
A. Mehta
Guy M McKhann
A. Flinker
D. Friedman
N. Mesgarani
304
2
0
24 Feb 2025