arXiv:2311.07919

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
    AuLLM

Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"

50 / 277 papers shown
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Miaosen Luo
Jiesen Long
Zequn Li
Yunying Yang
Yuncheng Jiang
Sijie Mai
206
2
0
04 Aug 2025
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
Yuhang Jia
Xu Zhang
Yong Qin
Yang Chen
Shiwan Zhao
VLM
208
0
0
03 Aug 2025
EgoTrigger: Toward Audio-Driven Image Capture for Human Memory Enhancement in All-Day Energy-Efficient Smart Glasses
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Akshay Paruchuri
Sinan Hersek
Lavisha Aggarwal
Qiao Yang
Xin Liu
Achin Kulshrestha
Andrea Colaco
Henry Fuchs
Ishan Chatterjee
EgoV
183
1
0
03 Aug 2025
Hearing More with Less: Multi-Modal Retrieval-and-Selection Augmented Conversational LLM-Based ASR
Bingshen Mu
Hexin Liu
Hongfei Xue
Kun Wei
Lei Xie
212
1
0
02 Aug 2025
Multi-TW: Benchmarking Multimodal Models on Traditional Chinese Question Answering in Taiwan
Jui-Ming Yao
Bing-Cheng Xie
Sheng-Wei Peng
Hao-Yuan Chen
He-Rong Zheng
Bing-Jia Tan
Peter Shaojui Wang
Shun-Feng Su
93
0
0
02 Aug 2025
Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning
Zhiyuan Han
Beier Zhu
Yanlong Xu
Peipei Song
Xun Yang
195
3
0
02 Aug 2025
TITAN-Guide: Taming Inference-Time AligNment for Guided Text-to-Video Diffusion Models
Christian Simon
Masato Ishii
Akio Hayakawa
Zhi-Wei Zhong
Shusuke Takahashi
Takashi Shibuya
Yuki Mitsufuji
136
1
0
01 Aug 2025
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
Yadong Niu
Tianzi Wang
Heinrich Dinkel
Xingwei Sun
Jiahao Zhou
Gang Li
Jizhong Liu
Xunying Liu
Junbo Zhang
Jian Luan
AuLLM
230
3
0
31 Jul 2025
Multimodal Video Emotion Recognition with Reliable Reasoning Priors
Zhepeng Wang
Yingjian Zhu
Guanghao Dong
Hongzhu Yi
F. Chen
Xinming Wang
Jun Xie
93
0
0
29 Jul 2025
Self-Improvement for Audio Large Language Model using Unlabeled Speech
S. Wang
Xinyuan Chen
Yao Xu
AuLLM
164
6
0
27 Jul 2025
MLLM-based Speech Recognition: When and How is Multimodality Beneficial?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
219
1
0
25 Jul 2025
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi
Maike Züfle
Marco Gaido
Beatrice Savoldi
Danni Liu
Ioannis Douros
L. Bentivogli
Jan Niehues
289
4
0
25 Jul 2025
DIFFA: Large Language Diffusion Models Can Listen and Understand
Jiaming Zhou
Hongjie Chen
Shiwan Zhao
Jian Kang
Jie Li
...
Haoqin Sun
Hui Wang
Aobo Kong
Yong Qin
X. Li
214
3
0
24 Jul 2025
The TEA-ASLP System for Multilingual Conversational Speech Recognition and Speech Diarization in MLC-SLM 2025 Challenge
Hongfei Xue
Kaixun Huang
Zhikai Zhou
Shen Huang
Shidong Shang
117
2
0
24 Jul 2025
TELEVAL: A Dynamic Benchmark Designed for Spoken Language Models in Chinese Interactive Scenarios
Zehan Li
Hongjie Chen
Yuxin Zhang
Jing Zhou
Xuening Wang
...
Jie Lian
Jian Kang
Jie Li
Yongxiang Li
Zhongjiang He
205
2
0
24 Jul 2025
Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
Shanbo Cheng
Yu Bao
Longxiang Zhang
Yu Lu
Ningxin Peng
...
Wenhao Zhu
Liehao Zou
Lu Lu
Yuping Wang
Yonghui Wu
VLM
441
1
0
23 Jul 2025
Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge
Miaomiao Gao
Xiaoxiao Xiang
Yiwen Guo
AILaw
162
1
0
23 Jul 2025
Step-Audio 2 Technical Report
Boyong Wu
Chao Yan
Chen Hu
Cheng Yi
Chengli Feng
...
Yuanwei Lu
Yuchu Luo
Yuhe Yin
Yumeng Zhan
Y. Zhang
AuLLM
298
0
0
22 Jul 2025
Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries
Pengfei Cai
Yan Song
Qing Gu
Nan Jiang
Haoyu Song
Ian Mcloughlin
VLM
243
1
0
22 Jul 2025
SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing
Jinbo Hu
Yin Cao
Ming Wu
Feiran Yang
J. Yang
VLM
170
3
0
22 Jul 2025
FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing
Shoutao Guo
Shaolei Zhang
Qingkai Fang
Zhengrui Ma
Min Zhang
Yang Feng
AuLLM
232
2
0
20 Jul 2025
The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents
Lixu Wang
Kaixiang Yao
Xinfeng Li
Dong Yang
Haoyang Li
Xiaofeng Wang
Wei Dong
AuLLM
263
5
0
14 Jul 2025
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Chen Wang
Tianyu Peng
Wen Yang
Yinan Bai
Guangfu Wang
...
Lanpeng Jia
Lingxiang Wu
Jinqiao Wang
Chengqing Zong
Jiajun Zhang
AuLLM, VLM
177
3
0
07 Jul 2025
DeepOmni: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE
Hang Shao
Heting Gao
Yunhang Shen
Jiawei Chen
Zuwei Long
Dong Yang
Ke Li
Xing Sun
AuLLM, MoE
226
2
0
27 Jun 2025
Universal Music Representations? Evaluating Foundation Models on World Music Corpora
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
164
0
0
20 Jun 2025
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
International Workshop on Spoken Language Translation (IWSLT), 2025
Giuseppe Attanasio
Sonal Sannigrahi
Ben Peters
Marcely Zanon Boito
AuLLM
190
0
0
20 Jun 2025
video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
391
2
0
18 Jun 2025
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
Jiamin Xie
Ju Lin
Yiteng Huang
Tyler Vuong
Zhaojiang Lin
...
Peng Su
Prashant Rawat
Sangeeta Srivastava
Ming Sun
Florian Metze
144
4
0
17 Jun 2025
GRAM: A Generative Foundation Reward Model for Reward Generalization
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Qiaozhi He
...
Bei Li
Tong Xiao
Chunliang Zhang
Tongran Liu
Jingbo Zhu
ALM, OffRL, LRM
297
12
0
17 Jun 2025
Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR
Yizhou Peng
Hexin Liu
Eng Siong Chng
AuLLM
249
1
0
16 Jun 2025
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
Yizhou Peng
Bin Wang
Yi-Wen Chao
Ziyang Ma
Haoyang Zhang
Hexin Liu
Xie Chen
Eng Siong Chng
ELM
238
1
0
16 Jun 2025
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
Yinghao Ma
Siyou Li
Juntao Yu
Emmanouil Benetos
Akira Maezawa
AuLLM, VLM
253
4
0
14 Jun 2025
What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study
Xiaoran Fan
Zhichao Sun
Yangfan Gao
Jingfei Xiong
Hang Yan
...
Yunke Zhang
Demei Yan
Shaokang Dong
Changzhi Sun
Tao Gui
222
1
0
14 Jun 2025
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
Yusuke Fujita
Tomoya Mizumoto
Atsushi Kojima
Lianbo Liu
Yui Sudo
AuLLM
292
0
0
12 Jun 2025
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Yuki Ito
Hassan Shahmohammadi
Siddhant Arora
Shinji Watanabe
AuLLM
257
1
0
12 Jun 2025
OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
Yui Sudo
Yusuke Fujita
Atsushi Kojima
Tomoya Mizumoto
Lianbo Liu
182
0
0
11 Jun 2025
CoLMbo: Speaker Language Model for Descriptive Profiling
Massa Baali
Shuo Han
Syed Abdul Hannan
Purusottam Samal
Karanveer Singh
Soham Deshmukh
Rita Singh
Bhiksha Raj
AuLLM
295
1
0
11 Jun 2025
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
Luel Hagos Beyene
Vivek Verma
Min Ma
Jesujoba Oluwadara Alabi
Fabian David Schmidt
Joyce Nakatumba-Nabende
David Ifeoluwa Adelani
336
2
0
10 Jun 2025
Teaching Physical Awareness to LLMs through Sounds
Weiguo Wang
Andy Nie
Wenrui Zhou
Yi Kai
Chengchen Hu
250
2
0
10 Jun 2025
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
AAML, SILM
154
0
0
10 Jun 2025
Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
Bikash Dutta
Rishabh Ranjan
Shyam Sathvik
Mayank Vatsa
Richa Singh
116
1
0
07 Jun 2025
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
Wenyu Zhang
Yingxu He
Geyu Lin
Zhuohan Liu
Shuo Sun
...
Jeremy H.M. Wong
Qiongqiong Wang
Hardik B. Sailor
Nancy F. Chen
Ai Ti Aw
AuLLM
235
2
0
07 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
376
4
0
05 Jun 2025
LLM-based phoneme-to-grapheme for phoneme-based speech recognition
Te Ma
Min Bi
Saierdaer Yusuyin
Hao Huang
Zhijian Ou
302
2
0
05 Jun 2025
GRAM: Spatial general-purpose audio representation models for real-world applications
Goksenin Yuksel
Marcel van Gerven
Kiki van der Heijden
300
1
0
01 Jun 2025
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
Volume 1 (V1), 2025
Fauzan Farooqui
Thy Thy Tran
Preslav Nakov
Iryna Gurevych
MLLM, AAML
141
0
0
31 May 2025
Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection
Shangkun Huang
Jing Deng
Jintao Kang
Rong Zheng
186
1
0
28 May 2025
Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR
Mingchen Shao
Xinfa Zhu
C. Wang
Bingshen Mu
Hai Li
Ying Yan
Junhui Liu
Danming Xie
Lei Xie
181
2
0
28 May 2025
Assessment of L2 Oral Proficiency using Speech Large Language Models
Rao Ma
Mengjie Qian
Siyuan Tang
Stefano Bannò
Kate Knill
Mark Gales
AuLLM
247
4
0
27 May 2025
Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction
Zexu Pan
Shengkui Zhao
Tingting Wang
Kun Zhou
Yukun Ma
Chong Zhang
B. Ma
226
0
0
27 May 2025