Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
    AuLLM

Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"

50 / 277 papers shown
Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data
Qiongqiong Wang
Hardik B. Sailor
Tianchi Liu
Wenyu Zhang
Muhammad Huzaifah
Nattadaporn Lertcheva
Shuo Sun
Nancy F. Chen
Jinyang Wu
AiTi Aw
172
1
0
20 Sep 2025
VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion
Dimitrios Damianos
Leon Voukoutis
Georgios Paraskevopoulos
Vassilis Katsouros
100
0
0
19 Sep 2025
Direct Simultaneous Translation Activation for Large Audio-Language Models
Pei Zhang
Yiming Wang
Jialong Tang
Baosong Yang
Rui Wang
Yang Li
Fei Huang
102
0
0
19 Sep 2025
Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations
Linyang He
Qiaolin Wang
Xilin Jiang
Nima Mesgarani
160
1
0
19 Sep 2025
Exploring Fine-Tuning of Large Audio Language Models for Spoken Language Understanding under Limited Speech Data
Youngwon Choi
Jaeyoon Jung
Hyeonyu Kim
Huu-Kim Nguyen
Hwayeon Kim
119
0
0
18 Sep 2025
Evaluating Multimodal Large Language Models on Spoken Sarcasm Understanding
Zhu Li
Xiyuan Gao
Yuqing Zhang
Shekhar Nayak
Matt Coler
90
1
0
18 Sep 2025
GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2
Savini Kashmira
Jayanaka L. Dantanarayana
Thamirawaran Sathiyalogeswaran
Yichao Yuan
Nishil Talati
Krisztian Flautner
Lingjia Tang
Jason Mars
125
1
0
17 Sep 2025
Preservation of Language Understanding Capabilities in Speech-aware Large Language Models
Marek Kubis
Paweł Skórzewski
Iwona Christop
Mateusz Czyżnikiewicz
Jakub Kubiak
Łukasz Bondaruk
Marcin Lewandowski
AuLLM, ELM
190
0
0
15 Sep 2025
WeaveMuse: An Open Agentic System for Multimodal Music Understanding and Generation
Emmanouil Karystinaios
126
0
0
14 Sep 2025
ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs
Yibo Zhang
Guanbin Li
AAML
93
3
0
14 Sep 2025
Improving Audio Event Recognition with Consistency Regularization
Shanmuka Sadhu
Weiran Wang
115
0
0
12 Sep 2025
Competitive Audio-Language Models with Data-Efficient Single-Stage Training on Public Data
Gokul Karthik Kumar
Rishabh Saraf
Ludovick Lepauloux
Abdul Muneer
Billel Mokeddem
Hakim Hacid
AuLLM
149
1
0
09 Sep 2025
FireRedChat: A Pluggable, Full-Duplex Voice Interaction System with Cascaded and Semi-Cascaded Implementations
Junjie Chen
Yao Hu
Junjie Li
K. Li
Kun Liu
...
Manzhen Wei
Yichen Wu
Fenglong Xie
K. Xu
Kun Xie
203
4
0
08 Sep 2025
GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning
Chenglong Wang
Yongyu Mu
Hang Zhou
Yifu Huo
Ziming Zhu
...
Tong Xiao
Xiaoyang Hao
Chunliang Zhang
Fandong Meng
Jingbo Zhu
OffRL, LRM, VLM
329
1
0
02 Sep 2025
FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
Yiqun Yao
Xiang Li
Xin Jiang
Xuezhi Fang
N. Yu
Wenjia Ma
Aixin Sun
Yequan Wang
AuLLM
194
1
0
02 Sep 2025
The AudioMOS Challenge 2025
Wen-Chin Huang
Hui Wang
Cheng Liu
Yi-Chiao Wu
Andros Tjandra
Wei-Ning Hsu
Erica Cooper
Yong Qin
Tomoki Toda
100
1
0
01 Sep 2025
Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts
Shreyas Tirumala
Nishant Jain
Danny D. Leybzon
Trent D. Buskirk
103
1
0
01 Sep 2025
SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings
Jaekwon Yoo
Kunal Chandiramani
Divya Tadimeti
Abenezer Girma
C. Dhir
AuLLM
150
0
0
29 Aug 2025
WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations
J. Kim
Heeseung Yun
Sang Hoon Woo
Chao-Han Huck Yang
Gunhee Kim
AuLLM
120
1
0
28 Aug 2025
Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio
Jeong Hun Yeo
Hyeongseop Rha
Sungjune Park
Junil Won
Y. Ro
187
0
0
28 Aug 2025
MQAD: A Large-Scale Question Answering Dataset for Training Music Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Zhihao Ouyang
Ju-Chiang Wang
Daiyu Zhang
Bin Chen
Shangjie Li
Quan Lin
AuLLM
167
0
0
27 Aug 2025
DESAMO: A Device for Elder-Friendly Smart Homes Powered by Embedded LLM with Audio Modality
Youngwon Choi
Donghyuk Jung
Hwayeon Kim
AuLLM
60
0
0
26 Aug 2025
Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models
Haoyu Wang
Guangyan Zhang
Jiale Chen
Jingyu Li
Yuehai Wang
Yiwen Guo
AuLLM
203
0
0
26 Aug 2025
Cross-Learning Fine-Tuning Strategy for Dysarthric Speech Recognition Via CDSD database
Qing Xiao
Yingshan Peng
PeiPei Zhang
84
0
0
26 Aug 2025
Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System
Yanfan Du
Jun Zhang
Bin Wang
Jin Qiu
Daigang Xu
Yuan Ge
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
94
0
0
26 Aug 2025
Enhancing Speech Large Language Models through Reinforced Behavior Alignment
Yansong Liu
Jiateng Li
Yuan Liu
159
0
0
25 Aug 2025
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
Dingdong Wang
Junan Li
Mingyu Cui
Dongchao Yang
Xueyuan Chen
Chao Yang
AuLLM
136
4
0
25 Aug 2025
When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models
Cheng Wang
Gelei Deng
Xianglin Yang
Han Qiu
Tianwei Zhang
AuLLM
152
4
0
21 Aug 2025
Beyond Transcription: Mechanistic Interpretability in ASR
Neta Glazer
Yael Segal-Feldman
Hilit Segev
Aviv Shamsian
Asaf Buchnick
Gill Hetz
Ethan Fetaya
Joseph Keshet
Aviv Navon
99
1
0
21 Aug 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Hugo Thimonier
Antony Perzo
Renaud Seguier
145
2
0
19 Aug 2025
MATPAC++: Enhanced Masked Latent Prediction for Self-Supervised Audio Representation Learning
Aurian Quélennec
Pierre Chouteau
Geoffroy Peeters
S. Essid
161
0
0
18 Aug 2025
Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding
Zhifeng Kong
Arushi Goel
J. F. Santos
Sreyan Ghosh
Rafael Valle
Wei Ping
Bryan Catanzaro
ReLM, AuLLM, LRM
178
3
0
15 Aug 2025
Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge
Xiaoxiao Li
An Zhu
Youhai Jiang
Fengjie Zhu
115
1
0
15 Aug 2025
HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs
European Workshop on Visual Information Processing (EUVIP), 2025
Zheng Qin
Ruobing Zheng
Yabing Wang
Tianqi Li
Yi Yuan
Jingdong Chen
Le Wang
LRM
219
2
0
14 Aug 2025
$\text{M}^3\text{PDB}$: A Multimodal, Multi-Label, Multilingual Prompt Database for Speech Generation
B. Zhu
Cheng Gong
Muyang Wu
Ruihao Jing
Fan Liu
Xiaolei Zhang
Chi Zhang
Xuelong Li
118
0
0
13 Aug 2025
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
ACM Conference on Recommender Systems (RecSys), 2025
Marco De Nadai
Andreas Damianou
M. Lalmas
VLM
108
0
0
13 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
235
4
0
12 Aug 2025
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
Yuanyuan Wang
Dongchao Yang
Yiwen Shao
Hangting Chen
Jiankun Zhao
Zhiyong Wu
Chao Yang
Xixin Wu
169
1
0
12 Aug 2025
Dual Information Speech Language Models for Emotional Conversations
Chun Wang
Chenyang Liu
Wenze Xu
Weihong Deng
AuLLM
99
0
0
11 Aug 2025
Audio-Thinker: Guiding Audio Language Model When and How to Think via Reinforcement Learning
Shu Wu
Chenxing Li
Wenfu Wang
Hao Zhang
H. Wang
Meng Yu
Dong Yu
AuLLM, KELM, LRM
250
8
0
11 Aug 2025
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
Qiongqiong Wang
Hardik B. Sailor
Jeremy H.M Wong
Tianchi Liu
Shuo Sun
Wenyu Zhang
Muhammad Huzaifah
Nancy F. Chen
Ai Ti Aw
AuLLM
102
2
0
10 Aug 2025
Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages
Seraphina Fong
M. Matassoni
Alessio Brutti
178
1
0
07 Aug 2025
A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding
Runchuan Ye
Yixuan Zhou
Renjie Yu
Zijian Lin
Kehan Li
Xiang Li
Xin Liu
Guoyang Zeng
Z. Wu
SLR
243
4
0
07 Aug 2025
Unveiling the Landscape of Clinical Depression Assessment: From Behavioral Signatures to Psychiatric Reasoning
Z. Chen
Guanqun Bi
W. Zhang
J. Hu
Aoyun Wang
Xiyao Xiao
Kun Feng
Shiyu Huang
101
1
0
06 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM, AI4TS, VLM
425
15
0
06 Aug 2025
Efficient Scaling for LLM-based ASR
Bingshen Mu
Yiwen Shao
Kun Wei
Dong Yu
Lei Xie
AuLLM
194
5
0
06 Aug 2025
NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations
Huan Liao
Qinke Ni
Yuancheng Wang
Yiheng Lu
Haoyue Zhan
Pengyuan Xie
Qiang Zhang
Zhizheng Wu
122
9
0
06 Aug 2025
Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM
Thomas Thebaud
Yen-Ju Lu
Matthew Wiesner
Peter Viechnicki
Najim Dehak
114
0
0
06 Aug 2025
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
Dmitrii Korzh
Dmitrii Tarasov
Artyom Iudin
Elvir Karimov
Matvey Skripkin
Nikita Kuzmin
Andrey Kuznetsov
Oleg Y. Rogov
Ivan Oseledets
172
0
0
05 Aug 2025
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Miaosen Luo
Jiesen Long
Zequn Li
Yunying Yang
Yuncheng Jiang
Sijie Mai
206
2
0
04 Aug 2025