ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.


Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
    AuLLM

Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"

50 / 211 papers shown
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
Francesco Verdini
Pierfrancesco Melucci
Stefano Perna
Francesco Cariaggi
Marco Gaido
...
Marek Kasztelnik
L. Bentivogli
Sébastien Bratières
P. Merialdo
Simone Scardapane
AuLLM
20
0
0
25 Sep 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
LM&MA
65
5
0
25 Sep 2024
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
Yang Yuhang
Peng Yizhou
Eng Siong Chng
Xionghu Zhong
AuLLM
AI4CE
19
0
0
24 Sep 2024
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
Fengrun Zhang
Wang Geng
Hukai Huang
Cheng Yi
He Qu
AuLLM
MoE
28
1
0
24 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
35
11
0
23 Sep 2024
SongTrans: An unified song transcription and alignment method for lyrics and notes
Siwei Wu
Jinzheng He
Ruibin Yuan
Haojie Wei
Xipin Wei
Chenghua Lin
Jin Xu
Junyang Lin
27
1
0
22 Sep 2024
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Mirco Ravanelli
AuLLM
44
0
0
22 Sep 2024
Large Language Model Should Understand Pinyin for Chinese ASR Error Correction
Yuang Li
Xiaosong Qiao
Xiaofeng Zhao
Huan Zhao
Wei Tang
Min Zhang
Hao Yang
23
1
0
20 Sep 2024
LLMs in Education: Novel Perspectives, Challenges, and Opportunities
Bashar Alhafni
Sowmya Vajjala
Stefano Bannò
Kaushal Kumar Maurya
Ekaterina Kochmar
AI4Ed
35
1
0
18 Sep 2024
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
Samee Arif
Taimoor Arif
Muhammad Saad Haroon
Aamina Jamal Khan
Agha Ali Raza
Awais Athar
24
0
0
17 Sep 2024
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Hongfei Xue
Wei Ren
Xuelong Geng
Kun Wei
Longhao Li
Qijie Shao
Linju Yang
Kai Diao
Lei Xie
AuLLM
18
0
0
17 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
36
4
0
17 Sep 2024
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data
Jing Xu
Daxin Tan
Jiaqi Wang
Xiao Chen
19
0
0
17 Sep 2024
A Survey of Foundation Models for Music Understanding
Wenjun Li
Ying Cai
Ziyang Wu
Wenyi Zhang
Yifan Chen
...
Junwei Han
Bao Ge
Tianming Liu
Lin Gan
Tuo Zhang
48
2
0
15 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
38
1
0
14 Sep 2024
Affective Computing Has Changed: The Foundation Model Disruption
Björn Schuller
Adria Mallol-Ragolta
Alejandro Pena Almansa
Iosif Tsangko
Mostafa M. Amin
A. Semertzidou
Lukas Christ
Shahin Amiriparian
28
0
0
13 Sep 2024
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
Minglun Han
Ye Bai
Chen Shen
Youjia Huang
Mingkun Huang
Zehua Lin
Linhao Dong
Lu Lu
Yuxuan Wang
35
1
0
13 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
64
1
0
13 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
32
1
0
13 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELM
AuLLM
49
5
0
11 Sep 2024
Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
Xinhu Zheng
Anbai Jiang
Bing Han
Yanmin Qian
Pingyi Fan
Jia Liu
Wei-Qiang Zhang
18
0
0
11 Sep 2024
Benchmarking Sub-Genre Classification For Mainstage Dance Music
Hongzhi Shu
Xinglin Li
Hongyu Jiang
Minghao Fu
Xinyu Li
22
0
0
10 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
W. Zhang
Shuo Sun
Bin Wang
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
A. Aw
AuLLM
65
1
0
10 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
25
29
0
10 Sep 2024
MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data
Jianyi Zhang
H. Yang
Ang Li
Xin Guo
Pu Wang
Haiming Wang
Yiran Chen
Hai Li
20
2
0
09 Sep 2024
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
Junkai Wu
Xulin Fan
Bo-Ru Lu
Xilin Jiang
N. Mesgarani
M. Hasegawa-Johnson
Mari Ostendorf
AuLLM
ELM
56
0
0
07 Sep 2024
Comparing Discrete and Continuous Space LLMs for Speech Recognition
Yaoxun Xu
Shi-Xiong Zhang
Jianwei Yu
Zhiyong Wu
Dong Yu
AuLLM
14
3
0
01 Sep 2024
Advancing Multi-talker ASR Performance with Large Language Models
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
23
0
0
30 Aug 2024
WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding
Mohan Li
Cong-Thanh Do
Simon Keizer
Youmna Farag
Svetlana Stoyanchev
R. Doddipatla
25
2
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
47
32
0
29 Aug 2024
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
36
6
0
26 Aug 2024
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model
Mengying Ge
Dongkai Tang
Mingyang Li
VLM
17
1
0
21 Aug 2024
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
Tianyi Xu
Kaixun Huang
Pengcheng Guo
Yu Zhou
Longtao Huang
Hui Xue
Lei Xie
CLL
27
0
0
20 Aug 2024
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
Zebang Cheng
Shuyuan Tu
Dawei Huang
Minghan Li
Xiaojiang Peng
Zhi-Qi Cheng
Alexander G. Hauptmann
43
2
0
20 Aug 2024
Grammatical Error Feedback: An Implicit Evaluation Approach
Stefano Bannò
Kate Knill
Mark J. F. Gales
13
0
0
18 Aug 2024
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition
Yangze Li
Xiong Wang
Songjun Cao
Yike Zhang
Long Ma
Lei Xie
AuLLM
51
0
0
18 Aug 2024
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
Yi-Cheng Lin
Wei-Chih Chen
Hung-yi Lee
31
1
0
14 Aug 2024
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping
Subash Khanal
Eric Xing
S. Sastry
A. Dhakal
Zhexiao Xiong
Adeel Ahmad
Nathan Jacobs
34
2
0
13 Aug 2024
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li
Xilin Jiang
Jordan Darefsky
Ge Zhu
N. Mesgarani
28
2
0
13 Aug 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Y. Wang
Xie Chen
AuLLM
29
23
0
05 Aug 2024
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
Yunwen Xia
Hui Fang
Emmanouil Benetos
Jie Zhang
Chong Long
Dmitry Bogdanov
AuLLM
41
1
0
02 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
42
30
0
02 Aug 2024
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Shanbo Cheng
Zhichao Huang
Tom Ko
Hang Li
Ningxin Peng
Lu Xu
Qini Zhang
46
3
0
31 Jul 2024
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models
Junda Wu
Xintong Li
Tong Yu
Yu-Xiang Wang
Xiang Chen
Jiuxiang Gu
Lina Yao
Jingbo Shang
Julian McAuley
37
0
0
29 Jul 2024
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
Soham Deshmukh
Shuo Han
Hazim T. Bukhari
Benjamin Elizalde
Hannes Gamper
Rita Singh
Bhiksha Raj
ReLM
LRM
AuLLM
16
7
0
25 Jul 2024
MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues
Liyun Zhang
25
1
0
23 Jul 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Tuomas Virtanen
Björn Schuller
39
4
0
22 Jul 2024
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
Xi Chen
Songyang Zhang
Qibing Bai
Kai-xiang Chen
Satoshi Nakamura
AuLLM
32
6
0
22 Jul 2024
Seal: Advancing Speech Language Models to be Few-Shot Learners
Shuyu Lei
Lingen Liu
Jiaolong Yang
Yasen Jiao
Yuxiang Yang
Yushu Yang
Xiang Guo
VLM
17
0
0
20 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
32
111
0
16 Jul 2024