2311.07919
Cited By
v1
v2 (latest)
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
HuggingFace (10 upvotes)
Papers citing
"Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"
29 / 279 papers shown
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
Spoken Language Technology Workshop (SLT), 2024
Junkai Wu
Xulin Fan
Bo-Ru Lu
Xilin Jiang
N. Mesgarani
M. Hasegawa-Johnson
Mari Ostendorf
AuLLM
ELM
391
10
0
07 Sep 2024
Advancing Multi-talker ASR Performance with Large Language Models
Spoken Language Technology Workshop (SLT), 2024
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
226
9
0
30 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
International Conference on Learning Representations (ICLR), 2024
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
411
125
0
29 Aug 2024
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
International Conference on Learning Representations (ICLR), 2024
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
593
22
0
26 Aug 2024
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model
Mengying Ge
Dongkai Tang
Mingyang Li
VLM
188
1
0
21 Aug 2024
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition
Interspeech (Interspeech), 2024
Yangze Li
Xiong Wang
Songjun Cao
Yike Zhang
Long Ma
Lei Xie
AuLLM
233
8
0
18 Aug 2024
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
Spoken Language Technology Workshop (SLT), 2024
Yi-Cheng Lin
Wei-Chih Chen
Hung-yi Lee
224
11
0
14 Aug 2024
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping
ACM Multimedia (MM), 2024
Subash Khanal
Eric Xing
Srikumar Sastry
Aayush Dhakal
Zhexiao Xiong
Adeel Ahmad
Nathan Jacobs
247
4
0
13 Aug 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Maria Sandsten
B. Schuller
417
9
0
22 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Xinyu Fang
Junming Yang
Xiangyu Zhao
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
734
363
0
16 Jul 2024
Pronunciation Assessment with Multi-modal Large Language Models
Kaiqi Fu
Linkai Peng
Nan Yang
Shuran Zhou
270
8
0
12 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
385
73
0
05 Jul 2024
CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Minghui Fang
Shengpeng Ji
Jialong Zuo
Hai Huang
Yan Xia
...
Xiaoda Yang
Wenrui Liu
Gang Wang
Zhenhua Dong
Zhou Zhao
191
9
0
25 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
596
82
0
23 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Yuxuan Zhou
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
261
128
0
17 Jun 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon
Kwangyoun Kim
Yi-Te Hsu
Prashant Sridhar
Shinji Watanabe
Karen Livescu
AuLLM
304
9
0
13 Jun 2024
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Chun-Yi Kuan
Wei-Ping Huang
Hung-yi Lee
AuLLM
191
19
0
12 Jun 2024
ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
Xin Jing
Andreas Triantafyllopoulos
Björn Schuller
153
11
0
11 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
344
5
0
09 Jun 2024
Soundscape Captioning using Sound Affective Quality Network and Large Language Model
Yuanbo Hou
Qiaoqiao Ren
A. Mitchell
Wenwu Wang
Jian Kang
Tony Belpaeme
Dick Botteldooren
470
4
0
09 Jun 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
493
69
0
14 May 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
230
108
0
31 Mar 2024
Domain Adaptation for Contrastive Audio-Language Models
Soham Deshmukh
Rita Singh
Bhiksha Raj
VLM
233
10
0
14 Feb 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA
AuLLM
ALM
262
177
0
12 Feb 2024
Cacophony: An Improved Contrastive Audio-Text Model
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Ge Zhu
Jordan Darefsky
Zhiyao Duan
AuLLM
335
22
0
10 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Ming-Yu Liu
Rafael Valle
Bryan Catanzaro
AuLLM
LM&MA
MLLM
523
165
0
02 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
433
38
0
02 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
517
340
0
24 Jan 2024
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
456
103
0
07 Oct 2023