arXiv:2402.01591 (v3, latest)
BAT: Learning to Reason about Spatial Sounds with Large Language Models
2 February 2024
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
Papers citing
"BAT: Learning to Reason about Spatial Sounds with Large Language Models"
50 / 58 papers shown
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
Zhe Sun
Yujun Cai
Jiayu Yao
Yiwei Wang
AuLLM
LRM
17 Nov 2025
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks
Xu Zheng
Zihao Dongfang
Lutao Jiang
Boyuan Zheng
Yulong Guo
...
L. Zhang
Danda Pani Paudel
Nicu Sebe
Luc Van Gool
Xuming Hu
LRM
VLM
29 Oct 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM
LRM
28 Oct 2025
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Wenxiang Guo
Changhao Pan
Zhiyuan Zhu
Xintong Hu
Yu Zhang
...
Z. Chen
Yanhao Yu
Qiange Huang
Fei Wu
Zhou Zhao
12 Oct 2025
Revisiting Self-Play Preference Optimization: On the Role of Prompt Difficulty
Yao Xiao
Jung-jae Kim
Roy Ka-wei Lee
Lidong Bing
07 Oct 2025
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
Subrata Biswas
Mohammad Nur Hossain Khan
Bashima Islam
VLM
LRM
30 Sep 2025
Spatial Audio Motion Understanding and Reasoning
A. Sridhar
Yinyi Guo
Erik M. Visser
18 Sep 2025
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
Kevin Wilkinghoff
Zheng-Hua Tan
17 Sep 2025
Deep Learning for Personalized Binaural Audio Reproduction
Xikun Lu
Yunda Chen
Zehua Chen
Jie Wang
Mingxing Liu
Hongmei Hu
C. Zheng
Stefan Bleeck
Jinqiu Sang
30 Aug 2025
ASAudio: A Survey of Advanced Spatial Audio Research
Zhiyuan Zhu
Yu Zhang
Wenxiang Guo
Changhao Pan
Zhou Zhao
08 Aug 2025
SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing
Jinbo Hu
Yin Cao
Ming Wu
Feiran Yang
J. Yang
VLM
22 Jul 2025
MOSPA: Human Motion Generation Driven by Spatial Audio
Shuyang Xu
Zhiyang Dou
Mingyi Shi
Liang Pan
Leo Ho
...
Yuan Liu
Cheng Lin
Y. Ma
Wenping Wang
Taku Komura
16 Jul 2025
video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
18 Jun 2025
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
Jiamin Xie
Ju Lin
Yiteng Huang
Tyler Vuong
Zhaojiang Lin
...
Peng Su
Prashant Rawat
Sangeeta Srivastava
Ming Sun
Florian Metze
17 Jun 2025
GRAM: Spatial general-purpose audio representation models for real-world applications
Goksenin Yuksel
Marcel van Gerven
Kiki van der Heijden
01 Jun 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
29 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
11 Apr 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access, 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
10 Jan 2025
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Tianhao Shen
Chao Zhang
09 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
International Conference on Learning Representations (ICLR), 2024
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
03 Oct 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
LM&MA
25 Sep 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Maria Sandsten
B. Schuller
22 Jul 2024
Can Large Language Models Understand Spatial Audio?
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
...
Jun Zhang
Lu Lu
Zejun Ma
Yuxuan Wang
Chao Zhang
12 Jun 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Computer Vision and Pattern Recognition (CVPR), 2024
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas Guibas
Fei Xia
LRM
ReLM
22 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
07 Jan 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
14 Nov 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
20 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
07 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
18 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
26 Jun 2023
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Neural Information Processing Systems (NeurIPS), 2023
Kazuki Shimada
Archontis Politis
Parthasaarathy Sudarsanam
D. Krause
Kengo Uchida
...
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Maria Sandsten
Yuki Mitsufuji
15 Jun 2023
Pengi: An Audio Language Model for Audio Tasks
Neural Information Processing Systems (NeurIPS), 2023
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
19 May 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
18 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Shiyang Feng
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
28 Apr 2023
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI Conference on Artificial Intelligence (AAAI), 2023
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
...
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Zhou Zhao
Shinji Watanabe
LM&MA
AuLLM
25 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
17 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
27 Feb 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
International Conference on Machine Learning (ICML), 2022
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
18 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
06 Dec 2022
AudioLM: a Language Modeling Approach to Audio Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
07 Sep 2022
Masked Autoencoders that Listen
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
13 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
16 Jun 2022
L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
E. Guizzo
Christian Marinoni
Marco Pennese
Xinlei Ren
Xiguang Zheng
Chen Zhang
Bruno Masiero
A. Uncini
Danilo Comminiello
21 Feb 2022
SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Bing Yang
Hong Liu
Xiaofei Li
16 Feb 2022
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
IEEE International Conference on Computer Vision (ICCV), 2021
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
11 Oct 2021
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
03 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
29 Oct 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Computer Vision and Pattern Recognition (CVPR), 2020
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
11 Jun 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
16 May 2020
The LOCATA Challenge: Acoustic Source Localization and Tracking
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
C. Evers
Heinrich W. Löllmann
H. Mellmann
Alexander Schmidt
Hendrik Barfuss
Patrick A. Naylor
Walter Kellermann
03 Sep 2019