BAT: Learning to Reason about Spatial Sounds with Large Language Models

2 February 2024
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
    LRM

Papers citing "BAT: Learning to Reason about Spatial Sounds with Large Language Models"

50 / 58 papers shown
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
Zhe Sun
Yujun Cai
Jiayu Yao
Yiwei Wang
AuLLM LRM
467
1
0
17 Nov 2025
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks
Xu Zheng
Zihao Dongfang
Lutao Jiang
Boyuan Zheng
Yulong Guo
...
L. Zhang
Danda Pani Paudel
Nicu Sebe
Luc Van Gool
Xuming Hu
LRM VLM
855
13
0
29 Oct 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM LRM
563
5
0
28 Oct 2025
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Wenxiang Guo
Changhao Pan
Zhiyuan Zhu
Xintong Hu
Yu Zhang
...
Z. Chen
Yanhao Yu
Qiange Huang
Fei Wu
Zhou Zhao
303
1
0
12 Oct 2025
Revisiting Self-Play Preference Optimization: On the Role of Prompt Difficulty
Yao Xiao
Jung-jae Kim
Roy Ka-wei Lee
Lidong Bing
138
0
0
07 Oct 2025
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
Subrata Biswas
Mohammad Nur Hossain Khan
Bashima Islam
VLM LRM
166
3
0
30 Sep 2025
Spatial Audio Motion Understanding and Reasoning
A. Sridhar
Yinyi Guo
Erik M. Visser
103
0
0
18 Sep 2025
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
Kevin Wilkinghoff
Zheng-Hua Tan
194
0
0
17 Sep 2025
Deep Learning for Personalized Binaural Audio Reproduction
Xikun Lu
Yunda Chen
Zehua Chen
Jie Wang
Mingxing Liu
Hongmei Hu
C. Zheng
Stefan Bleeck
Jinqiu Sang
264
2
0
30 Aug 2025
ASAudio: A Survey of Advanced Spatial Audio Research
Zhiyuan Zhu
Yu Zhang
Wenxiang Guo
Changhao Pan
Zhou Zhao
263
3
0
08 Aug 2025
SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing
Jinbo Hu
Yin Cao
Ming Wu
Feiran Yang
J. Yang
VLM
241
4
0
22 Jul 2025
MOSPA: Human Motion Generation Driven by Spatial Audio
Shuyang Xu
Zhiyang Dou
Mingyi Shi
Liang Pan
Leo Ho
...
Yuan Liu
Cheng Lin
Y. Ma
Wenping Wang
Taku Komura
286
11
0
16 Jul 2025
video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
488
2
0
18 Jun 2025
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
Jiamin Xie
Ju Lin
Yiteng Huang
Tyler Vuong
Zhaojiang Lin
...
Peng Su
Prashant Rawat
Sangeeta Srivastava
Ming Sun
Florian Metze
197
6
0
17 Jun 2025
GRAM: Spatial general-purpose audio representation models for real-world applications
Goksenin Yuksel
Marcel van Gerven
Kiki van der Heijden
423
1
0
01 Jun 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
755
9
0
29 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
395
4
0
11 Apr 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access (IEEE Access), 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
630
10
0
10 Jan 2025
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Tianhao Shen
Chao Zhang
316
8
0
09 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
International Conference on Learning Representations (ICLR), 2024
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
609
17
0
03 Oct 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM LM&MA
440
25
0
25 Sep 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Maria Sandsten
B. Schuller
475
11
0
22 Jul 2024
Can Large Language Models Understand Spatial Audio?
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
...
Jun Zhang
Lu Lu
Zejun Ma
Yuxuan Wang
Chao Zhang
431
20
0
12 Jun 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Computer Vision and Pattern Recognition (CVPR), 2024
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas Guibas
Fei Xia
LRM ReLM
414
714
0
22 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
377
90
0
07 Jan 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
449
700
0
14 Nov 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA AuLLM
490
529
0
20 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG VLM AuLLM LM&MA
538
105
0
07 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH ALM
12.4K
16,448
0
18 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM ObjD VLM
585
1,151
0
26 Jun 2023
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Neural Information Processing Systems (NeurIPS), 2023
Kazuki Shimada
Archontis Politis
Parthasaarathy Sudarsanam
D. Krause
Kengo Uchida
...
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Maria Sandsten
Yuki Mitsufuji
327
98
0
15 Jun 2023
Pengi: An Audio Language Model for Audio Tasks
Neural Information Processing Systems (NeurIPS), 2023
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM AuLLM
528
268
0
19 May 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM MLLM LRM
834
241
0
18 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Shiyang Feng
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
372
734
0
28 Apr 2023
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI Conference on Artificial Intelligence (AAAI), 2023
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
...
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Zhou Zhao
Shinji Watanabe
LM&MA AuLLM
285
376
0
25 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa VLM MLLM
1.4K
8,828
0
17 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM PILM
20.2K
19,316
0
27 Feb 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
International Conference on Machine Learning (ICML), 2022
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
550
561
0
18 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
1.4K
6,745
0
06 Dec 2022
AudioLM: a Language Modeling Approach to Audio Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
559
892
0
07 Sep 2022
Masked Autoencoders that Listen
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
673
424
0
13 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
402
124
0
16 Jun 2022
L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
E. Guizzo
Christian Marinoni
Marco Pennese
Xinlei Ren
Xiguang Zheng
Chen Zhang
Bruno Masiero
A. Uncini
Danilo Comminiello
324
59
0
21 Feb 2022
SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Bing Yang
Hong Liu
Xiaofei Li
257
37
0
16 Feb 2022
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
IEEE International Conference on Computer Vision (ICCV), 2021
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
367
124
0
11 Oct 2021
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
269
143
0
03 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
385
127
0
29 Oct 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Computer Vision and Pattern Recognition (CVPR), 2020
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
256
88
0
11 Jun 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
959
3,981
0
16 May 2020
The LOCATA Challenge: Acoustic Source Localization and Tracking
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
C. Evers
Heinrich W. Löllmann
H. Mellmann
Alexander Schmidt
Hendrik Barfuss
Patrick A. Naylor
Walter Kellermann
297
162
0
03 Sep 2019