Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

12 May 2025
Chao-Han Huck Yang
Sreyan Ghosh
Qing Wang
Jaeyeon Kim
Hengyi Hong
Sonal Kumar
Guirui Zhong
Zhifeng Kong
S. Sakshi
Vaibhavi Lokegaonkar
Oriol Nieto
R. Duraiswami
Dinesh Manocha
Gunhee Kim
Jun Du
Rafael Valle
Bryan Catanzaro
Abstract

We present Task 5 of the DCASE 2025 Challenge: an Audio Question Answering (AQA) benchmark spanning multiple domains of sound understanding. This task defines three QA subsets (Bioacoustics, Temporal Soundscapes, and Complex QA) to test audio-language models on interactive question answering over diverse acoustic scenes. We describe the dataset composition (from marine mammal calls to soundscapes and complex real-world clips), the evaluation protocol (top-1 accuracy with answer-shuffling robustness), and baseline systems (Qwen2-Audio-7B, AudioFlamingo 2, Gemini-2-Flash). Preliminary results on the development set are compared, showing strong variation across models and subsets. This challenge aims to advance the audio understanding and reasoning capabilities of audio-language models toward human-level acuity, which is crucial for enabling AI agents to perceive and interact with the world effectively.

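The abstract's evaluation protocol is top-1 accuracy with an answer-shuffling robustness check. The sketch below is not the official DCASE 2025 Task 5 scorer; it is a minimal illustration of that idea, assuming a hypothetical `ask_model` callable that takes an audio clip, a question, and an ordered list of answer options, and returns the option it selects.

```python
import random

def shuffled_top1_accuracy(samples, ask_model, n_shuffles=3, seed=0):
    """Illustrative sketch of top-1 accuracy under answer shuffling.

    samples: iterable of dicts with keys 'audio', 'question', 'options', 'answer'.
    ask_model: hypothetical callable (audio, question, options) -> chosen option.
    """
    rng = random.Random(seed)
    correct = 0
    total = 0
    for s in samples:
        for _ in range(n_shuffles):
            # Re-order the candidate answers on each pass so a model cannot
            # succeed by exploiting a fixed option position.
            options = s["options"][:]
            rng.shuffle(options)
            prediction = ask_model(s["audio"], s["question"], options)
            correct += int(prediction == s["answer"])
            total += 1
    return correct / max(total, 1)
```

A system whose accuracy drops sharply as `n_shuffles` grows is likely relying on option order rather than on the audio content itself.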
@article{yang2025_2505.07365,
  title={Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge},
  author={Chao-Han Huck Yang and Sreyan Ghosh and Qing Wang and Jaeyeon Kim and Hengyi Hong and Sonal Kumar and Guirui Zhong and Zhifeng Kong and S. Sakshi and Vaibhavi Lokegaonkar and Oriol Nieto and Ramani Duraiswami and Dinesh Manocha and Gunhee Kim and Jun Du and Rafael Valle and Bryan Catanzaro},
  journal={arXiv preprint arXiv:2505.07365},
  year={2025}
}