2205.05949
v1
v2 (latest)
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
12 May 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
Papers citing "Automated Audio Captioning: An Overview of Recent Progress and New Challenges"
22 / 22 papers shown
Spatial-CLAP: Learning Spatially-Aware audio--text Embeddings for Multi-Source Conditions
Kentaro Seki
Yuki Okamoto
Kouei Yamaoka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
69
0
0
18 Sep 2025
MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models
Vijay Govindarajan
Pratik Patel
Sahil Tripathi
Md Azizul Hoque
Gautam Siddharth Kashyap
VLM
77
0
0
16 Sep 2025
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
Yusuke Fujita
Tomoya Mizumoto
Atsushi Kojima
Lianbo Liu
Yui Sudo
AuLLM
233
0
0
12 Jun 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
140
1
0
01 Jun 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
231
16
0
11 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access (IEEE Access), 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
354
6
0
10 Jan 2025
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
John H. L. Hansen
183
2
0
25 Jul 2024
ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
Xin Jing
Andreas Triantafyllopoulos
Björn Schuller
114
9
0
11 Jun 2024
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
David Xu
202
2
0
17 May 2024
ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds
Gijs Wijngaard
Elia Formisano
Bruno L. Giordano
M. Dumontier
163
5
0
27 Mar 2024
Audio Difference Learning for Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tatsuya Komatsu
Yusuke Fujita
K. Takeda
Tomoki Toda
143
7
0
15 Sep 2023
Separate Anything You Describe
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yi Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
253
69
0
09 Aug 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Interspeech (Interspeech), 2023
Yifei Xin
Yuexian Zou
233
9
0
28 Jul 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
European Signal Processing Conference (EUSIPCO), 2023
Etienne Labbé
J. Pinquier
Thomas Pellegrini
156
5
0
02 May 2023
Graph Attention for Automated Audio Captioning
IEEE Signal Processing Letters (IEEE SPL), 2023
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
157
11
0
07 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
277
290
0
30 Mar 2023
Towards Generating Diverse Audio Captions via Adversarial Training
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
210
4
0
05 Dec 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
113
1
0
18 Nov 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
170
3
0
12 Nov 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech (Interspeech), 2022
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
378
25
0
28 Oct 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
237
19
0
11 May 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
IEEE Signal Processing Letters (IEEE SPL), 2022
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
239
12
0
10 Jan 2022