arXiv:2205.05949
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
12 May 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
Papers citing "Automated Audio Captioning: An Overview of Recent Progress and New Challenges"
25 / 25 papers shown
Spatial-CLAP: Learning Spatially-Aware audio–text Embeddings for Multi-Source Conditions
Kentaro Seki
Yuki Okamoto
Kouei Yamaoka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
113
0
0
18 Sep 2025
MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models
Vijay Govindarajan
Pratik Patel
Sahil Tripathi
Md Azizul Hoque
Gautam Siddharth Kashyap
VLM
109
0
0
16 Sep 2025
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
Yusuke Fujita
Tomoya Mizumoto
Atsushi Kojima
Lianbo Liu
Yui Sudo
AuLLM
292
0
0
12 Jun 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
189
2
0
01 Jun 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
293
17
0
11 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access (IEEE Access), 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
469
6
0
10 Jan 2025
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
John H. L. Hansen
229
3
0
25 Jul 2024
ParaCLAP – Towards a general language-audio model for computational paralinguistic tasks
Xin Jing
Andreas Triantafyllopoulos
Björn Schuller
153
11
0
11 Jun 2024
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
David Xu
253
2
0
17 May 2024
ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds
Gijs Wijngaard
Elia Formisano
Bruno L. Giordano
M. Dumontier
214
5
0
27 Mar 2024
EDTC: enhance depth of text comprehension in automated audio captioning
Liwen Tan
Yin Cao
Yi Zhou
207
0
0
27 Feb 2024
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2023
Xiang Hao
Jibin Wu
Jianwei Yu
Chenglin Xu
Kay Chen Tan
349
15
0
11 Oct 2023
A Large-scale Dataset for Audio-Language Representation Learning
ACM Multimedia (ACM MM), 2023
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
373
47
0
20 Sep 2023
Audio Difference Learning for Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tatsuya Komatsu
Yusuke Fujita
K. Takeda
Tomoki Toda
184
7
0
15 Sep 2023
Separate Anything You Describe
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yi Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
314
70
0
09 Aug 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Interspeech (Interspeech), 2023
Yifei Xin
Yuexian Zou
392
9
0
28 Jul 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
European Signal Processing Conference (EUSIPCO), 2023
Etienne Labbé
J. Pinquier
Thomas Pellegrini
211
5
0
02 May 2023
Graph Attention for Automated Audio Captioning
IEEE Signal Processing Letters (IEEE SPL), 2023
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
200
11
0
07 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
337
311
0
30 Mar 2023
Towards Generating Diverse Audio Captions via Adversarial Training
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
296
6
0
05 Dec 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
202
1
0
18 Nov 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
249
3
0
12 Nov 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech (Interspeech), 2022
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
409
25
0
28 Oct 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
283
23
0
11 May 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
IEEE Signal Processing Letters (SPL), 2022
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
277
13
0
10 Jan 2022