Automated Audio Captioning: An Overview of Recent Progress and New Challenges

EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
12 May 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang

Papers citing "Automated Audio Captioning: An Overview of Recent Progress and New Challenges"

22 / 22 papers shown
Spatial-CLAP: Learning Spatially-Aware audio--text Embeddings for Multi-Source Conditions
Kentaro Seki
Yuki Okamoto
Kouei Yamaoka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
69
0
0
18 Sep 2025
MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models
Vijay Govindarajan
Pratik Patel
Sahil Tripathi
Md Azizul Hoque
Gautam Siddharth Kashyap
VLM
77
0
0
16 Sep 2025
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
Yusuke Fujita
Tomoya Mizumoto
Atsushi Kojima
Lianbo Liu
Yui Sudo
AuLLM
233
0
0
12 Jun 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
140
1
0
01 Jun 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM, ReLM, LRM
231
16
0
11 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access, 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
354
6
0
10 Jan 2025
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
John H. L. Hansen
183
2
0
25 Jul 2024
ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
Xin Jing
Andreas Triantafyllopoulos
Björn Schuller
114
9
0
11 Jun 2024
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
David Xu
202
2
0
17 May 2024
ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds
Gijs Wijngaard
Elia Formisano
Bruno L. Giordano
M. Dumontier
163
5
0
27 Mar 2024
Audio Difference Learning for Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tatsuya Komatsu
Yusuke Fujita
K. Takeda
Tomoki Toda
143
7
0
15 Sep 2023
Separate Anything You Describe
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yi Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
253
69
0
09 Aug 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Interspeech, 2023
Yifei Xin
Yuexian Zou
233
9
0
28 Jul 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
European Signal Processing Conference (EUSIPCO), 2023
Etienne Labbé
J. Pinquier
Thomas Pellegrini
156
5
0
02 May 2023
Graph Attention for Automated Audio Captioning
IEEE Signal Processing Letters (IEEE SPL), 2023
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
157
11
0
07 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
277
290
0
30 Mar 2023
Towards Generating Diverse Audio Captions via Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
210
4
0
05 Dec 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
113
1
0
18 Nov 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
170
3
0
12 Nov 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech, 2022
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
378
25
0
28 Oct 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
237
19
0
11 May 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
IEEE Signal Processing Letters (SPL), 2022
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
239
12
0
10 Jan 2022