Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

14 December 2020

Papers citing "Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval"

38 papers

Jamendo-QA: A Large-Scale Music Question Answering Dataset

19 Sep 2025

MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models

Gautam Siddharth Kashyap

VLM

16 Sep 2025

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer

01 Jun 2025

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Computer Vision and Pattern Recognition (CVPR), 2025

14 Apr 2025

RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings

Computer Vision and Pattern Recognition (CVPR), 2025

27 Feb 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

21 Feb 2025

Audio-Language Datasets of Scenes and Events: A Survey

IEEE Access, 2024

10 Jan 2025

R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

19 Jun 2024

Retrieval-Augmented Generation for AI-Generated Content: A Survey

29 Feb 2024

Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT

24 Feb 2024

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

Jaeyeon Kim

Jaeyoon Jung

Jinjoo Lee

Sang Hoon Woo

CLIP VLM

31 Jan 2024

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Jiancheng Huang

Yifan Liu

Shifeng Chen

21 Nov 2023

Zero-shot audio captioning with audio-language model guidance and audio context keywords

Leonard Salewski

Stefan Fauth

A. Sophia Koepke

Zeynep Akata

14 Nov 2023

RECAP: Retrieval-Augmented Audio Captioning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Sreyan Ghosh

Sonal Kumar

Chandra Kiran Reddy Evuru

R. Duraiswami

Tianyi Zhou

VLM

18 Sep 2023

Synth-AC: Enhancing Audio Captioning with Synthetic Supervision

18 Sep 2023

Training Audio Captioning Models without Audio

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Soham Deshmukh

Benjamin Elizalde

Dimitra Emmanouilidou

Bhiksha Raj

Rita Singh

Huaming Wang

14 Sep 2023

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

23 Aug 2023

Improving Audio Caption Fluency with Automatic Error Correction

16 Jun 2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection

Interspeech, 2023

02 Jun 2023

DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation

23 May 2023

Listen, Think, and Understand

International Conference on Learning Representations (ICLR), 2023

18 May 2023

Efficient Audio Captioning Transformer with Patchout and Text Guidance

Thodoris Kouzelis

Grigoris Bastas

Athanasios Katsamanis

Alexandros Potamianos

ViT

06 Apr 2023

Prefix tuning for automated audio captioning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Minkyu Kim

Kim Sung-Bin

Tae-Hyun Oh

30 Mar 2023

eP-ALM: Efficient Perceptual Augmentation of Language Models

IEEE International Conference on Computer Vision (ICCV), 2023

20 Mar 2023

Retrieving Multimodal Information for Augmented Generation: A Survey

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Hailin Chen

...

20 Mar 2023

BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data

ACM Multimedia (ACM MM), 2023

14 Mar 2023

Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022

Andrew Koh

Soham Dinesh Tiwari

Chng Eng Siong

04 Jun 2022

Automated Audio Captioning: An Overview of Recent Progress and New Challenges

EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022

12 May 2022

Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022

11 May 2022

Leveraging Pre-trained BERT for Audio Captioning

European Signal Processing Conference (EUSIPCO), 2022

06 Mar 2022

Automatic Audio Captioning using Attention weighted Event based Embeddings

Swapnil Bhosale

Rupayan Chakraborty

Sunil Kumar Kopparapu

28 Jan 2022

Audio Retrieval with Natural Language Queries: A Benchmark Study

A. Sophia Koepke

Andreea-Maria Oncescu

João F. Henriques

Zeynep Akata

Samuel Albanie

17 Dec 2021

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

Benno Weck

Xavier Favory

Konstantinos Drossos

Xavier Serra

14 Oct 2021

Diverse Audio Captioning via Adversarial Training

13 Oct 2021

Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Andrew Koh

Fuzhao Xue

Chng Eng Siong

10 Aug 2021

Audio Captioning Transformer

Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021

21 Jul 2021

Audio Retrieval with Natural Language Queries

Interspeech, 2021

Andreea-Maria Oncescu

A. Sophia Koepke

João F. Henriques

Zeynep Akata

Samuel Albanie

05 May 2021

MusCaps: Generating Captions for Music Audio

IEEE International Joint Conference on Neural Networks (IJCNN), 2021

24 Apr 2021