Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

14 December 2020
Yuma Koizumi
Yasunori Ohishi
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda

Papers citing "Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval"

38 / 38 papers shown
Jamendo-QA: A Large-Scale Music Question Answering Dataset
Junyoung Koh
Soo Yong Kim
Yongwon Choi
Gyu Hyeong Choi
169
0
0
19 Sep 2025
MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models
Vijay Govindarajan
Pratik Patel
Sahil Tripathi
Md Azizul Hoque
Gautam Siddharth Kashyap
VLM
133
1
0
16 Sep 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
237
2
0
01 Jun 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Computer Vision and Pattern Recognition (CVPR), 2025
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
380
33
0
14 Apr 2025
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
Computer Vision and Pattern Recognition (CVPR), 2025
Aayush Dhakal
Srikumar Sastry
Subash Khanal
Adeel Ahmad
Eric Xing
Nathan Jacobs
455
7
0
27 Feb 2025
WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yifu Chen
Shengpeng Ji
Haoxiao Wang
Liang Luo
Siyu Chen
Jinzheng He
Jin Xu
Zhou Zhao
AuLLM, VLM
373
12
0
21 Feb 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access, 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
548
10
0
10 Jan 2025
R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Fuda Ye
Shuangyin Li
Yongqi Zhang
Lei Chen
247
1
0
19 Jun 2024
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Penghao Zhao
Hailin Zhang
Qinhan Yu
Zhengren Wang
Yunteng Geng
Fangcheng Fu
Ling Yang
Wentao Zhang
Jie Jiang
Tengjiao Wang
3DV
1.1K
512
0
29 Feb 2024
Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT
Sixiao Zheng
Jingyang Huo
Yu Wang
Yanwei Fu
VGen, DiffM
181
1
0
24 Feb 2024
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
CLIP, VLM
249
45
0
31 Jan 2024
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Jiaxi Lv
Yi Huang
Mingfu Yan
Jiancheng Huang
Jianzhuang Liu
Yifan Liu
Yafei Wen
Xiaoxin Chen
Shifeng Chen
VGen, DiffM
432
57
0
21 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
255
15
0
14 Nov 2023
RECAP: Retrieval-Augmented Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Sreyan Ghosh
Sonal Kumar
Chandra Kiran Reddy Evuru
R. Duraiswami
Tianyi Zhou
VLM
288
36
0
18 Sep 2023
Synth-AC: Enhancing Audio Captioning with Synthetic Supervision
Feiyang Xiao
Qiaoxi Zhu
Jian Guan
Xubo Liu
Haohe Liu
Kejia Zhang
Wenwu Wang
197
2
0
18 Sep 2023
Training Audio Captioning Models without Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Soham Deshmukh
Benjamin Elizalde
Dimitra Emmanouilidou
Bhiksha Raj
Rita Singh
Huaming Wang
225
28
0
14 Sep 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
246
11
0
23 Aug 2023
Improving Audio Caption Fluency with Automatic Error Correction
Hanxue Zhang
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
148
0
0
16 Jun 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection
Interspeech, 2023
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
253
16
0
02 Jun 2023
DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation
Susung Hong
Junyoung Seo
Heeseong Shin
Sung‐Jin Hong
Seung Wook Kim
DiffM, VGen
330
56
0
23 May 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM, MLLM, LRM
773
230
0
18 May 2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Thodoris Kouzelis
Grigoris Bastas
Athanasios Katsamanis
Alexandros Potamianos
ViT
257
7
0
06 Apr 2023
Prefix tuning for automated audio captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
428
53
0
30 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM, VLM
453
35
0
20 Mar 2023
Retrieving Multimodal Information for Augmented Generation: A Survey
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ruochen Zhao
Hailin Chen
Weishi Wang
Fangkai Jiao
Do Xuan Long
...
Bosheng Ding
Xiaobao Guo
Minzhi Li
Xingxuan Li
Shafiq Joty
450
135
0
20 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
ACM Multimedia (ACM MM), 2023
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
463
23
0
14 Mar 2023
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
151
1
0
04 Jun 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
385
55
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
302
24
0
11 May 2022
Leveraging Pre-trained BERT for Audio Captioning
European Signal Processing Conference (EUSIPCO), 2022
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
295
33
0
06 Mar 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
201
0
0
28 Jan 2022
Audio Retrieval with Natural Language Queries: A Benchmark Study
A. Sophia Koepke
Andreea-Maria Oncescu
João F. Henriques
Zeynep Akata
Samuel Albanie
327
119
0
17 Dec 2021
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Benno Weck
Xavier Favory
Konstantinos Drossos
Xavier Serra
158
9
0
14 Oct 2021
Diverse Audio Captioning via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM, GAN
337
36
0
13 Oct 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Andrew Koh
Fuzhao Xue
Chng Eng Siong
215
22
0
10 Aug 2021
Audio Captioning Transformer
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
232
91
0
21 Jul 2021
Audio Retrieval with Natural Language Queries
Interspeech, 2021
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
364
84
0
05 May 2021
MusCaps: Generating Captions for Music Audio
IEEE International Joint Conference on Neural Network (IJCNN), 2021
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
353
45
0
24 Apr 2021