Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
23 February 2021
Xuenan Xu
Heinrich Dinkel
Mengyue Wu
Zeyu Xie
Kai Yu
arXiv: 2102.11457

Papers citing "Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning"

41 / 41 papers shown
When Audio Generators Become Good Listeners: Generative Features for Understanding Tasks
Zeyu Xie
Chenxing Li
Xuenan Xu
Mengyue Wu
Wenfu Wang
Ruibo Fu
Meng Yu
Dong Yu
Yuexian Zou
100
0
0
29 Sep 2025
Discrete Audio Representations for Automated Audio Captioning
Jingguang Tian
Haoqin Sun
Xinhui Hu
Xinkang Xu
178
1
0
21 May 2025
Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation
M. Gauy
Natalia Hitomi Koza
Ricardo Mikio Morita
Gabriel Rocha Stanzione
Arnaldo Cândido Júnior
L. Berti
A. S. Levin
E. Sabino
F. Svartman
Marcelo Finger
108
1
0
30 Jul 2024
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Xuenan Xu
Haohe Liu
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
226
4
0
19 Jul 2024
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
135
14
0
14 Nov 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Avamarie Brueggeman
Andrea Madotto
Mohammad Kachuee
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
254
109
0
27 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
164
10
0
21 Sep 2023
RECAP: Retrieval-Augmented Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Sreyan Ghosh
Sonal Kumar
Chandra Kiran Reddy Evuru
R. Duraiswami
Tianyi Zhou
VLM
170
33
0
18 Sep 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe, MLLM
243
54
0
30 Jul 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Interspeech, 2023
Yifei Xin
Yuexian Zou
233
9
0
28 Jul 2023
Improving Audio Caption Fluency with Automatic Error Correction
Hanxue Zhang
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
108
0
0
16 Jun 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection
Interspeech, 2023
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
204
15
0
02 Jun 2023
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Interspeech, 2023
Jianyuan Sun
Xubo Liu
Xinhao Mei
V. Kılıç
Mark D. Plumbley
Wenwu Wang
139
4
0
30 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Neural Information Processing Systems (NeurIPS), 2023
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
434
167
0
29 May 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM, MLLM, LRM
441
217
0
18 May 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
294
145
0
17 Apr 2023
Prefix tuning for automated audio captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
252
51
0
30 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM, VLM
280
34
0
20 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
ACM Multimedia (ACM MM), 2023
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
363
20
0
14 Mar 2023
Towards Generating Diverse Audio Captions via Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
210
4
0
05 Dec 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech, 2022
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
378
25
0
28 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
152
3
0
10 Oct 2022
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
98
0
0
12 Aug 2022
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
85
1
0
04 Jun 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
208
53
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
237
19
0
11 May 2022
Automated Audio Captioning using Audio Event Clues
Ayşegül Özkaya Eren
M. Sert
98
0
0
18 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
168
3
0
18 Apr 2022
A Temporal-oriented Broadcast ResNet for COVID-19 Detection
Xin Jing
Shuo Liu
Emilia Parada-Cabaleiro
Andreas Triantafyllopoulos
Meishu Song
Zijiang Yang
Björn W. Schuller
161
2
0
31 Mar 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
149
24
0
29 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
European Signal Processing Conference (EUSIPCO), 2022
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
239
32
0
06 Mar 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
136
0
0
28 Jan 2022
Audio Retrieval with Natural Language Queries: A Benchmark Study
A. Sophia Koepke
Andreea-Maria Oncescu
João F. Henriques
Zeynep Akata
Samuel Albanie
157
118
0
17 Dec 2021
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Benno Weck
Xavier Favory
Konstantinos Drossos
Xavier Serra
115
9
0
14 Oct 2021
Diverse Audio Captioning via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM, GAN
243
33
0
13 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
130
30
0
12 Oct 2021
Can Audio Captions Be Evaluated with Image Caption Metrics?
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
117
57
0
10 Oct 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Andrew Koh
Fuzhao Xue
Chng Eng Siong
103
22
0
10 Aug 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Xinhao Mei
Qiushi Huang
Xubo Liu
Gengyun Chen
Jingqian Wu
...
Tom Ko
H. Tang
Xingkun Shao
Mark D. Plumbley
Wenwu Wang
157
59
0
05 Aug 2021
Audio Captioning Transformer
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
149
88
0
21 Jul 2021
Audio Retrieval with Natural Language Queries
Interspeech, 2021
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
159
82
0
05 May 2021