2102.11457
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
23 February 2021
Xuenan Xu
Heinrich Dinkel
Mengyue Wu
Zeyu Xie
Kai Yu
Papers citing
"Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning"
41 / 41 papers shown
When Audio Generators Become Good Listeners: Generative Features for Understanding Tasks
Zeyu Xie
Chenxing Li
Xuenan Xu
Mengyue Wu
Wenfu Wang
Ruibo Fu
Meng Yu
Dong Yu
Yuexian Zou
100
0
0
29 Sep 2025
Discrete Audio Representations for Automated Audio Captioning
Jingguang Tian
Haoqin Sun
Xinhui Hu
Xinkang Xu
178
1
0
21 May 2025
Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation
M. Gauy
Natalia Hitomi Koza
Ricardo Mikio Morita
Gabriel Rocha Stanzione
Arnaldo Cândido Júnior
L. Berti
A. S. Levin
E. Sabino
F. Svartman
Marcelo Finger
108
1
0
30 Jul 2024
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Xuenan Xu
Haohe Liu
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
226
4
0
19 Jul 2024
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
135
14
0
14 Nov 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Avamarie Brueggeman
Andrea Madotto
Mohammad Kachuee
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
254
109
0
27 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
164
10
0
21 Sep 2023
RECAP: Retrieval-Augmented Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Sreyan Ghosh
Sonal Kumar
Chandra Kiran Reddy Evuru
R. Duraiswami
Tianyi Zhou
VLM
170
33
0
18 Sep 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
243
54
0
30 Jul 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Interspeech (Interspeech), 2023
Yifei Xin
Yuexian Zou
233
9
0
28 Jul 2023
Improving Audio Caption Fluency with Automatic Error Correction
Hanxue Zhang
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
108
0
0
16 Jun 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection
Interspeech (Interspeech), 2023
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
204
15
0
02 Jun 2023
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Interspeech (Interspeech), 2023
Jianyuan Sun
Xubo Liu
Xinhao Mei
V. Kılıç
Mark D. Plumbley
Wenwu Wang
139
4
0
30 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Neural Information Processing Systems (NeurIPS), 2023
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
434
167
0
29 May 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
441
217
0
18 May 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
294
145
0
17 Apr 2023
Prefix tuning for automated audio captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
252
51
0
30 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
280
34
0
20 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
ACM Multimedia (ACM MM), 2023
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
363
20
0
14 Mar 2023
Towards Generating Diverse Audio Captions via Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
210
4
0
05 Dec 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech (Interspeech), 2022
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
378
25
0
28 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
152
3
0
10 Oct 2022
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
98
0
0
12 Aug 2022
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
85
1
0
04 Jun 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
208
53
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
237
19
0
11 May 2022
Automated Audio Captioning using Audio Event Clues
Ayşegül Özkaya Eren
M. Sert
98
0
0
18 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
168
3
0
18 Apr 2022
A Temporal-oriented Broadcast ResNet for COVID-19 Detection
Xin Jing
Shuo Liu
Emilia Parada-Cabaleiro
Andreas Triantafyllopoulos
Meishu Song
Zijiang Yang
Björn W. Schuller
161
2
0
31 Mar 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
149
24
0
29 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
European Signal Processing Conference (EUSIPCO), 2022
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
239
32
0
06 Mar 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
136
0
0
28 Jan 2022
Audio Retrieval with Natural Language Queries: A Benchmark Study
A. Sophia Koepke
Andreea-Maria Oncescu
João F. Henriques
Zeynep Akata
Samuel Albanie
157
118
0
17 Dec 2021
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Benno Weck
Xavier Favory
Konstantinos Drossos
Xavier Serra
115
9
0
14 Oct 2021
Diverse Audio Captioning via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
GAN
243
33
0
13 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
130
30
0
12 Oct 2021
Can Audio Captions Be Evaluated with Image Caption Metrics?
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
117
57
0
10 Oct 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Andrew Koh
Fuzhao Xue
Chng Eng Siong
103
22
0
10 Aug 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Xinhao Mei
Qiushi Huang
Xubo Liu
Gengyun Chen
Jingqian Wu
...
Tom Ko
H. Tang
Xingkun Shao
Mark D. Plumbley
Wenwu Wang
157
59
0
05 Aug 2021
Audio Captioning Transformer
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
149
88
0
21 Jul 2021
Audio Retrieval with Natural Language Queries
Interspeech (Interspeech), 2021
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
159
82
0
05 May 2021