arXiv: 2110.06691
Diverse Audio Captioning via Adversarial Training

13 October 2021
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
    DiffM, GAN

Papers citing "Diverse Audio Captioning via Adversarial Training"

20 / 20 papers shown
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
Yuhang Jia
Xu Zhang
Yong Qin
Yang Chen
Shiwan Zhao
VLM
63
0
0
03 Aug 2025
Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning
Yang Liu
Xinming Zhang
Jiale Du
Xinbo Gao
Jungong Han
OODD
163
0
0
28 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM, ReLM, LRM
187
16
0
11 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access, 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
314
6
0
10 Jan 2025
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Xuenan Xu
Haohe Liu
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
150
4
0
19 Jul 2024
AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction
Shengkun Wang
Taoran Ji
Jianfeng He
Mariam Almutairi
Dan Wang
Linhan Wang
Min Zhang
Chang-Tien Lu
100
4
0
03 Jul 2024
On the Audio Hallucinations in Large Audio-Video Language Models
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
126
11
0
18 Jan 2024
Training Audio Captioning Models without Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Soham Deshmukh
Benjamin Elizalde
Dimitra Emmanouilidou
Bhiksha Raj
Rita Singh
Huaming Wang
135
24
0
14 Sep 2023
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Etienne Labbé
Thomas Pellegrini
J. Pinquier
165
17
0
01 Sep 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
144
8
0
23 Aug 2023
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Interspeech, 2023
Jianyuan Sun
Xubo Liu
Xinhao Mei
V. Kılıç
Mark D. Plumbley
Wenwu Wang
111
3
0
30 May 2023
Towards Generating Diverse Audio Captions via Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
159
4
0
05 Dec 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech, 2022
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
258
25
0
28 Oct 2022
Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
102
1
0
03 Oct 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
EURASIP Journal on Audio, Speech, and Music Processing, 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
164
52
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
169
19
0
11 May 2022
Automated Audio Captioning using Audio Event Clues
Ayşegül Özkaya Eren
M. Sert
90
0
0
18 Apr 2022
Separate What You Describe: Language-Queried Audio Source Separation
Interspeech, 2022
Xubo Liu
Haohe Liu
Qiuqiang Kong
Xinhao Mei
Jinzheng Zhao
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
164
83
0
28 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
European Signal Processing Conference (EUSIPCO), 2022
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
175
32
0
06 Mar 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
IEEE Signal Processing Letters (SPL), 2022
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
153
12
0
10 Jan 2022