Improved Image Captioning via Policy Gradient optimization of SPIDEr

1 December 2016
Siqi Liu
Zhenhai Zhu
Ning Ye
S. Guadarrama
Kevin Patrick Murphy

Papers citing "Improved Image Captioning via Policy Gradient optimization of SPIDEr"

50 / 232 papers shown
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
International Conference on Image Analysis and Processing (ICIAP), 2023
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
257
7
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
205
11
0
20 May 2023
BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xin Liu
Muhammad Khalifa
Lu Wang
328
23
0
19 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
European Signal Processing Conference (EUSIPCO), 2023
Etienne Labbé
J. Pinquier
Thomas Pellegrini
205
5
0
02 May 2023
Towards Explainable and Safe Conversational Agents for Mental Health: A Survey
Surjodeep Sarkar
Manas Gaur
L. Chen
Muskan Garg
Biplav Srivastava
B. Dongaonkar
AI4MH
158
4
0
25 Apr 2023
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Mizhaan Prajit Maniyar
Akash Mondal
Prashanth L.A.
S. Bhatnagar
183
4
0
21 Apr 2023
Graph Attention for Automated Audio Captioning
IEEE Signal Processing Letters (IEEE SPL), 2023
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
197
11
0
07 Apr 2023
Prefix tuning for automated audio captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
353
53
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
337
306
0
30 Mar 2023
ImageAssist: Tools for Enhancing Touchscreen-Based Image Exploration Systems for Blind and Low Vision Users
International Conference on Human Factors in Computing Systems (CHI), 2023
Vishnu Nair
Han Zhu
Brian A. Smith
158
27
0
17 Feb 2023
Semantics-Empowered Communication: A Tutorial-cum-Survey
Zhilin Lu
Rongpeng Li
Kun Lu
Xianfu Chen
Ekram Hossain
Zhifeng Zhao
Honggang Zhang
528
23
0
16 Dec 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
192
1
0
18 Nov 2022
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2022
Etienne Labbé
Thomas Pellegrini
J. Pinquier
106
5
0
14 Nov 2022
Exploring Train and Test-Time Augmentations for Audio-Language Learning
Eungbeom Kim
Jinhee Kim
Yoori Oh
Kyungsu Kim
Minju Park
Jaeheon Sim
J. Lee
Kyogu Lee
167
16
0
31 Oct 2022
Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Wenting Xu
Zhenghua Xu
Junyang Chen
Chang Qi
Thomas Lukasiewicz
MedIm
174
15
0
14 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
176
3
0
10 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
354
38
0
05 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
565
279
0
03 Oct 2022
Paraphrasing Is All You Need for Novel Object Captioning
Neural Information Processing Systems (NeurIPS), 2022
Cheng Yang
Yifan Hao
Wanshu Fan
Ruslan Salakhutdinov
Louis-Philippe Morency
Yu-Chiang Frank Wang
184
6
0
25 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
AAAI Conference on Artificial Intelligence (AAAI), 2022
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
189
15
0
21 Sep 2022
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
126
0
0
12 Aug 2022
Is GPT-3 all you need for Visual Question Answering in Cultural Heritage?
P. Bongini
Federico Becattini
Marco Bertini
206
19
0
25 Jul 2022
Rethinking the Reference-based Distinctive Image Captioning
ACM Multimedia (ACM MM), 2022
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
225
23
0
22 Jul 2022
Efficient Modeling of Future Context for Image Captioning
ACM Multimedia (ACM MM), 2022
Zhengcong Fei
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
208
16
0
22 Jul 2022
Automated Audio Captioning and Language-Based Audio Retrieval
Clive Gomes
Hyejin Park
Patrick Kollman
Yi-Zhe Song
Iffanice Houndayi
Ankit Parag Shah
297
1
0
08 Jul 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. Audio Speech Music Process.), 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
290
54
0
12 May 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
202
3
0
18 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
IEEE Transactions on Image Processing (IEEE TIP), 2022
Gen Luo
Weihao Ye
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
153
57
0
16 Apr 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
188
24
0
29 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
European Signal Processing Conference (EUSIPCO), 2022
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
267
32
0
06 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
International Conference on Pattern Recognition (ICPR), 2022
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT, VLM
194
37
0
21 Feb 2022
Joint Speech Recognition and Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Chaitanya Narisetty
E. Tsunoo
Xuankai Chang
Yosuke Kashiwagi
Michael Hentschel
Shinji Watanabe
130
10
0
03 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
ACM Computing Surveys (ACM CSUR), 2022
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
480
150
0
31 Jan 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
IEEE Signal Processing Letters (SPL), 2022
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
270
13
0
10 Jan 2022
A Survey of Natural Language Generation
ACM Computing Surveys (CSUR), 2021
Chenhe Dong
Hai-Tao Zheng
Haifan Gong
Mengzhao Chen
Junxin Li
Ying Shen
Min Yang
3DV
336
63
0
22 Dec 2021
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Benno Weck
Xavier Favory
Konstantinos Drossos
Xavier Serra
140
9
0
14 Oct 2021
Audio Captioning Using Sound Event Detection
Ayşegül Özkaya Eren
M. Sert
168
8
0
04 Oct 2021
CIDEr-R: Robust Consensus-based Image Description Evaluation
G. O. D. Santos
Esther Luna Colombini
Sandra Avila
151
40
0
28 Sep 2021
Reinforcement Learning-powered Semantic Communication via Semantic Similarity
Kun Lu
Rongpeng Li
Xianfu Chen
Zhifeng Zhao
Honggang Zhang
157
57
0
27 Aug 2021
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Guangyi Liu
Yinghong Liao
Fuyu Wang
Bin Zhang
Lu Zhang
...
Xiang Wan
Shaolin Li
Zhen Li
Shuixing Zhang
Shuguang Cui
274
73
0
11 Aug 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Andrew Koh
Fuzhao Xue
Chng Eng Siong
129
22
0
10 Aug 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Xinhao Mei
Qiushi Huang
Xubo Liu
Gengyun Chen
Jingqian Wu
...
Tom Ko
H. Tang
Xingkun Shao
Mark D. Plumbley
Wenwu Wang
182
60
0
05 Aug 2021
Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021
Jan van den Berg
Konstantinos Drossos
CLL
140
12
0
16 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV, VLM, MLLM
435
344
0
14 Jul 2021
Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Guangyi Liu
Zichao Yang
Tianhua Tao
Xiaodan Liang
Junwei Bao
Zhen Li
Bowen Zhou
Shuguang Cui
Zhiting Hu
389
23
0
29 Jun 2021
SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Joshua Forster Feinglass
Yezhou Yang
81
24
0
02 Jun 2021
Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"
Jia-Hong Huang
Ting-Wei Wu
Chao-Han Huck Yang
Marcel Worring
MedIm
160
33
0
30 May 2021
Contextualized Keyword Representations for Multi-modal Retinal Image Captioning
International Conference on Multimedia Retrieval (ICMR), 2021
Jia-Hong Huang
Ting-Wei Wu
Marcel Worring
MedIm
243
31
0
26 Apr 2021
MusCaps: Generating Captions for Music Audio
IEEE International Joint Conference on Neural Network (IJCNN), 2021
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
281
43
0
24 Apr 2021
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Computer Vision and Pattern Recognition (CVPR), 2021
Guanghui Xu
Shuaicheng Niu
Zhuliang Yu
Yucheng Luo
Qing Du
Qi Wu
DiffM
233
67
0
23 Apr 2021