1908.06954
Attention on Attention for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2019
19 August 2019
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
Github (333★)
Papers citing
"Attention on Attention for Image Captioning"
50 / 325 papers shown
Nexus: Higher-Order Attention Mechanisms in Transformers
Hanting Chen
Chong Zhu
Kai Han
Yuchuan Tian
Yuchen Liang
Tianyu Guo
Xinghao Chen
Dacheng Tao
Yunhe Wang
389
0
0
03 Dec 2025
Cross Modal Fine-Grained Alignment via Granularity-Aware and Region-Uncertain Modeling
Jiale Liu
Haoming Zhou
Yishu Zhu
Bingzhi Chen
Yuncheng Jiang
203
0
0
11 Nov 2025
DescribeEarth: Describe Anything for Remote Sensing Images
Kaiyu Li
Zixuan Jiang
Xiangyong Cao
Jiayu Wang
Yuchen Xiao
Deyu Meng
Zhi Wang
180
2
0
30 Sep 2025
Diff-3DCap: Shape Captioning with Diffusion Models
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Zhenyu Shu
Jiawei Wen
Shiyang Li
Shiqing Xin
Ligang Liu
DiffM
171
0
0
28 Sep 2025
Align Where the Words Look: Cross-Attention-Guided Patch Alignment with Contrastive and Transport Regularization for Bengali Captioning
Riad Ahmed Anonto
Sardar Md. Saffat Zabin
M. Saifur Rahman
VLM
156
1
0
22 Sep 2025
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
134
0
0
10 Aug 2025
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
L. D. M. S. Sai Teja
Ashok Urlana
Pruthwik Mishra
155
0
0
09 Aug 2025
From Image Captioning to Visual Storytelling
Admitos Passadakis
Yingjin Song
Albert Gatt
DiffM
275
0
0
31 Jul 2025
On Explaining Visual Captioning with Hybrid Markov Logic Networks
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
VLM
200
0
0
28 Jul 2025
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Computer Science Review (CSR), 2025
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
286
9
0
03 Jun 2025
Panoptic Captioning: An Equivalence Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
738
0
0
22 May 2025
Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation
Lakshita Agarwal
Bindu Verma
ViT
211
0
0
23 Apr 2025
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Lakshita Agarwal
Bindu Verma
ViT
406
0
0
23 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
International Journal of Computer Vision (IJCV), 2024
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
479
2
0
03 Apr 2025
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
BigData Congress [Services Society] (BSS), 2024
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
MLLM
BDL
VLM
345
1
0
18 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
492
14
0
18 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
334
0
0
11 Mar 2025
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning
Qing Zhou
Tao Yang
Junyu Gao
W. Ni
Junzheng Wu
Qi Wang
302
2
0
06 Mar 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
431
0
0
03 Mar 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
334
3
0
03 Jan 2025
Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation
IEEE Transactions on Image Processing (TIP), 2024
Yang Yang
Wenjuan Xi
Luping Zhou
Jinhui Tang
324
7
0
14 Dec 2024
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
419
6
0
20 Nov 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
334
13
0
09 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
195
6
0
06 Oct 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Joshua Forster Feinglass
Yezhou Yang
232
0
0
30 Sep 2024
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Xin Jiang
Junwei Zheng
Ruiping Liu
Jiahang Li
Jiaming Zhang
Sven Matthiesen
Rainer Stiefelhagen
VLM
273
4
0
21 Sep 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
250
3
0
28 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
British Machine Vision Conference (BMVC), 2024
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
389
7
0
26 Aug 2024
Shifted Window Fourier Transform And Retention For Image Captioning
International Conference on Neural Information Processing (ICONIP), 2024
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
VLM
352
2
0
25 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
499
0
0
09 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
European Conference on Computer Vision (ECCV), 2024
Xianyu Chen
Ming Jiang
Qi Zhao
262
9
0
05 Aug 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
European Conference on Computer Vision (ECCV), 2024
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
257
12
0
29 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
346
25
0
21 Jul 2024
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
Bo Yuan
Danpei Zhao
Zhuoran Liu
Wentao Li
Tian Li
CLL
VLM
438
5
0
19 Jul 2024
EFCNet: Every Feature Counts for Small Medical Object Segmentation
Lingjie Kong
Qiaoling Wei
Chengming Xu
Han Chen
Yanwei Fu
246
1
0
26 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
300
16
0
09 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
310
10
0
01 Jun 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
262
20
0
21 May 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge
Fangyin Wei
Siddharth Gururani
Nayeon Lee
Xuan Li
Huayu Chen
CoGe
DiffM
238
37
0
30 Apr 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
204
9
0
19 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
329
24
0
12 Apr 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
306
3
0
28 Mar 2024
A Survey on Large Language Models from Concept to Implementation
Chen Wang
Jin Zhao
Jiaqi Gong
LLMAG
LM&MA
448
8
0
27 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
336
0
0
26 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
241
19
0
12 Mar 2024
How to Understand Named Entities: Using Common Sense for News Captioning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2024
Ning Xu
Yanhui Wang
Tingting Zhang
Hongshuo Tian
Mohan Kankanhalli
An-An Liu
227
0
0
11 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
345
55
0
06 Mar 2024
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition
Yutian Liu
Wenjun Ke
Jianguo Wei
350
1
0
04 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
243
49
0
28 Feb 2024
EDTC: enhance depth of text comprehension in automated audio captioning
Liwen Tan
Yin Cao
Yi Zhou
228
0
0
27 Feb 2024