Attention on Attention for Image Captioning

IEEE International Conference on Computer Vision (ICCV), 2019

19 August 2019

Papers citing "Attention on Attention for Image Captioning"

50 / 325 papers shown

Nexus: Higher-Order Attention Mechanisms in Transformers

389

03 Dec 2025

Cross Modal Fine-Grained Alignment via Granularity-Aware and Region-Uncertain Modeling

203

11 Nov 2025

DescribeEarth: Describe Anything for Remote Sensing Images

180

30 Sep 2025

Diff-3DCap: Shape Captioning with Diffusion Models

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025

171

28 Sep 2025

Align Where the Words Look: Cross-Attention-Guided Patch Alignment with Contrastive and Transport Regularization for Bengali Captioning

Riad Ahmed Anonto

Sardar Md. Saffat Zabin

M. Saifur Rahman

VLM

156

22 Sep 2025

RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning

134

10 Aug 2025

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance

L. D. M. S. Sai Teja

Ashok Urlana

Pruthwik Mishra

155

09 Aug 2025

From Image Captioning to Visual Storytelling

275

31 Jul 2025

On Explaining Visual Captioning with Hybrid Markov Logic Networks

200

28 Jul 2025

Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

Computer Science Review (CSR), 2025

286

03 Jun 2025

Panoptic Captioning: An Equivalence Bridge for Image and Text

738

22 May 2025

Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation

Lakshita Agarwal

Bindu Verma

ViT

211

23 Apr 2025

Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism

Lakshita Agarwal

Bindu Verma

ViT

406

23 Apr 2025

Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention

International Journal of Computer Vision (IJCV), 2024

479

03 Apr 2025

Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic

BigData Congress [Services Society] (BSS), 2024

345

18 Mar 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

International Joint Conference on Artificial Intelligence (IJCAI), 2024

Sara Sarto

Marcella Cornia

Rita Cucchiara

492

18 Mar 2025

SuperCap: Multi-resolution Superpixel-based Image Captioning

334

11 Mar 2025

A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning

302

06 Mar 2025

AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language

431

03 Mar 2025

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning

European Conference on Computer Vision (ECCV), 2024

334

03 Jan 2025

Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation

IEEE Transactions on Image Processing (TIP), 2024

324

14 Dec 2024

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

419

20 Nov 2024

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

International Journal of Computer Vision (IJCV), 2024

334

09 Oct 2024

CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection

Asian Conference on Computer Vision (ACCV), 2024

Devank

Jayateja Kalla

Soma Biswas

195

06 Oct 2024

TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Joshua Forster Feinglass

Yezhou Yang

232

30 Sep 2024

@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

Xin Jiang

Junwei Zheng

Ruiping Liu

Jiahang Li

Jiaming Zhang

Sven Matthiesen

Rainer Stiefelhagen

VLM

273

21 Sep 2024

Pixels to Prose: Understanding the art of Image Captioning

Hrishikesh Singh

Aarti Sharma

Millie Pant

3DV VLM

250

28 Aug 2024

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

British Machine Vision Conference (BMVC), 2024

Lorenzo Baraldi

389

26 Aug 2024

Shifted Window Fourier Transform And Retention For Image Captioning

International Conference on Neural Information Processing (ICONIP), 2024

352

25 Aug 2024

Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis

499

09 Aug 2024

GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths

European Conference on Computer Vision (ECCV), 2024

Xianyu Chen

Ming Jiang

Qi Zhao

262

05 Aug 2024

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

European Conference on Computer Vision (ECCV), 2024

Sara Sarto

Marcella Cornia

Lorenzo Baraldi

Rita Cucchiara

257

29 Jul 2024

HERGen: Elevating Radiology Report Generation with Longitudinal Data

346

21 Jul 2024

Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images

438

19 Jul 2024

EFCNet: Every Feature Counts for Small Medical Object Segmentation

Lingjie Kong

Qiaoling Wei

Chengming Xu

Han Chen

Yanwei Fu

246

26 Jun 2024

Stealthy Targeted Backdoor Attacks against Image Captioning

IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024

300

09 Jun 2024

Image Captioning via Dynamic Path Customization

Jiayi Ji

Yongjian Wu

310

01 Jun 2024

Towards Retrieval-Augmented Architectures for Image Captioning

Lorenzo Baraldi

262

21 May 2024

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

238

30 Apr 2024

Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting

Weidong Chen

204

19 Apr 2024

Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts

Övgü Özdemir

Erdem Akagündüz

329

12 Apr 2024

Text Data-Centric Image Captioning with Interactive Prompts

Fan Wang

306

28 Mar 2024

A Survey on Large Language Models from Concept to Implementation

448

27 Mar 2024

Semi-Supervised Image Captioning Considering Wasserstein Graph Matching

Yang Yang

336

26 Mar 2024

A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes

241

12 Mar 2024

How to Understand Named Entities: Using Common Sense for News Captioning

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2024

227

11 Mar 2024

MeaCap: Memory-Augmented Zero-shot Image Captioning

345

06 Mar 2024

Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition

Yutian Liu

Wenjun Ke

Jianguo Wei

350

04 Mar 2024

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

243

28 Feb 2024

EDTC: enhance depth of text comprehension in automated audio captioning

Liwen Tan

Yin Cao

Yi Zhou

228

27 Feb 2024