X-Linear Attention Networks for Image Captioning

Computer Vision and Pattern Recognition (CVPR), 2020

31 March 2020

Yingwei Pan

Ting Yao

Yehao Li

Tao Mei


Papers citing "X-Linear Attention Networks for Image Captioning"

50 / 213 papers shown

Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction

05 Dec 2025

SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning

AAAI Conference on Artificial Intelligence (AAAI), 2025

154

01 Dec 2025

DescribeEarth: Describe Anything for Remote Sensing Images

165

30 Sep 2025

RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning

119

10 Aug 2025

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance

L. D. M. S. Sai Teja

Ashok Urlana

Pruthwik Mishra

143

09 Aug 2025

On Explaining Visual Captioning with Hybrid Markov Logic Networks

191

28 Jul 2025

Efficiency Robustness of Dynamic Deep Learning Systems

Ravishka Rathnasuriya

386

12 Jun 2025

Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

Computer Science Review (CSR), 2025

266

03 Jun 2025

MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion

International Joint Conference on Artificial Intelligence (IJCAI), 2025

410

19 May 2025

DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation

Computer Vision and Pattern Recognition (CVPR), 2025

279

16 Apr 2025

Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention

International Journal of Computer Vision (IJCV), 2024

436

03 Apr 2025

Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic

BigData Congress [Services Society] (BSS), 2024

335

18 Mar 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

International Joint Conference on Artificial Intelligence (IJCAI), 2024

Sara Sarto

Marcella Cornia

Rita Cucchiara

463

18 Mar 2025

SuperCap: Multi-resolution Superpixel-based Image Captioning

321

11 Mar 2025

AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language

410

03 Mar 2025

Performance Analysis of Traditional VQA Models Under Limited Computational Resources

Jihao Gu

320

09 Feb 2025

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning

European Conference on Computer Vision (ECCV), 2024

315

03 Jan 2025

CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs

259

03 Dec 2024

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

398

20 Nov 2024

CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model

1.2K

19 Nov 2024

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

International Journal of Computer Vision (IJCV), 2024

314

09 Oct 2024

Pixels to Prose: Understanding the art of Image Captioning

Hrishikesh Singh

Aarti Sharma

Millie Pant

3DV VLM

235

28 Aug 2024

TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model

Yawen Cui

202

22 Aug 2024

Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis

479

09 Aug 2024

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

European Conference on Computer Vision (ECCV), 2024

Sara Sarto

Marcella Cornia

Lorenzo Baraldi

Rita Cucchiara

240

29 Jul 2024

Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights

323

16 Jul 2024

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

294

07 Jul 2024

Image Captioning via Dynamic Path Customization

Jiayi Ji

Yongjian Wu

302

01 Jun 2024

Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model

258

28 May 2024

Towards Retrieval-Augmented Architectures for Image Captioning

Lorenzo Baraldi

251

21 May 2024

FITA: Fine-grained Image-Text Aligner for Radiology Report Generation

230

02 May 2024

Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach

Zhilin Zhang

Fangyu Wu

214

01 May 2024

SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models

Abhilash Nandy

199

27 Apr 2024

Memory-based Cross-modal Semantic Alignment Network for Radiology Report Generation

262

31 Mar 2024

Text Data-Centric Image Captioning with Interactive Prompts

Fan Wang

267

28 Mar 2024

A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes

225

12 Mar 2024

How to Understand Named Entities: Using Common Sense for News Captioning

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2024

217

11 Mar 2024

Transformer based Multitask Learning for Image Captioning and Object Detection

Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2024

Debolena Basak

P. K. Srijith

M. Desarkar

210

10 Mar 2024

MeaCap: Memory-Augmented Zero-shot Image Captioning

331

06 Mar 2024

Social Media Ready Caption Generation for Brands

Himanshu Maheshwari

Koustava Goswami

Apoorv Saxena

Balaji Vasan Srinivasan

193

03 Jan 2024

Cycle-Consistency Learning for Captioning and Grounding

273

23 Dec 2023

Improving Image Captioning via Predicting Structured Concepts

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Ting Wang

Weidong Chen

Yuanhe Tian

Yan Song

Zhendong Mao

242

14 Nov 2023

Complex Organ Mask Guided Radiology Report Generation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Dongnan Liu

308

04 Nov 2023

A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis

medRxiv (medRxiv), 2023

Yingshu Li

Yunyi Liu

Zhanyu Wang

Xinyu Liang

Lei Wang

Lingqiao Liu

Leyang Cui

352

31 Oct 2023

Semi-Supervised Panoptic Narrative Grounding

ACM Multimedia (ACM MM), 2023

Jiayi Ji

240

27 Oct 2023

C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network

215

09 Oct 2023

Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

International Journal of Computer Vision (IJCV), 2023

Ping Luo

Yu Qiao

Kaipeng Zhang

ObjD VLM

337

08 Oct 2023

Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches

International Conference on Language Resources and Evaluation (LREC), 2023

179

21 Sep 2023

R2GenGPT: Radiology Report Generation with Frozen LLMs

Lingqiao Liu

253

164

18 Sep 2023

S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

Computer Vision and Pattern Recognition (CVPR), 2023

Qi Wu

215

05 Sep 2023