2003.14080
Cited By
X-Linear Attention Networks for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
31 March 2020
Yingwei Pan
Ting Yao
Yehao Li
Tao Mei
Github (274★)
Papers citing "X-Linear Attention Networks for Image Captioning"
50 / 213 papers shown
Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction
Ruihong Yin
Xuepeng Shi
Oleksandr Bailo
Marco Manfredi
Theo Gevers
31
0
0
05 Dec 2025
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xu Zhang
Jin Yuan
Hanwang Zhang
Guojin Zhong
Yongsheng Zang
Jiacheng Lin
Zhiyong Li
DiffM
VLM
154
1
0
01 Dec 2025
DescribeEarth: Describe Anything for Remote Sensing Images
Kaiyu Li
Zixuan Jiang
Xiangyong Cao
Jiayu Wang
Yuchen Xiao
Deyu Meng
Zhi Wang
165
1
0
30 Sep 2025
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
119
0
0
10 Aug 2025
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
L. D. M. S. Sai Teja
Ashok Urlana
Pruthwik Mishra
143
0
0
09 Aug 2025
On Explaining Visual Captioning with Hybrid Markov Logic Networks
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
VLM
191
0
0
28 Jul 2025
Efficiency Robustness of Dynamic Deep Learning Systems
Ravishka Rathnasuriya
Tingxi Li
Zexin Xu
Zihe Song
Mirazul Haque
Simin Chen
Wei Yang
AAML
SILM
386
1
0
12 Jun 2025
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Computer Science Review (CSR), 2025
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
266
9
0
03 Jun 2025
MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Wei Hua
Chenlin Zhou
Jibin Wu
Yansong Chua
Yangyang Shu
410
2
0
19 May 2025
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Sang-Jun Park
Keun-Soo Heo
Dong-Hee Shin
Young-Han Son
Ji-Hye Oh
Tae-Eui Kam
MedIm
279
1
0
16 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
International Journal of Computer Vision (IJCV), 2024
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
436
2
0
03 Apr 2025
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
BigData Congress [Services Society] (BSS), 2024
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
MLLM
BDL
VLM
335
1
0
18 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
463
12
0
18 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
321
0
0
11 Mar 2025
AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
410
0
0
03 Mar 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
320
1
0
09 Feb 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
315
3
0
03 Jan 2025
CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs
Abhas Kumar
Kapil Pathak
Rajesh Kavuru
Prabhakar Srinivasan
259
1
0
03 Dec 2024
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
398
6
0
20 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
1.2K
1
0
19 Nov 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
314
12
0
09 Oct 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
235
3
0
28 Aug 2024
TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model
Yuhao Wang
Chao Hao
Yawen Cui
Xinqi Su
Weicheng Xie
Tao Tan
Zitong Yu
LM&MA
MedIm
202
1
0
22 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
479
0
0
09 Aug 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
European Conference on Computer Vision (ECCV), 2024
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
240
12
0
29 Jul 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
323
4
0
16 Jul 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang
Ruohan Dong
Jinfa Huang
Yiwei Ma
Haowei Wang
Xiaoshuai Sun
Rongrong Ji
294
9
0
07 Jul 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
302
10
0
01 Jun 2024
Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model
Wenbing Li
Hang Zhou
Junqing Yu
Zikai Song
Wei Yang
Mamba
258
31
0
28 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
251
20
0
21 May 2024
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
Honglong Yang
Hui Tang
Xiaomeng Li
MedIm
230
3
0
02 May 2024
Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach
Zhilin Zhang
Fangyu Wu
214
0
0
01 May 2024
SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
199
20
0
27 Apr 2024
Memory-based Cross-modal Semantic Alignment Network for Radiology Report Generation
Yitian Tao
Liyan Ma
Jing Yu
Han Zhang
MedIm
262
23
0
31 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
267
3
0
28 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
225
19
0
12 Mar 2024
How to Understand Named Entities: Using Common Sense for News Captioning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2024
Ning Xu
Yanhui Wang
Tingting Zhang
Hongshuo Tian
Mohan Kankanhalli
An-An Liu
217
0
0
11 Mar 2024
Transformer based Multitask Learning for Image Captioning and Object Detection
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2024
Debolena Basak
P. K. Srijith
M. Desarkar
210
3
0
10 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
331
50
0
06 Mar 2024
Social Media Ready Caption Generation for Brands
Himanshu Maheshwari
Koustava Goswami
Apoorv Saxena
Balaji Vasan Srinivasan
193
1
0
03 Jan 2024
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
273
15
0
23 Dec 2023
Improving Image Captioning via Predicting Structured Concepts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
242
19
0
14 Nov 2023
Complex Organ Mask Guided Radiology Report Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Tiancheng Gu
Dongnan Liu
Zhiyuan Li
Weidong Cai
MedIm
308
34
0
04 Nov 2023
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
medRxiv, 2023
Yingshu Li
Yunyi Liu
Zhanyu Wang
Xinyu Liang
Lei Wang
Lingqiao Liu
Leyang Cui
Zhaopeng Tu
Longyue Wang
Luping Zhou
ELM
LM&MA
352
0
0
31 Oct 2023
Semi-Supervised Panoptic Narrative Grounding
ACM Multimedia (ACM MM), 2023
Danni Yang
Jiayi Ji
Xiaoshuai Sun
Haowei Wang
Yinan Li
Yiwei Ma
Rongrong Ji
240
5
0
27 Oct 2023
C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network
Ruizhi Wang
Xiang-Fei Wang
Jie Zhou
Thomas Lukasiewicz
Zhenghua Xu
215
1
0
09 Oct 2023
Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
International Journal of Computer Vision (IJCV), 2023
Hao Zhang
Lumin Xu
Shenqi Lai
Wenqi Shao
Nanning Zheng
Ping Luo
Yu Qiao
Kaipeng Zhang
ObjD
VLM
337
15
0
08 Oct 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
International Conference on Language Resources and Evaluation (LREC), 2023
Deepak Gupta
Kush Attal
Dina Demner-Fushman
LM&MA
179
4
0
21 Sep 2023
R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
LM&MA
VLM
253
164
0
18 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
215
13
0
05 Sep 2023