1502.03044
Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,507 papers shown
Title
Empowering Vision Transformers with Multi-Scale Causal Intervention for Long-Tailed Image Classification
Xiaoshuo Yan
Z. Li
Lei Meng
Zhuang Qi
Wei Wu
Zixuan Li
X. Meng
CML
BDL
38
0
0
13 May 2025
Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna
Afra Alishahi
Frédéric Blain
Eva Vanmassenhove
22
0
0
13 May 2025
Anatomical Attention Alignment representation for Radiology Report Generation
Quang Vinh Nguyen
Minh Duc Nguyen
Thanh Hoang Son Vo
Hyung-Jeong Yang
Soo-Hyung Kim
MedIm
23
0
0
12 May 2025
Explainable AI the Latest Advancements and New Trends
Bowen Long
Enjie Liu
Renxi Qiu
Yanqing Duan
XAI
31
0
0
11 May 2025
Achieving 3D Attention via Triplet Squeeze and Excitation Block
Maan Alhazmi
Abdulrahman Altahhan
21
0
0
09 May 2025
Structure Causal Models and LLMs Integration in Medical Visual Question Answering
Zibo Xu
Qiang Li
Weizhi Nie
Weijie Wang
Anan Liu
CML
MedIm
42
0
0
05 May 2025
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum
Wen-Han Hsieh
Tsung-Han Wu
Ritwik Gupta
Trevor Darrell
David M. Chan
MLLM
VLM
54
0
0
05 May 2025
Segment Any RGB-Thermal Model with Language-aided Distillation
Dong Xing
Xianxun Zhu
Wei Zhou
Qika Lin
Hang Yang
Yuqing Wang
VLM
56
0
0
04 May 2025
Positional Attention for Efficient BERT-Based Named Entity Recognition
Mo Sun
Siheng Xiong
Yuankai Cai
Bowen Zuo
12
0
0
03 May 2025
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo
Lajanugen Logeswaran
Justin Johnson
Honglak Lee
51
0
0
01 May 2025
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation
Amaan Izhar
Nurul Japar
Norisma Idris
Ting Dang
MoE
64
0
0
29 Apr 2025
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
Zezhou Chen
Zhaoxiang Liu
Kai Wang
Kohou Wang
Shiguo Lian
50
0
0
25 Apr 2025
CAMU: Context Augmentation for Meme Understanding
Girish A. Koushik
Diptesh Kanojia
Helen Treharne
Aditya Joshi
VLM
96
0
0
24 Apr 2025
Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning
Yassir Benhammou
Alessandro Tiberio
Gabriel Trautmann
Suman Kalyan
MLLM
VLM
36
0
0
21 Apr 2025
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Sang-Jun Park
Keun-Soo Heo
Dong-Hee Shin
Young-Han Son
Ji-Hye Oh
Tae-Eui Kam
MedIm
34
0
0
16 Apr 2025
Pay Attention to What and Where? Interpretable Feature Extractor in Vision-based Deep Reinforcement Learning
Tien Pham
Angelo Cangelosi
24
0
0
14 Apr 2025
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Mohammad Saleh
Azadeh Tabatabaei
52
0
0
14 Apr 2025
AeroLite: Tag-Guided Lightweight Generation of Aerial Image Captions
Xing Zi
Tengjun Ni
Xianjing Fan
Xian Tao
Jun Li
Ali Braytee
Mukesh Prasad
23
0
0
13 Apr 2025
Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention
Moyang Liu
Kaiying Yan
Yukun Liu
Ruibo Fu
Zhengqi Wen
Xuefei Liu
Chenxing Li
31
0
0
12 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
38
0
0
03 Apr 2025
The Dual-Route Model of Induction
Sheridan Feucht
Eric Todd
Byron C. Wallace
David Bau
26
0
0
03 Apr 2025
On Vanishing Variance in Transformer Length Generalization
Ruining Li
Gabrijel Boduljak
Jensen Zhou
36
0
0
03 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
34
0
0
02 Apr 2025
PolygoNet: Leveraging Simplified Polygonal Representation for Effective Image Classification
Salim Khazem
Jérémy Fix
Cédric Pradalier
36
0
0
01 Apr 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL
VLM
84
2
0
27 Mar 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM
CoGe
72
0
0
25 Mar 2025
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module
Yishen Liu
Shengda Liu
Hudan Pan
MedIm
50
0
0
24 Mar 2025
MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation
Xiaodan Zhang
Yanzhao Shi
Junzhong Ji
Chengxin Zheng
Liangqiong Qu
35
0
0
22 Mar 2025
Casual Inference via Style Bias Deconfounding for Domain Generalization
Jiaxi Li
Di Lin
Hao Chen
Hongying Liu
Liang Wan
Wei Feng
OOD
CML
AI4CE
50
0
0
21 Mar 2025
DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis
Chen Gong
Kecen Li
Zinan Lin
Tianhao Wang
47
3
0
18 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Sara Sarto
Marcella Cornia
Rita Cucchiara
41
0
0
18 Mar 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM
VLM
50
1
0
16 Mar 2025
Extreme Learning Machines for Attention-based Multiple Instance Learning in Whole-Slide Image Classification
Rajiv Krishnakumar
Julien Baglio
Frederik F. Flöther
Christian Ruiz
Stefan Habringer
Nicole H. Romano
MedIm
39
0
0
13 Mar 2025
Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection
Zihao Zhang
Aming Wu
Yahong Han
ObjD
48
0
0
13 Mar 2025
Convolutional Rectangular Attention Module
Hai-Vy Nguyen
Fabrice Gamboa
Sixin Zhang
Reda Chhaibi
Serge Gratton
Thierry Giaccone
45
0
0
13 Mar 2025
Measuring directional bias amplification in image captions using predictability
Rahul Nair
Bhanu Tokas
Neel Shah
Hannah Kerner
46
0
0
10 Mar 2025
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
62
0
0
10 Mar 2025
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo
D. Q. Nguyen
T. Nguyen
Tho Quan
45
0
0
09 Mar 2025
MSConv: Multiplicative and Subtractive Convolution for Face Recognition
Si Zhou
Yain-Whar Si
Xiaochen Yuan
Xiaofan Li
Xiaoxiang Liu
Xinyuan Zhang
Cong Lin
Xueyuan Gong
CVBM
73
0
0
08 Mar 2025
Extracting Symbolic Sequences from Visual Representations via Self-Supervised Learning
Victor Sebastian Martinez Pozos
Ivan Vladimir Meza Ruiz
39
0
0
06 Mar 2025
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen
Y. Liu
Binglin Chen
Jiandong Su
Yongsen Zheng
Liang Lin
BDL
VGen
CML
41
2
0
05 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
61
0
0
03 Mar 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
74
0
0
03 Mar 2025
A Survey of Link Prediction in Temporal Networks
Jiafeng Xiong
Ahmad Zareie
Rizos Sakellariou
AI4TS
AI4CE
34
1
0
28 Feb 2025
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
Hemanth Teja Yanambakkam
Rahul Chinthala
28
0
0
26 Feb 2025
Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos
Jiamin Luo
Jingjing Wang
Junxiao Ma
Yujie Jin
Shoushan Li
Guodong Zhou
31
0
0
26 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
66
0
0
26 Feb 2025
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning
Swadhin Das
Saarthak Gupta
Kamal Kumar
Raksha Sharma
33
0
0
22 Feb 2025
A Comprehensive Survey on Composed Image Retrieval
Xuemeng Song
Haoqiang Lin
Haokun Wen
Bohan Hou
Mingzhu Xu
Liqiang Nie
44
1
0
19 Feb 2025
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Pandeng Li
Yun Zheng
Hongtao Xie
VLM
CoGe
96
0
0
19 Feb 2025