1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,580 papers shown
Semi-Supervised Panoptic Narrative Grounding
ACM Multimedia (ACM MM), 2023
Danni Yang
Jiayi Ji
Xiaoshuai Sun
Haowei Wang
Yinan Li
Yiwei Ma
Rongrong Ji
212
5
0
27 Oct 2023
Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Benjamin Yan
Ruochen Liu
David E. Kuo
Subathra Adithan
Eduardo Pontes Reis
...
V. Venugopal
Chloe P. O'Connell
Agustina Saenz
Pranav Rajpurkar
Michael Moor
MedIm
263
37
0
26 Oct 2023
Cross-modal Active Complementary Learning with Self-refining Correspondence
Neural Information Processing Systems (NeurIPS), 2023
Yang Qin
Yuan Sun
Dezhong Peng
Qiufeng Wang
Xiaocui Peng
Peng Hu
283
32
0
26 Oct 2023
FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
Anant Khandelwal
451
2
0
24 Oct 2023
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Lei Li
246
35
0
24 Oct 2023
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Kecen Li
Chen Gong
Zhixiang Li
Yuzhong Zhao
Xinwen Hou
Tianhao Wang
352
20
0
19 Oct 2023
Getting aligned on representational alignment
Ilia Sucholutsky
Lukas Muttenthaler
Adrian Weller
Andi Peng
Andreea Bobu
...
Thomas Unterthiner
Andrew Kyle Lampinen
Klaus-Robert Müller
M. Toneva
Thomas Griffiths
320
134
0
18 Oct 2023
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
International Conference on Learning Representations (ICLR), 2023
Rujie Wu
Xiaojian Ma
Zhenliang Zhang
Wei Wang
Qing Li
Song-Chun Zhu
Yizhou Wang
LRM
VLM
339
16
0
16 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
322
9
0
16 Oct 2023
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
229
1
0
12 Oct 2023
CLIP for Lightweight Semantic Segmentation
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
Ke Jin
Wankou Yang
VLM
171
2
0
11 Oct 2023
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation
International Conference on Robotics and Computer Vision (ICRCV), 2023
Rashid Khan
Bingding Huang
Haseeb Hassan
Asim Zaman
Z. Ye
156
3
0
11 Oct 2023
A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection
Yang Wang
Jiaogen Zhou
Jihong Guan
304
12
0
09 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
IEEE International Conference on Robotics and Automation (ICRA), 2023
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
380
289
0
03 Oct 2023
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
168
4
0
03 Oct 2023
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
131
2
0
02 Oct 2023
YOLOR-Based Multi-Task Learning
Hung-Shuo Chang
Chien-Yao Wang
Hang Yan
Yukun Zhu
Hongpeng Liao
MoE
VLM
193
22
0
29 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
208
22
0
28 Sep 2023
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Tohida Rehman
Ronit Mandal
Jimuyang Zhang
Debarshi Kumar Sanyal
SSL
359
25
0
28 Sep 2023
Social Media Fashion Knowledge Extraction as Captioning
Yifei Yuan
Wenxuan Zhang
Yang Deng
Wai Lam
176
2
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM
RALM
310
81
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Avamarie Brueggeman
Andrea Madotto
Mohammad Kachuee
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
286
110
0
27 Sep 2023
CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy Grading
Hao Wei
Peilun Shi
Juzheng Miao
Minqing Zhang
Guitao Bai
Jianing Qiu
Furui Liu
Wu Yuan
MedIm
OOD
163
7
0
27 Sep 2023
FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images
Naimul Haque
Iffat Labiba
Sadia Akter
3DH
CVBM
201
2
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
316
22
0
23 Sep 2023
An Empirical Study of Attention Networks for Semantic Segmentation
Hao Guo
Hongbiao Si
Guilin Jiang
Wei Zhang
Zhiyan Liu
Xuanyi Zhu
Xulong Zhang
Yang Liu
216
1
0
19 Sep 2023
R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
LM&MA
VLM
220
139
0
18 Sep 2023
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
Ching-Hsun Tseng
Shin-Jye Lee
Po-Wei Cheng
Chien Lee
Chih-Chieh Hung
112
0
0
18 Sep 2023
Holistic Geometric Feature Learning for Structured Reconstruction
IEEE International Conference on Computer Vision (ICCV), 2023
Ziqiong Lu
Linxi Huan
Qiyuan Ma
Xianwei Zheng
183
2
0
18 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
144
3
0
15 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
185
8
0
15 Sep 2023
PatFig: Generating Short and Long Captions for Patent Figures
Dana Aubakirova
Kim Gerdes
Lufei Liu
201
15
0
15 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
294
30
0
14 Sep 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
254
73
0
12 Sep 2023
SparseSwin: Swin Transformer with Sparse Transformer Block
Krisna Pinasthika
Blessius Sheldo Putra Laksono
Riyandi Banovbi Putera Irsal
Syifa’ Hukma Shabiyya
N. Yudistira
ViT
238
32
0
11 Sep 2023
C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
William Theisen
Walter J. Scheirer
CLIP
VLM
197
2
0
06 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
317
2
0
06 Sep 2023
Exchanging-based Multimodal Fusion with Transformer
Renyu Zhu
Chengcheng Han
Yong Qian
Qiushi Sun
Xiang Li
Ming Gao
Xuezhi Cao
Yunsen Xian
174
5
0
05 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
194
0
0
31 Aug 2023
FIRE: Food Image to REcipe generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
P. Chhikara
Dhiraj Chaurasia
Yifan Jiang
Omkar Masur
Filip Ilievski
265
35
0
28 Aug 2023
Goodhart's Law Applies to NLP's Explanation Benchmarks
Findings (Findings), 2023
Jennifer Hsia
Danish Pruthi
Aarti Singh
Zachary Chase Lipton
215
7
0
28 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
212
20
0
25 Aug 2023
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Haibo Jin
Haoxuan Che
Yi Lin
Haoxing Chen
MedIm
283
123
0
24 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
180
30
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
ACM Multimedia (ACM MM), 2023
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
217
24
0
23 Aug 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Future Generation Computer Systems (FGCS), 2023
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
183
10
0
22 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
IEEE International Conference on Computer Vision (ICCV), 2023
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
189
3
0
21 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
IEEE International Conference on Computer Vision (ICCV), 2023
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
208
27
0
18 Aug 2023
Learning the meanings of function words from grounded language using a visual question answering model
Cognitive Sciences (CogSci), 2023
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
271
7
0
16 Aug 2023
Visually-Aware Context Modeling for News Image Captioning
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
VLM
130
17
0
16 Aug 2023