1502.03044
Cited By
v1
v2
v3 (latest)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Dong Wang
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,580 papers shown
Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
VLM
177
1
0
03 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
300
1
0
02 Jun 2023
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Abisek Rajakumar Kalarani
P. Bhattacharyya
Niyati Chhaya
Sumit Shekhar
CoGe
VLM
222
10
0
01 Jun 2023
Cross-Domain Car Detection Model with Integrated Convolutional Block Attention Mechanism
Image and Vision Computing (IVC), 2023
Haoxuan Xu
Songning Lai
Xianyang Li
Y. Yang
ViT
259
17
0
31 May 2023
HGT: A Hierarchical GCN-Based Transformer for Multimodal Periprosthetic Joint Infection Diagnosis Using CT Images and Text
Ruiyang Li
Fujun Yang
Xianjie Liu
Hon-Yi Shi
189
1
0
29 May 2023
GBG++: A Fast and Stable Granular Ball Generation Method for Classification
IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), 2023
Qin Xie
Qinghua Zhang
Shuyin Xia
Fan Zhao
Chengying Wu
Guoyin Wang
Weiping Ding
314
32
0
29 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
391
53
0
28 May 2023
S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts
Asian Conference on Computer Vision (ACCV), 2023
Qi Chen
Yutong Xie
Biao Wu
Minh-Son To
James Ang
Qi Wu
152
2
0
26 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Chia-Wen Kuo
Z. Kira
197
39
0
25 May 2023
TOAST: Transfer Learning via Attention Steering
Baifeng Shi
Siyu Gai
Trevor Darrell
Xin Wang
146
19
0
24 May 2023
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy
R. Guerraoui
Ljiljana Dolamic
228
1
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
205
11
0
20 May 2023
Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture
Neural Information Processing Systems (NeurIPS), 2023
Galen Pogoncheff
Jacob Granley
M. Beyeler
AAML
FAtt
151
12
0
18 May 2023
Emergent Communication with Attention
Annual Meeting of the Cognitive Science Society (CogSci), 2023
Ryokan Ri
Ryo Ueda
Jason Naradowsky
160
2
0
18 May 2023
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot
Aanisha Bhattacharya
Yaman Kumar Singla
Balaji Krishnamurthy
R. Shah
Changyou Chen
VGen
316
15
0
16 May 2023
PLIP: Language-Image Pre-training for Person Representation Learning
Neural Information Processing Systems (NeurIPS), 2023
Jia-li Zuo
Jiahao Hong
Feng Zhang
Changqian Yu
Hanyu Zhou
Changxin Gao
Nong Sang
Jingdong Wang
VLM
MLLM
404
63
0
15 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
ACM Multimedia (ACM MM), 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
240
5
0
13 May 2023
Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives
European Conference on Artificial Intelligence (ECAI), 2023
Bhanu Prakash Voutharoja
Lei Wang
Luping Zhou
MedIm
147
13
0
11 May 2023
Learning the Visualness of Text Using Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Gaurav Verma
Ryan Rossi
Chris Tensmeyer
Jiuxiang Gu
A. Nenkova
VLM
176
0
0
11 May 2023
Clothes-Invariant Feature Learning by Causal Intervention for Clothes-Changing Person Re-identification
Xulin Li
Yan Lu
B. Liu
Yuenan Hou
Yating Liu
Qi Chu
Wanli Ouyang
Nenghai Yu
OOD
CML
196
9
0
10 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
352
159
0
09 May 2023
Image Captioners Sometimes Tell More Than Images They See
Honori Udo
Takafumi Koshinaka
VLM
134
4
0
04 May 2023
Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
Computer Vision and Pattern Recognition (CVPR), 2023
Shun-cheng Wu
Keisuke Tateno
Nassir Navab
F. Tombari
3DPC
3DV
279
30
0
04 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
474
124
0
04 May 2023
Transforming Visual Scene Graphs to Image Captions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
357
25
0
03 May 2023
Fairness in AI Systems: Mitigating gender bias from language-vision models
Lavisha Aggarwal
Shruti Bhargava
130
5
0
03 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
207
16
0
03 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Xuehai He
Xin Eric Wang
317
10
0
30 Apr 2023
Multi-Modality Deep Network for Extreme Learned Image Compression
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xuhao Jiang
Weimin Tan
Tian Tan
Bo Yan
Liquan Shen
99
22
0
26 Apr 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
225
38
0
22 Apr 2023
Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Isabell Lederer
Rudolf Mayer
Andreas Rauber
245
30
0
22 Apr 2023
Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search
Andrei Kucharavy
M. Monti
R. Guerraoui
Ljiljana Dolamic
161
1
0
20 Apr 2023
TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Quanjiang Guo
Zhao Kang
Ling Tian
Zhouguo Chen
156
12
0
19 Apr 2023
Interactive and Explainable Region-guided Radiology Report Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
241
178
0
17 Apr 2023
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Zehua Wang
Guanbin Li
Jingzhou Luo
Guanbin Li
BDL
LRM
288
6
0
17 Apr 2023
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
Maria Parelli
Alexandros Delitzas
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
219
71
0
12 Apr 2023
Learning Transferable Pedestrian Representation from Multimodal Information Supervision
Li-Na Bao
Longhui Wei
Xiaoyu Qiu
Wen-gang Zhou
Houqiang Li
Qi Tian
SSL
213
6
0
12 Apr 2023
ImageCaptioner²: Image Captioner for Image Captioning Bias Amplification Assessment
AAAI Conference on Artificial Intelligence (AAAI), 2023
Eslam Mohamed Bakr
Pengzhan Sun
Erran L. Li
Mohamed Elhoseiny
200
10
0
10 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
301
103
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
339
23
0
07 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
Marcel Worring
OOD
AAML
253
6
0
06 Apr 2023
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Computer Vision and Pattern Recognition (CVPR), 2023
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
195
142
0
05 Apr 2023
Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models
Osman Tursun
Akila Pemasiri
Sridha Sridharan
Clinton Fookes
ViT
VLM
165
7
0
05 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Computer Vision and Pattern Recognition (CVPR), 2023
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
248
25
0
04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
IEEE Transactions on Image Processing (IEEE TIP), 2023
Shizhen Chang
Pedram Ghamisi
183
81
0
03 Apr 2023
SARGAN: Spatial Attention-based Residuals for Facial Expression Manipulation
Arbish Akram
Nazar Khan
GAN
CVBM
209
12
0
30 Mar 2023
LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability
Zhengqing Miao
Xin Zhang
Mei-rong Zhao
Dong Ming
126
9
0
29 Mar 2023
SnakeVoxFormer: Transformer-based Single Image Voxel Reconstruction with Run Length Encoding
Jae Joong Lee
Bedrich Benes
ViT
177
0
0
28 Mar 2023
Medical Image Analysis using Deep Relational Learning
Zhi-Hu Liu
MedIm
158
0
0
28 Mar 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
511
43
0
28 Mar 2023