1502.03044
v3 (latest)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,580 papers shown
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging
Xuelong Li
Hongjun An
Haofei Zhao
Guangying Li
Bo Liu
Xing Wang
Guanghua Cheng
Guojun Wu
Zhe Sun
393
3
0
14 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
307
20
0
12 Apr 2024
A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images
Yizhi Pan
Junyi Xin
Tianhua Yang
Teeradaj Racharak
Le-Minh Nguyen
Guanqun Sun
117
19
0
12 Apr 2024
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Duy Phuong Nguyen
J. P. Muñoz
Ali Jannesari
VLM
168
18
0
12 Apr 2024
Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long
Zhenhao Tang
Xianghua Fu
Jian Chen
Shilong Hou
Jinze Lyu
134
6
0
09 Apr 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
348
15
0
06 Apr 2024
A Bi-consolidating Model for Joint Relational Triple Extraction
Xiaocheng Luo
Yanping Chen
Ruixue Tang
Caiwei Yang
Ruizhang Huang
Yongbin Qin
281
4
0
05 Apr 2024
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
European Conference on Computer Vision (ECCV), 2024
Adam Pardyl
Michal Wronka
Maciej Wolczyk
Kamil Adamczewski
Tomasz Trzciński
Bartosz Zieliński
407
3
0
04 Apr 2024
Memory-based Cross-modal Semantic Alignment Network for Radiology Report Generation
Yitian Tao
Liyan Ma
Jing Yu
Han Zhang
MedIm
227
19
0
31 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
302
14
0
28 Mar 2024
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang
Dingkang Yang
Zhaoyu Chen
Yang Liu
Siao Liu
Wenqiang Zhang
Lihua Zhang
Lizhe Qi
202
17
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
198
3
0
28 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
288
0
0
26 Mar 2024
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Computer Vision and Pattern Recognition (CVPR), 2024
Jimyeong Kim
Jungwon Park
Wonjong Rhee
DiffM
209
8
0
22 Mar 2024
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation
Santosh Sanjeev
F. Maani
Arsen Abzhanov
Vijay Ram Papineni
Ibrahim Almakky
Bartłomiej W. Papież
Mohammad Yaqub
MedIm
195
1
0
20 Mar 2024
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
Daniel Duenias
Brennan Nichyporuk
Tal Arbel
Tammy Riklin-Raviv
317
15
0
20 Mar 2024
Training A Small Emotional Vision Language Model for Visual Art Comprehension
Jing Zhang
Liang Zheng
Meng Wang
Dan Guo
VLM
188
9
0
17 Mar 2024
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival
Yuanxin Zhao
Mi Zhang
Bingnan Yang
Zhan Zhang
Jiaju Kang
Jianya Gong
185
5
0
16 Mar 2024
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Yu-Chu Yu
Chi-Pin Huang
Jr-Jen Chen
Kai-Po Chang
Yung-Hsuan Lai
Fu-En Yang
Yu-Chiang Frank Wang
CLL
VLM
270
16
0
14 Mar 2024
Rethinking Referring Object Removal
Xiangtian Xue
Jiasong Wu
Youyong Kong
L. Senhadji
Huazhong Shu
DiffM
203
0
0
14 Mar 2024
TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation
IEEE International Conference on Multimedia and Expo (ICME), 2024
Dingbang Li
Wenzhou Chen
Xin Lin
LLMAG
LM&Ro
185
11
0
13 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
222
17
0
12 Mar 2024
A Survey of Explainable Knowledge Tracing
Yanhong Bai
Jiabao Zhao
Tingjiang Wei
Qing Cai
Xiaoling Wang
265
24
0
12 Mar 2024
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback
Adarsh N L
Arun P V
Aravindh N L
125
6
0
11 Mar 2024
How to Understand Named Entities: Using Common Sense for News Captioning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2024
Ning Xu
Yanhui Wang
Tingting Zhang
Hongshuo Tian
Mohan Kankanhalli
An-An Liu
202
0
0
11 Mar 2024
Transformer based Multitask Learning for Image Captioning and Object Detection
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2024
Debolena Basak
P. K. Srijith
M. Desarkar
187
3
0
10 Mar 2024
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho
Fachrina Dewi Puspitasari
Sheng Zheng
Jingyao Zheng
Lik-Hang Lee
Tae-Ho Kim
Choong Seon Hong
Chaoning Zhang
EGVM
VGen
274
66
0
08 Mar 2024
Rule-driven News Captioning
Ning Xu
Tingting Zhang
Hongshuo Tian
An-An Liu
238
1
0
08 Mar 2024
Towards Multimodal Human Intention Understanding Debiasing via Subject-Deconfounding
AAAI Conference on Artificial Intelligence (AAAI), 2024
Dingkang Yang
Dongling Xiao
Ke Li
Yuzheng Wang
Zhaoyu Chen
Jinjie Wei
Lihua Zhang
222
9
0
08 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
302
46
0
06 Mar 2024
Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification
Robert Vacareanu
F. Alam
M. Islam
Haris Riaz
Mihai Surdeanu
NAI
203
6
0
05 Mar 2024
Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment
Congzhi Zhang
Linhai Zhang
Jialong Wu
Deyu Zhou
Guoqiang Xu
CML
AI4CE
LRM
302
34
0
05 Mar 2024
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition
Yutian Liu
Wenjun Ke
Jianguo Wei
295
1
0
04 Mar 2024
DINER: Debiasing Aspect-based Sentiment Analysis with Multi-variable Causal Inference
Jialong Wu
Linhai Zhang
Deyu Zhou
Guoqiang Xu
CML
272
8
0
02 Mar 2024
ELA: Efficient Local Attention for Deep Convolutional Neural Networks
Wei Xu
Yi Wan
170
90
0
02 Mar 2024
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding
Jiamin Luo
Jianing Zhao
Jingjing Wang
Guodong Zhou
234
0
0
29 Feb 2024
SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection
Yi Feng
Yu Ma
Qijun Chen
Ioannis Pitas
Rui Fan
295
15
0
29 Feb 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
212
47
0
28 Feb 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Koki Maeda
Shuhei Kurita
Taiki Miyanishi
Naoaki Okazaki
219
6
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
759
40
0
28 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
331
8
0
25 Feb 2024
ConVQG: Contrastive Visual Question Generation with Multimodal Guidance
Li Mi
Syrielle Montariol
J. Castillo-Navarro
Xianjie Dai
Antoine Bosselut
D. Tuia
174
7
0
20 Feb 2024
Heterogeneity-aware Cross-school Electives Recommendation: a Hybrid Federated Approach
Chengyi Ju
Jiannong Cao
Yu Yang
Zhen-Qun Yang
Ho Man Lee
142
1
0
19 Feb 2024
AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization
Jiyao Li
Mingze Ni
Yifei Dong
Tianqing Zhu
Wei Liu
AAML
202
4
0
19 Feb 2024
Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection
E. Hossain
Omar Sharif
M. M. Hoque
S. Preum
226
7
0
15 Feb 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi
Michele Casoni
Alessandro Betti
Tommaso Guidi
Marco Gori
S. Melacci
299
18
0
12 Feb 2024
Savvy: Trustworthy Autonomous Vehicles Architecture
Ali Shoker
Rehana Yasmin
Paulo Esteves-Verissimo
216
0
0
08 Feb 2024
Intensive Vision-guided Network for Radiology Report Generation
Physics in Medicine and Biology (PMB), 2023
Fudan Zheng
Mengfei Li
Ying Wang
Weijiang Yu
Ruixuan Wang
Zhiguang Chen
Nong Xiao
Yutong Lu
259
1
0
06 Feb 2024
Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets
Lei Xu
Moncef Gabbouj
GAN
197
2
0
03 Feb 2024
Image Fusion via Vision-Language Model
Zixiang Zhao
Lilun Deng
Haowen Bai
Yukun Cui
Zhipeng Zhang
...
Haotong Qin
Dongdong Chen
Jiangshe Zhang
Peng Wang
Luc Van Gool
VLM
286
56
0
03 Feb 2024