1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Dong Wang
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,578 papers shown
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
212
2
0
10 Mar 2025
A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions
Rahul Nair
Bhanu Tokas
Neel Shah
335
0
0
10 Mar 2025
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo
D. Q. Nguyen
T. Nguyen
Tho Quan
234
5
0
09 Mar 2025
MSConv: Multiplicative and Subtractive Convolution for Face Recognition
Si Zhou
Yain-Whar Si
Xiaochen Yuan
Xiaofan Li
Xiaoxiang Liu
Xinyuan Zhang
Cong Lin
Xueyuan Gong
CVBM
272
0
0
08 Mar 2025
Extracting Symbolic Sequences from Visual Representations via Self-Supervised Learning
Victor Sebastian Martinez Pozos
Ivan Vladimir Meza Ruiz
156
0
0
06 Mar 2025
Cross-modal Causal Relation Alignment for Video Question Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Weixing Chen
Wenshu Fan
Binglin Chen
Jiandong Su
Yongsen Zheng
Guanbin Li
BDL
VGen
CML
262
7
0
05 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Medical Image Analysis (MedIA), 2025
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
205
0
0
03 Mar 2025
AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
335
0
0
03 Mar 2025
A Survey of Link Prediction in Temporal Networks
Jiafeng Xiong
Ahmad Zareie
Rizos Sakellariou
AI4TS
AI4CE
181
6
0
28 Feb 2025
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
Hemanth Teja Yanambakkam
Rahul Chinthala
91
0
0
26 Feb 2025
Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos
The Web Conference (WWW), 2025
Jiamin Luo
Jingjing Wang
Junxiao Ma
Yujie Jin
Shoushan Li
Guodong Zhou
244
1
0
26 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
247
6
0
26 Feb 2025
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning
Swadhin Das
Saarthak Gupta
Kamal Kumar
Raksha Sharma
135
2
0
22 Feb 2025
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Nianzu Yang
Yinglu Li
Zuan Gao
Yun Zheng
Hongtao Xie
VLM
CoGe
430
0
0
19 Feb 2025
A Comprehensive Survey on Composed Image Retrieval
Xuemeng Song
Haoqiang Lin
Haokun Wen
Bohan Hou
Mingzhu Xu
Liqiang Nie
421
7
0
19 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
250
1
0
09 Feb 2025
Using Large Language Models for education managements in Vietnamese with low resources
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2025
Duc Do Minh
Vinh Nguyen Van
Thang Dam Cong
244
2
0
28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
International Conference on Machine Learning and Applications (ICMLA), 2021
Duc Hau Nguyen
Pascale Sébillot
192
6
0
23 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
379
2
0
13 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
AAAI Conference on Artificial Intelligence (AAAI), 2025
Tian Jin
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
238
5
0
08 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation
Iustin Sîrbu
Iulia-Renata Sîrbu
Jasmina Bogojeska
Traian Rebedea
MedIm
ViT
LM&MA
168
3
0
05 Jan 2025
Classifier-Guided Captioning Across Modalities
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
197
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
241
1
0
03 Jan 2025
Real-time Bangla Sign Language Translator
Rotan Hawlader Pranto
Shahnewaz Siddique
SLR
140
2
0
21 Dec 2024
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Gautier Evennou
Antoine Chaffin
Vivien Chappelier
Ewa Kijak
DiffM
236
1
0
20 Dec 2024
Automated Image Captioning with CNNs and Transformers
Joshua Adrian Cahyono
Jeremy Nathan Jusuf
VLM
ViT
119
1
0
13 Dec 2024
Advancing Attribution-Based Neural Network Explainability through Relative Absolute Magnitude Layer-Wise Relevance Propagation and Multi-Component Evaluation
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2024
Davor Vukadin
Petar Afrić
Marin Šilić
Goran Delač
FAtt
231
2
0
12 Dec 2024
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Computer Vision and Pattern Recognition (CVPR), 2024
Bo Tong
Bokai Lai
Weihao Ye
Gen Luo
Chunjiang Ge
Ke Li
Xiaoshuai Sun
Rongrong Ji
VLM
MLLM
185
4
0
05 Dec 2024
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning
Amnon Bleich
A. Linnemann
B. Diem
Tim Conrad
MedIm
191
4
0
05 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Po-Hsuan Huang
Jeng-Lin Li
Chin-Po Chen
Ming-Ching Chang
Wei-Chao Chen
LRM
257
4
0
04 Dec 2024
Was that Sarcasm?: A Literature Survey on Sarcasm Detection
Harleen Kaur Bagga
Jasmine Bernard
Sahil Shaheen
Sarthak Arora
157
1
0
30 Nov 2024
Detailed Object Description with Controllable Dimensions
IEEE Transactions on Multimedia (IEEE TMM), 2024
Xinran Wang
Hao Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Tianhao Shen
Jun Guo
281
1
0
28 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM
VLM
308
1
0
27 Nov 2024
GeoFormer: A Multi-Polygon Segmentation Transformer
British Machine Vision Conference (BMVC), 2024
Maxim Khomiakov
Michael Riis Andersen
J. Frellsen
183
1
0
25 Nov 2024
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Vaishnavi Khindkar
V. Balasubramanian
Chetan Arora
A. Subramanian
C. V. Jawahar
251
0
0
20 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
1.0K
1
0
19 Nov 2024
Anatomy-Guided Radiology Report Generation with Pathology-Aware Regional Prompts
Yijian Gao
D. C. Marshall
Xiaodan Xing
Junzhi Ning
G. Papanastasiou
G. Yang
M. Komorowski
MedIm
156
0
0
16 Nov 2024
SASE: A Searching Architecture for Squeeze and Excitation Operations
Hanming Wang
Yunlong Li
Zijun Wu
Huifen Wang
Yuan Zhang
3DPC
117
1
0
13 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
223
1
0
11 Nov 2024
Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)
Computers in Human Behavior (CHB), 2024
Faisal Mehmood
Xin Guo
Enqing Chen
Muhammad Azeem Akbar
A. Khan
Sami Ullah
246
8
0
10 Nov 2024
Generalization and Risk Bounds for Recurrent Neural Networks
Xuewei Cheng
Ke Huang
Shujie Ma
295
1
0
05 Nov 2024
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
Neural Information Processing Systems (NeurIPS), 2024
Jitesh Joshi
Sos S. Agaian
Youngjun Cho
AI4TS
270
6
0
03 Nov 2024
Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks
Zhengyang Lu
Tianhao Guo
Feng Wang
GAN
145
7
0
25 Oct 2024
Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network
Suraj Kumar
S. Chattopadhyay
Chandranath Adak
139
0
0
23 Oct 2024
PromptExp: Multi-granularity Prompt Explanation of Large Language Models
Ximing Dong
Shaowei Wang
Dayi Lin
Gopi Krishnan Rajbahadur
Boquan Zhou
Shichao Liu
Ahmed E. Hassan
AAML
LRM
335
4
0
16 Oct 2024
HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution
The Visual Computer (VC), 2024
Weifeng Cao
Xiaoyan Lei
Jun Shi
Wanyong Liang
Jie Liu
Zongfei Bai
SupR
235
4
0
13 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
147
0
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
227
9
0
09 Oct 2024
Demonstration Based Explainable AI for Learning from Demonstration Methods
IEEE Robotics and Automation Letters (RA-L), 2024
Morris Gu
Elizabeth Croft
Dana Kulic
147
1
0
08 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
138
5
0
06 Oct 2024