Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1502.03044
Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,507 papers shown
Title
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
42
0
0
09 Feb 2025
Using Large Language Models for education managements in Vietnamese with low resources
Duc Do Minh
Vinh Nguyen Van
Thang Dam Cong
38
0
0
28 Jan 2025
A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
Duc Hau Nguyen
Duc Hau Nguyen
Pascale Sébillot
42
5
0
23 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
36
0
0
13 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
S. Chen
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
42
1
0
08 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation
Iustin Sîrbu
Iulia-Renata Sîrbu
Jasmina Bogojeska
Traian Rebedea
MedIm
ViT
LM&MA
31
0
0
05 Jan 2025
Classifier-Guided Captioning Across Modalities
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
28
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
45
0
0
03 Jan 2025
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation
Gautier Evennou
Antoine Chaffin
Vivien Chappelier
Ewa Kijak
DiffM
68
0
0
20 Dec 2024
Automated Image Captioning with CNNs and Transformers
Joshua Adrian Cahyono
Jeremy Nathan Jusuf
VLM
ViT
75
0
0
13 Dec 2024
Advancing Attribution-Based Neural Network Explainability through Relative Absolute Magnitude Layer-Wise Relevance Propagation and Multi-Component Evaluation
Davor Vukadin
Petar Afrić
Marin Šilić
Goran Delač
FAtt
93
2
0
12 Dec 2024
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning
Amnon Bleich
A. Linnemann
B. Diem
Tim Conrad
MedIm
65
2
0
05 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
Po-Hsuan Huang
Jeng-Lin Li
Chin-Po Chen
Ming-Ching Chang
Wei-Chao Chen
LRM
74
1
0
04 Dec 2024
Was that Sarcasm?: A Literature Survey on Sarcasm Detection
Harleen Kaur Bagga
Jasmine Bernard
Sahil Shaheen
Sarthak Arora
64
0
0
30 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM
VLM
69
0
0
27 Nov 2024
GeoFormer: A Multi-Polygon Segmentation Transformer
Maxim Khomiakov
Michael Riis Andersen
J. Frellsen
68
0
0
25 Nov 2024
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach
Vaishnavi Khindkar
V. Balasubramanian
Chetan Arora
A. Subramanian
C. V. Jawahar
69
0
0
20 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
Young-Bum Kim
LRM
148
1
0
19 Nov 2024
Anatomy-Guided Radiology Report Generation with Pathology-Aware Regional Prompts
Yijian Gao
D. C. Marshall
Xiaodan Xing
Junzhi Ning
G. Papanastasiou
G. Yang
M. Komorowski
MedIm
24
0
0
16 Nov 2024
SASE: A Searching Architecture for Squeeze and Excitation Operations
Hanming Wang
Yunlong Li
Zijun Wu
Huifen Wang
Yuan Zhang
3DPC
21
0
0
13 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
31
0
0
11 Nov 2024
Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)
Faisal Mehmood
Xin Guo
Enqing Chen
Muhammad Azeem Akbar
A. Khan
Sami Ullah
23
4
0
10 Nov 2024
Generalization and Risk Bounds for Recurrent Neural Networks
Xuewei Cheng
Ke Huang
Shujie Ma
24
1
0
05 Nov 2024
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
Jitesh Joshi
Sos S. Agaian
Youngjun Cho
AI4TS
39
1
0
03 Nov 2024
Semi-supervised Chinese Poem-to-Painting Generation via Cycle-consistent Adversarial Networks
Zhengyang Lu
Tianhao Guo
Feng Wang
GAN
24
1
0
25 Oct 2024
Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network
Suraj Kumar
S. Chattopadhyay
Chandranath Adak
26
0
0
23 Oct 2024
PromptExp: Multi-granularity Prompt Explanation of Large Language Models
Ximing Dong
Shaowei Wang
Dayi Lin
Gopi Krishnan Rajbahadur
Boquan Zhou
Shichao Liu
Ahmed E. Hassan
AAML
LRM
23
1
0
16 Oct 2024
HASN: Hybrid Attention Separable Network for Efficient Image Super-resolution
Weifeng Cao
Xiaoyan Lei
Jun Shi
Wanyong Liang
Jie Liu
Zongfei Bai
SupR
24
0
0
13 Oct 2024
Continuous Risk Prediction
Yi Dai
15
1
0
12 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
30
0
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
28
3
0
09 Oct 2024
Demonstration Based Explainable AI for Learning from Demonstration Methods
Morris Gu
Elizabeth Croft
Dana Kulic
18
0
0
08 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Devank
Jayateja Kalla
Soma Biswas
34
1
0
06 Oct 2024
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning
Zheng Zhang
Xu Yuan
Lei Zhu
Jingkuan Song
Liqiang Nie
AAML
34
11
0
03 Oct 2024
Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample
Zhiwen Shao
Hancheng Zhu
Yong Zhou
Xiang Xiang
Bing-Quan Liu
Rui Yao
Lizhuang Ma
CML
19
2
0
02 Oct 2024
softmax is not enough (for sharp out-of-distribution)
Petar Veličković
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
37
17
0
01 Oct 2024
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Yi-Hao Peng
Faria Huq
Yue Jiang
Jason Wu
Amanda Li
Jeffrey P. Bigham
Amy Pavel
DiffM
25
4
0
30 Sep 2024
See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning
Chengxin Zheng
Junzhong Ji
Yanzhao Shi
Xiaodan Zhang
Liangqiong Qu
3DV
MedIm
24
3
0
29 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
26
1
0
28 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
26
0
0
26 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
28
0
0
17 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
35
5
0
15 Sep 2024
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
Yingshu Li
Zhanyu Wang
Yunyi Liu
Lei Wang
Lingqiao Liu
Luping Zhou
33
3
0
09 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
24
0
0
06 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
109
7
0
02 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
29
1
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
25
0
0
28 Aug 2024
Graph Attention Inference of Network Topology in Multi-Agent Systems
Akshay Kolli
Reza Azadeh
Kshitj Jerath
GNN
14
1
0
27 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
29
3
0
26 Aug 2024
Previous
1
2
3
4
5
...
69
70
71
Next