1502.03044
Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Dong Wang
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,580 papers shown
Demonstration Based Explainable AI for Learning from Demonstration Methods
IEEE Robotics and Automation Letters (RA-L), 2024
Morris Gu
Elizabeth Croft
Dana Kulic
175
2
0
08 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
178
5
0
06 Oct 2024
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning
IEEE Transactions on Image Processing (TIP), 2024
Zheng Zhang
Xu Yuan
Lei Zhu
Jingkuan Song
Liqiang Nie
AAML
231
20
0
03 Oct 2024
Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample
International Journal of Computer Vision (IJCV), 2024
Zhiwen Shao
Hancheng Zhu
Yong Zhou
Xiang Xiang
Bing-Quan Liu
Rui Yao
Lizhuang Ma
CML
150
11
0
02 Oct 2024
Softmax is not Enough (for Sharp Size Generalisation)
Petar Velickovic
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
405
19
0
01 Oct 2024
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
European Conference on Computer Vision (ECCV), 2024
Yi-Hao Peng
Faria Huq
Yue Jiang
Jason Wu
Amanda Li
Jeffrey P. Bigham
Amy Pavel
DiffM
246
9
0
30 Sep 2024
See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Chengxin Zheng
Junzhong Ji
Yanzhao Shi
Xiaodan Zhang
Liangqiong Qu
3DV
MedIm
265
4
0
29 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Asian Conference on Computer Vision (ACCV), 2024
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
274
7
0
28 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
217
6
0
26 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
418
5
0
19 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Hanane Azzag
M. Lebbah
ObjD
349
2
0
17 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
243
19
0
15 Sep 2024
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Yingshu Li
Zhanyu Wang
Yunyi Liu
Lei Wang
Lingqiao Liu
Luping Zhou
172
7
0
09 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
336
0
0
06 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
International Conference on Learning Representations (ICLR), 2024
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
445
32
0
02 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning
ACM Multimedia (MM), 2024
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
218
4
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
222
2
0
28 Aug 2024
Graph Attention Inference of Network Topology in Multi-Agent Systems
IFAC-PapersOnLine (IFAC-PapersOnLine), 2024
Akshay Kolli
Reza Azadeh
Kshitij Jerath
GNN
146
1
0
27 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
British Machine Vision Conference (BMVC), 2024
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
291
7
0
26 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
426
11
0
23 Aug 2024
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models
Purushothaman Natarajan
Athira Nambiar
AAML
133
4
0
23 Aug 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
Hao Fei
Xunliang Cai
LRM
249
11
0
21 Aug 2024
TraDiffusion: Trajectory-Based Training-Free Image Generation
Mingrui Wu
Oucheng Huang
Jiayi Ji
Jiale Li
Xinyue Cai
Huafeng Kuang
Jianzhuang Liu
Xiaoshuai Sun
Rongrong Ji
207
4
0
19 Aug 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
Neural Information Processing Systems (NeurIPS), 2024
Qingyuan Zeng
Zhenzhong Wang
Yiu-ming Cheung
Min Jiang
AAML
199
5
0
16 Aug 2024
The Dawn of KAN in Image-to-Image (I2I) Translation: Integrating Kolmogorov-Arnold Networks with GANs for Unpaired I2I Translation
Conference on Algebraic Informatics (CAI), 2024
Arpan Mahara
N. Rishe
Liangdong Deng
VLM
GAN
192
8
0
15 Aug 2024
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
Fan Yang
Sicheng Zhao
Yanhao Zhang
Haoxiang Chen
Hui Chen
Wenbo Tang
Guiguang Ding
245
3
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
European Conference on Computer Vision (ECCV), 2024
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
209
5
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
432
0
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
European Conference on Computer Vision (ECCV), 2024
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
341
2
0
07 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
European Conference on Computer Vision (ECCV), 2024
Xianyu Chen
Ming Jiang
Qi Zhao
211
8
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
318
2
0
04 Aug 2024
ST-SACLF: Style Transfer Informed Self-Attention Classifier for Bias-Aware Painting Classification
Mridula Vijendran
Frederick W. B. Li
Jingjing Deng
Hubert P. H. Shum
266
0
0
03 Aug 2024
Review of Cloud Service Composition for Intelligent Manufacturing
Cuixia Li
Liqiang Liu
Li Shi
128
1
0
03 Aug 2024
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
ACM Multimedia (MM), 2024
Yaming Yang
Zhe Wang
Fuhai Chen
Ziyu Guan
Weigang Lu
Joemon M. Jose
CVBM
269
10
0
01 Aug 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
200
0
0
01 Aug 2024
GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction
Yanxu Mao
Peipei Liu
Tiehan Cui
207
2
0
31 Jul 2024
Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
Adam Wojciechowski
Mateusz Lango
Ondrej Dusek
FAtt
236
3
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
European Conference on Computer Vision (ECCV), 2024
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
183
12
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
213
9
0
26 Jul 2024
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang
Ke Liu
Jingjun Gu
Xiaoxu Cai
Zhihua Wang
Jiajun Bu
Haishuai Wang
282
3
0
22 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
258
19
0
21 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
155
2
0
19 Jul 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
228
18
0
19 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
231
2
0
18 Jul 2024
XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach
Truong Thanh Hung Nguyen
Phuc Truong Loc Nguyen
Hung Cao
279
17
0
16 Jul 2024
Backdoor Attacks against Image-to-Image Networks
Wenbo Jiang
Hongwei Li
Jiaming He
Rui Zhang
Guowen Xu
Tianwei Zhang
Rongxing Lu
AAML
199
8
0
15 Jul 2024
Predicting Winning Captions for Weekly New Yorker Comics
Stanley Cao
Sonny Young
ViT
VLM
139
1
0
12 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Elisa Kreiss
403
2
0
10 Jul 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang
Ruohan Dong
Jinfa Huang
Yiwei Ma
Haowei Wang
Xiaoshuai Sun
Rongrong Ji
247
9
0
07 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
213
1
0
06 Jul 2024