Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1502.03044
Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,507 papers shown
Title
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
34
7
0
23 Aug 2024
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models
Purushothaman Natarajan
Athira Nambiar
AAML
17
3
0
23 Aug 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
M. Zhang
Xunliang Cai
LRM
29
6
0
21 Aug 2024
TraDiffusion: Trajectory-Based Training-Free Image Generation
Mingrui Wu
Oucheng Huang
Jiayi Ji
Jiale Li
Xinyue Cai
Huafeng Kuang
Jianzhuang Liu
Xiaoshuai Sun
Rongrong Ji
40
3
0
19 Aug 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
Qingyuan Zeng
Zhenzhong Wang
Yiu-ming Cheung
Min Jiang
AAML
40
1
0
16 Aug 2024
The Dawn of KAN in Image-to-Image (I2I) Translation: Integrating Kolmogorov-Arnold Networks with GANs for Unpaired I2I Translation
Arpan Mahara
N. Rishe
Liangdong Deng
VLM
GAN
32
2
0
15 Aug 2024
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
Fan Yang
Sicheng Zhao
Yanhao Zhang
Haoxiang Chen
Hui Chen
Wenbo Tang
Guiguang Ding
33
4
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
35
3
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
27
0
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
43
2
0
07 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
19
2
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
45
1
0
04 Aug 2024
ST-SACLF: Style Transfer Informed Self-Attention Classifier for Bias-Aware Painting Classification
Mridula Vijendran
Frederick W. B. Li
Jingjing Deng
Hubert P. H. Shum
48
0
0
03 Aug 2024
Review of Cloud Service Composition for Intelligent Manufacturing
Cuixia Li
Liqiang Liu
Li Shi
19
0
0
03 Aug 2024
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
Yaming Yang
Zhe Wang
Fuhai Chen
Wei Zhao
Weigang Lu
Joemon M. Jose
CVBM
23
1
0
01 Aug 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
19
0
0
01 Aug 2024
GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction
Yanxu Mao
Peipei Liu
Tiehan Cui
24
0
0
31 Jul 2024
Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
Adam Wojciechowski
Mateusz Lango
Ondrej Dusek
FAtt
41
0
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
33
6
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
44
3
0
26 Jul 2024
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang
Ke Liu
Jingjun Gu
Xiaoxu Cai
Zhihua Wang
Jiajun Bu
Haishuai Wang
40
1
0
22 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
40
5
0
21 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
28
0
0
19 Jul 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
31
8
0
19 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
27
0
0
18 Jul 2024
XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach
Truong Thanh Hung Nguyen
Phuc Truong Loc Nguyen
Hung Cao
24
2
0
16 Jul 2024
Backdoor Attacks against Image-to-Image Networks
Wenbo Jiang
Hongwei Li
Jiaming He
Rui Zhang
Guowen Xu
Tianwei Zhang
Rongxing Lu
AAML
33
2
0
15 Jul 2024
Predicting Winning Captions for Weekly New Yorker Comics
Stanley Cao
Sonny Young
ViT
VLM
27
1
0
12 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Marzyeh Ghassemi
42
0
0
10 Jul 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang
Ruohan Dong
Jiayi Ji
Yiwei Ma
Haowei Wang
Xiaoshuai Sun
Rongrong Ji
44
3
0
07 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
22
0
0
06 Jul 2024
Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training
Dingkang Yang
Kun Yang
Haopeng Kuang
Zhaoyu Chen
Yuzheng Wang
Lihua Zhang
CML
36
4
0
06 Jul 2024
Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention
Rishi Mohan
Sanjay Sureshkumar
Vignesh Sivasubramaniam
25
1
0
28 Jun 2024
Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models
Nila Masrourisaadat
Nazanin Sedaghatkish
Fatemeh Sarshartehrani
Edward A. Fox
37
6
0
28 Jun 2024
Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration
M. Khaniki
Alireza Golkarieh
Mohammad Manthouri
MedIm
24
4
0
25 Jun 2024
Enhancing Scientific Figure Captioning Through Cross-modal Learning
Mateo Alejandro Rojas
Rafael Carranza
42
0
0
24 Jun 2024
Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification
Honori Udo
Takafumi Koshinaka
VLM
32
0
0
22 Jun 2024
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning
Panagiotis Kaliosis
John Pavlopoulos
Foivos Charalampakos
Georgios Moschovis
Ion Androutsopoulos
MedIm
19
1
0
20 Jun 2024
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
M. Tami
Huthaifa I. Ashqar
Mohammed Elhenawy
37
3
0
19 Jun 2024
DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning
Xiaowen Ma
Jiawei Yang
Rui Che
Huanting Zhang
Wei Zhang
16
4
0
19 Jun 2024
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation
Nagur Shareef Shaik
T. Cherukuri
Dong Hye Ye
MedIm
30
0
0
19 Jun 2024
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
37
4
0
15 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao-Yu Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
34
8
0
12 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
22
6
0
09 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
23
3
0
04 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li
Jiaang Li
R. Ramos
Raphael Tang
Desmond Elliott
VLM
36
3
0
04 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
39
10
0
04 Jun 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
18
3
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
27
0
0
01 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
35
28
0
31 May 2024
Previous
1
2
3
4
5
6
...
69
70
71
Next