Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,580 papers shown
Demonstration Based Explainable AI for Learning from Demonstration Methods
IEEE Robotics and Automation Letters (RA-L), 2024
Morris Gu
Elizabeth Croft
Dana Kulic
175
2
0
08 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
178
5
0
06 Oct 2024
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning
IEEE Transactions on Image Processing (TIP), 2024
Zheng Zhang
Xu Yuan
Lei Zhu
Jingkuan Song
Liqiang Nie
AAML
231
20
0
03 Oct 2024
Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample
International Journal of Computer Vision (IJCV), 2024
Zhiwen Shao
Hancheng Zhu
Yong Zhou
Xiang Xiang
Bing-Quan Liu
Rui Yao
Lizhuang Ma
CML
150
11
0
02 Oct 2024
Softmax is not Enough (for Sharp Size Generalisation)
Petar Velickovic
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
405
19
0
01 Oct 2024
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
European Conference on Computer Vision (ECCV), 2024
Yi-Hao Peng
Faria Huq
Yue Jiang
Jason Wu
Amanda Li
Jeffrey P. Bigham
Amy Pavel
DiffM
246
9
0
30 Sep 2024
See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Chengxin Zheng
Junzhong Ji
Yanzhao Shi
Xiaodan Zhang
Liangqiong Qu
3DV
MedIm
265
4
0
29 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Asian Conference on Computer Vision (ACCV), 2024
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
274
7
0
28 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
217
6
0
26 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
418
5
0
19 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Hanane Azzag
M. Lebbah
ObjD
349
2
0
17 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
243
19
0
15 Sep 2024
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Yingshu Li
Zhanyu Wang
Yunyi Liu
Lei Wang
Lingqiao Liu
Luping Zhou
172
7
0
09 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
336
0
0
06 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
International Conference on Learning Representations (ICLR), 2024
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
445
32
0
02 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning
ACM Multimedia (MM), 2024
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
218
4
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
222
2
0
28 Aug 2024
Graph Attention Inference of Network Topology in Multi-Agent Systems
IFAC-PapersOnLine, 2024
Akshay Kolli
Reza Azadeh
Kshitij Jerath
GNN
146
1
0
27 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
British Machine Vision Conference (BMVC), 2024
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
291
7
0
26 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
426
11
0
23 Aug 2024
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models
Purushothaman Natarajan
Athira Nambiar
AAML
133
4
0
23 Aug 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
Hao Fei
Xunliang Cai
LRM
249
11
0
21 Aug 2024
TraDiffusion: Trajectory-Based Training-Free Image Generation
Mingrui Wu
Oucheng Huang
Jiayi Ji
Jiale Li
Xinyue Cai
Huafeng Kuang
Jianzhuang Liu
Xiaoshuai Sun
Rongrong Ji
207
4
0
19 Aug 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
Neural Information Processing Systems (NeurIPS), 2024
Qingyuan Zeng
Zhenzhong Wang
Yiu-ming Cheung
Min Jiang
AAML
199
5
0
16 Aug 2024
The Dawn of KAN in Image-to-Image (I2I) Translation: Integrating Kolmogorov-Arnold Networks with GANs for Unpaired I2I Translation
Conference on Algebraic Informatics (CAI), 2024
Arpan Mahara
N. Rishe
Liangdong Deng
VLM
GAN
192
8
0
15 Aug 2024
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
Fan Yang
Sicheng Zhao
Yanhao Zhang
Haoxiang Chen
Hui Chen
Wenbo Tang
Guiguang Ding
245
3
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
European Conference on Computer Vision (ECCV), 2024
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
209
5
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
432
0
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
European Conference on Computer Vision (ECCV), 2024
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
341
2
0
07 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
European Conference on Computer Vision (ECCV), 2024
Xianyu Chen
Ming Jiang
Qi Zhao
211
8
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
318
2
0
04 Aug 2024
ST-SACLF: Style Transfer Informed Self-Attention Classifier for Bias-Aware Painting Classification
Mridula Vijendran
Frederick W. B. Li
Jingjing Deng
Hubert P. H. Shum
266
0
0
03 Aug 2024
Review of Cloud Service Composition for Intelligent Manufacturing
Cuixia Li
Liqiang Liu
Li Shi
128
1
0
03 Aug 2024
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
ACM Multimedia (MM), 2024
Yaming Yang
Zhe Wang
Fuhai Chen
Ziyu Guan
Weigang Lu
Joemon M. Jose
CVBM
269
10
0
01 Aug 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
200
0
0
01 Aug 2024
GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction
Yanxu Mao
Peipei Liu
Tiehan Cui
207
2
0
31 Jul 2024
Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
Adam Wojciechowski
Mateusz Lango
Ondrej Dusek
FAtt
236
3
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
European Conference on Computer Vision (ECCV), 2024
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
183
12
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
213
9
0
26 Jul 2024
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang
Ke Liu
Jingjun Gu
Xiaoxu Cai
Zhihua Wang
Jiajun Bu
Haishuai Wang
282
3
0
22 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
258
19
0
21 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
155
2
0
19 Jul 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
228
18
0
19 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
231
2
0
18 Jul 2024
XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach
Truong Thanh Hung Nguyen
Phuc Truong Loc Nguyen
Hung Cao
279
17
0
16 Jul 2024
Backdoor Attacks against Image-to-Image Networks
Wenbo Jiang
Hongwei Li
Jiaming He
Rui Zhang
Guowen Xu
Tianwei Zhang
Rongxing Lu
AAML
199
8
0
15 Jul 2024
Predicting Winning Captions for Weekly New Yorker Comics
Stanley Cao
Sonny Young
ViT
VLM
139
1
0
12 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Elisa Kreiss
403
2
0
10 Jul 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang
Ruohan Dong
Jinfa Huang
Yiwei Ma
Haowei Wang
Xiaoshuai Sun
Rongrong Ji
247
9
0
07 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
213
1
0
06 Jul 2024