arXiv:1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,578 papers shown
Title
Learning Deep Structure-Preserving Image-Text Embeddings
Liwei Wang
Yin Li
Svetlana Lazebnik
360
814
0
19 Nov 2015
Active Object Localization with Deep Reinforcement Learning
Juan C. Caicedo
Svetlana Lazebnik
ObjD
192
454
0
18 Nov 2015
ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
Kan Chen
Jiang Wang
Liang-Chieh Chen
Haoyuan Gao
Wei Xu
Ram Nevatia
252
298
0
18 Nov 2015
ACDC: A Structured Efficient Linear Layer
Marcin Moczulski
Misha Denil
J. Appleyard
Nando de Freitas
498
103
0
18 Nov 2015
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Hyeonwoo Noh
Paul Hongsuck Seo
Bohyung Han
OOD
166
334
0
18 Nov 2015
Compositional Memory for Visual Question Answering
Aiwen Jiang
Fang Wang
Fatih Porikli
Yi Li
CoGe
101
43
0
18 Nov 2015
Learning Articulated Motion Models from Visual and Lingual Signals
Zhengyang Wu
Mohit Bansal
Matthew R. Walter
95
0
0
17 Nov 2015
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
Huijuan Xu
Kate Saenko
378
780
0
17 Nov 2015
Yin and Yang: Balancing and Answering Binary Visual Questions
Peng Zhang
Yash Goyal
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
421
364
0
16 Nov 2015
Sherlock: Scalable Fact Learning in Images
Mohamed Elhoseiny
Scott D. Cohen
W. Chang
Brian L. Price
Ahmed Elgammal
202
26
0
16 Nov 2015
Neural Programmer: Inducing Latent Programs with Gradient Descent
Arvind Neelakantan
Quoc V. Le
Ilya Sutskever
ODL
265
267
0
16 Nov 2015
Uncovering Temporal Context for Video Question and Answering
Linchao Zhu
Zhongwen Xu
Yi Yang
Alexander G. Hauptmann
BDL
149
45
0
15 Nov 2015
Oracle performance for visual captioning
Li Yao
Nicolas Ballas
Kyunghyun Cho
John R. Smith
Yoshua Bengio
VLM
399
9
0
14 Nov 2015
Reversible Recursive Instance-level Object Segmentation
Xiaodan Liang
Yunchao Wei
Xiaohui Shen
Zequn Jie
Jiashi Feng
Liang Lin
Shuicheng Yan
SSeg
ISeg
108
60
0
14 Nov 2015
Semantic Object Parsing with Local-Global Long Short-Term Memory
Xiaodan Liang
Xiaohui Shen
Donglai Xiang
Jiashi Feng
Liang Lin
Shuicheng Yan
199
187
0
14 Nov 2015
Natural Language Object Retrieval
Ronghang Hu
Huazhe Xu
Marcus Rohrbach
Jiashi Feng
Kate Saenko
Trevor Darrell
ObjD
297
569
0
13 Nov 2015
Action Recognition using Visual Attention
Shikhar Sharma
Ryan Kiros
Ruslan Salakhutdinov
293
677
0
12 Nov 2015
Deep Gaussian Conditional Random Field Network: A Model-based Deep Network for Discriminative Denoising
Raviteja Vemulapalli
Oncel Tuzel
Ming-Yu Liu
131
71
0
12 Nov 2015
Hand-Object Interaction and Precise Localization in Transitive Action Recognition
Amir Rosenfeld
S. Ullman
199
8
0
12 Nov 2015
Grounding of Textual Phrases in Images by Reconstruction
Anna Rohrbach
Marcus Rohrbach
Ronghang Hu
Trevor Darrell
Bernt Schiele
343
510
0
12 Nov 2015
Generative Concatenative Nets Jointly Learn to Write and Classify Reviews
Zachary Chase Lipton
Sharad Vikram
Julian McAuley
BDL
270
33
0
11 Nov 2015
Visual7W: Grounded Question Answering in Images
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
476
957
0
11 Nov 2015
Attention to Scale: Scale-aware Semantic Image Segmentation
Liang-Chieh Chen
Yi Yang
Jiang Wang
Wei Xu
Alan Yuille
SSeg
261
1,368
0
10 Nov 2015
Detecting events and key actors in multi-person videos
Vignesh Ramanathan
Jonathan Huang
Sami Abu-El-Haija
Alexander N. Gorban
Kevin Patrick Murphy
Li Fei-Fei
202
215
0
09 Nov 2015
Neural Module Networks
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Dan Klein
CoGe
537
1,126
0
09 Nov 2015
Generating Images from Captions with Attention
Elman Mansimov
Emilio Parisotto
Jimmy Lei Ba
Ruslan Salakhutdinov
VLM
240
480
0
09 Nov 2015
Explicit Knowledge-based Reasoning for Visual Question Answering
Peng Wang
Qi Wu
Chunhua Shen
Anton Van Den Hengel
A. Dick
201
281
0
09 Nov 2015
The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations
Felix Hill
Antoine Bordes
S. Chopra
Jason Weston
RALM
593
645
0
07 Nov 2015
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
636
1,547
0
07 Nov 2015
Stacked Attention Networks for Image Question Answering
Zichao Yang
Xiaodong He
Jianfeng Gao
Li Deng
Alex Smola
BDL
398
1,971
0
07 Nov 2015
Deep Kernel Learning
A. Wilson
Zhiting Hu
Ruslan Salakhutdinov
Eric Xing
BDL
514
981
0
06 Nov 2015
RATM: Recurrent Attentive Tracking Model
Samira Ebrahimi Kahou
Vincent Michalski
Roland Memisevic
268
85
0
29 Oct 2015
On End-to-End Program Generation from User Intention by Deep Neural Networks
Lili Mou
Rui Men
Ge Li
Jun Liu
Zhi Jin
120
47
0
25 Oct 2015
Generic decoding of seen and imagined objects using hierarchical visual features
T. Horikawa
Y. Kamitani
163
509
0
22 Oct 2015
Multilingual Image Description with Neural Sequence Models
Desmond Elliott
Stella Frank
Eva Hasler
VLM
250
77
0
15 Oct 2015
A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li
Michel Galley
Chris Brockett
Jianfeng Gao
W. Dolan
409
2,556
0
11 Oct 2015
SentiCap: Generating Image Descriptions with Sentiments
A. Mathews
Lexing Xie
Xuming He
207
233
0
06 Oct 2015
Learning Wake-Sleep Recurrent Attention Models
Jimmy Ba
Roger C. Grosse
Ruslan Salakhutdinov
B. Frey
BDL
162
65
0
22 Sep 2015
Reasoning about Entailment with Neural Attention
Tim Rocktäschel
Edward Grefenstette
Karl Moritz Hermann
Tomáš Kočiský
Phil Blunsom
NAI
212
772
0
22 Sep 2015
Recurrent Spatial Transformer Networks
Søren Kaae Sønderby
C. Sønderby
Lars Maaløe
Ole Winther
ViT
150
51
0
17 Sep 2015
Guiding Long-Short Term Memory for Image Caption Generation
Xu Jia
E. Gavves
Basura Fernando
Tinne Tuytelaars
VLM
132
102
0
16 Sep 2015
What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment
North American Chapter of the Association for Computational Linguistics (NAACL), 2015
Hongyuan Mei
Mohit Bansal
Matthew R. Walter
190
292
0
02 Sep 2015
End-to-End Attention-based Large Vocabulary Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015
Dzmitry Bahdanau
J. Chorowski
Dmitriy Serdyuk
Philemon Brakel
Yoshua Bengio
311
1,186
0
18 Aug 2015
Effective Approaches to Attention-based Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015
Thang Luong
Hieu H. Pham
Christopher D. Manning
1.3K
8,242
0
17 Aug 2015
Listen, Attend and Spell
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
RALM
399
2,368
0
05 Aug 2015
Artificial Neural Networks Applied to Taxi Destination Prediction
A. D. Brébisson
Étienne Simon
Alex Auvolat
Pascal Vincent
Yoshua Bengio
197
194
0
31 Jul 2015
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
International Journal of Computer Vision (IJCV), 2015
Serena Yeung
Olga Russakovsky
Ning Jin
Mykhaylo Andriluka
Greg Mori
Li Fei-Fei
VLM
423
452
0
21 Jul 2015
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
Kyunghyun Cho
Aaron Courville
Yoshua Bengio
184
432
0
04 Jul 2015
Attention-Based Models for Speech Recognition
J. Chorowski
Dzmitry Bahdanau
Dmitriy Serdyuk
Kyunghyun Cho
Yoshua Bengio
346
2,702
0
24 Jun 2015
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
A. Kumar
Ozan Irsoy
Peter Ondruska
Mohit Iyyer
James Bradbury
Ishaan Gulrajani
Victor Zhong
Romain Paulus
R. Socher
442
1,209
0
24 Jun 2015