Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,508 papers shown

Title | Authors | Tags | Counts | Date
Video Captioning with Multi-Faceted Attention | Xiang Long, Chuang Gan, Gerard de Melo | - | 22 88 0 | 01 Dec 2016
Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures | Gaurav Mittal, Tanya Marwah, V. Balasubramanian | VGen, DiffM | 38 67 0 | 30 Nov 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks | Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, Kate Saenko | - | 31 401 0 | 30 Nov 2016
Attend in groups: a weakly-supervised deep learning framework for learning from web data | Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, Ian Reid | NoLa | 16 89 0 | 30 Nov 2016
Context-aware Natural Language Generation with Recurrent Neural Networks | Jian Tang, Yifan Yang, Samuel Carton, Ming Zhang, Qiaozhu Mei | - | 19 67 0 | 29 Nov 2016
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model | Marcella Cornia, Lorenzo Baraldi, G. Serra, Rita Cucchiara | - | 23 548 0 | 29 Nov 2016
Deep Quantization: Encoding Convolutional Activations with Deep Generative Model | Zhaofan Qiu, Ting Yao, Tao Mei | DRL, MQ | 24 58 0 | 29 Nov 2016
Emergence of foveal image sampling from learning to attend in visual scenes | Brian Cheung, E. Weiss, Bruno A. Olshausen | - | 12 38 0 | 28 Nov 2016
Hierarchical Boundary-Aware Neural Encoder for Video Captioning | Lorenzo Baraldi, C. Grana, Rita Cucchiara | - | 26 191 0 | 28 Nov 2016
Attention-based Memory Selection Recurrent Network for Language Modeling | Da-Rong Liu, Shun-Po Chuang, Hung-yi Lee | RALM, KELM | 35 5 0 | 26 Nov 2016
Neural Machine Translation with Latent Semantic of Image and Text | Joji Toyama, Masanori Misono, Masahiro Suzuki, Kotaro Nakayama, Y. Matsuo | - | 7 14 0 | 25 Nov 2016
An Overview on Data Representation Learning: From Traditional Feature Learning to Recent Deep Learning | G. Zhong, Lina Wang, Junyu Dong | AI4TS | 26 180 0 | 25 Nov 2016
Semantic Compositional Networks for Visual Captioning | Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng | CoGe | 42 425 0 | 23 Nov 2016
GuessWhat?! Visual object discovery through multi-modal dialogue | H. D. Vries, Florian Strub, A. Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville | VLM | 30 425 0 | 23 Nov 2016
Adaptive Feature Abstraction for Translating Video to Text | Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin | - | 36 14 0 | 23 Nov 2016
Recurrent Attention Models for Depth-Based Person Identification | Albert Haque, Alexandre Alahi, Li Fei-Fei | 3DH | 28 142 0 | 22 Nov 2016
GRAM: Graph-based Attention Model for Healthcare Representation Learning | E. Choi, M. T. Bahadori, Le Song, Walter F. Stewart, Jimeng Sun | GNN | 16 662 0 | 21 Nov 2016
Coherent Dialogue with Attention-based Language Models | Hongyuan Mei, Mohit Bansal, Matthew R. Walter | AuLLM | 25 83 0 | 21 Nov 2016
Dense Captioning with Joint Inference and Visual Context | L. Yang, K. Tang, Jianchao Yang, Li-Jia Li | VLM | 19 169 0 | 21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues | Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, J. Hockenmaier, Svetlana Lazebnik | - | 25 189 0 | 21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs | J. Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei | VLM | 25 373 0 | 20 Nov 2016
Recurrent Memory Addressing for describing videos | A. Jain, Abhinav Agarwalla, Kumar Krishna Agrawal, Pabitra Mitra | - | 30 10 0 | 20 Nov 2016
An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data | Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu | - | 27 977 0 | 18 Nov 2016
Cross Domain Knowledge Transfer for Person Re-identification | Qiqi Xiao, Kelei Cao, Haonan Chen, Fangyue Peng, Chi Zhang | - | 28 18 0 | 18 Nov 2016
AutoScaler: Scale-Attention Networks for Visual Correspondence | Shenlong Wang, Linjie Luo, Ning Zhang, Jia Li | - | 12 19 0 | 17 Nov 2016
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning | Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua | - | 13 1,649 0 | 17 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM | Yan Huang, Wei Wang, Liang Wang | - | 24 222 0 | 17 Nov 2016
DelugeNets: Deep Networks with Efficient and Flexible Cross-layer Information Inflows | Jason Kuen, Xiangfei Kong, G. Wang, Yap-Peng Tan | - | 10 14 0 | 17 Nov 2016
Semantic Regularisation for Recurrent Image Annotation | Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang, Changyin Sun | - | 29 103 0 | 16 Nov 2016
A Semi-supervised Framework for Image Captioning | Wenhu Chen, Aurélien Lucchi, Thomas Hofmann | - | 29 9 0 | 16 Nov 2016
The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives | Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan L. Boyd-Graber, Hal Daumé, L. Davis | - | 22 95 0 | 16 Nov 2016
Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction | Yilin Song, J. Viventi, Yao Wang | AI4TS | 30 2 0 | 15 Nov 2016
Hierarchical Object Detection with Deep Reinforcement Learning | Míriam Bellver, Xavier Giró-i-Nieto, F. Marqués, Jordi Torres | - | 13 104 0 | 11 Nov 2016
Getting Started with Neural Models for Semantic Matching in Web Search | Kezban Dilek Onal, I. S. Altingövde, Pinar Senkul, Maarten de Rijke | VLM, 3DV | 16 9 0 | 08 Nov 2016
Memory-augmented Attention Modelling for Videos | Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, S. B. Kang, Pushmeet Kohli | - | 35 20 0 | 07 Nov 2016
Latent Attention For If-Then Program Synthesis | Xinyun Chen, Chang-rui Liu, E. C. Shin, D. Song, Mingcheng Chen | - | 22 70 0 | 07 Nov 2016
Hierarchical Question Answering for Long Documents | Eunsol Choi, D. Hewlett, Alexandre Lacoste, Illia Polosukhin, Jakob Uszkoreit, Jonathan Berant | RALM | 25 168 0 | 06 Nov 2016
Boosting Image Captioning with Attributes | Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei | VLM | 31 620 0 | 05 Nov 2016
Categorical Reparameterization with Gumbel-Softmax | Eric Jang, S. Gu, Ben Poole | BDL | 84 5,283 0 | 03 Nov 2016
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables | Chris J. Maddison, A. Mnih, Yee Whye Teh | BDL | 18 2,504 0 | 02 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching | Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim | - | 34 664 0 | 02 Nov 2016
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences | Daniel Neil, Michael Pfeiffer, Shih-Chii Liu | AI4TS | 25 442 0 | 29 Oct 2016
Professor Forcing: A New Algorithm for Training Recurrent Networks | Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio | GAN | 45 588 0 | 27 Oct 2016
Cross-Modal Scene Networks | Y. Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba | SSL | 14 114 0 | 27 Oct 2016
Can Active Memory Replace Attention? | Lukasz Kaiser, Samy Bengio | - | 25 58 0 | 27 Oct 2016
Jointly Learning to Align and Convert Graphemes to Phonemes with Neural Attention Models | Shubham Toshniwal, Karen Livescu | - | 20 41 0 | 20 Oct 2016
Lexicon Integrated CNN Models with Attention for Sentiment Analysis | Bonggun Shin, Timothy Lee, Jinho D. Choi | - | 17 113 0 | 20 Oct 2016
Using Fast Weights to Attend to the Recent Past | Jimmy Ba, Geoffrey E. Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu | - | 4 262 0 | 20 Oct 2016
Learning Robust Video Synchronization without Annotations | P. Wieschollek, Ido Freeman, Hendrik P. A. Lensch | - | 9 7 0 | 19 Oct 2016
Spatio-Temporal Attention Models for Grounded Video Captioning | M. Zanfir, Elisabeta Marinoiu, C. Sminchisescu | - | 27 50 0 | 17 Oct 2016