Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

20 December 2014

Yi Yang

Papers citing "Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)"

50 / 417 papers shown

| Title | Authors | Tags | | | | Date |
|---|---|---|---|---|---|---|
| Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition | Michael Wray, Davide Moltisanti, W. Mayol-Cuevas, Dima Damen | | 25 | 2 | 0 | 24 Mar 2017 |
| Recurrent Multimodal Interaction for Referring Image Segmentation | Chenxi Liu, Zhe-nan Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille | EgoV | 36 | 234 | 0 | 23 Mar 2017 |
| Recurrent Topic-Transition GAN for Visual Paragraph Generation | Xiaodan Liang, Zhiting Hu, H. M. Zhang, Chuang Gan, Eric P. Xing | GAN | 19 | 200 | 0 | 21 Mar 2017 |
| Person Search with Natural Language Description | Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang | | 19 | 385 | 0 | 19 Feb 2017 |
| Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading | Chunlin Tian, Weijun Ji | | 22 | 7 | 0 | 16 Jan 2017 |
| Attention-Based Multimodal Fusion for Video Description | Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, J. Hershey, Tim K. Marks | | 27 | 359 | 0 | 11 Jan 2017 |
| Learning Visual N-Grams from Web Data | Ang Li, Allan Jabri, Armand Joulin, L. V. D. van der Maaten | VLM | 12 | 136 | 0 | 29 Dec 2016 |
| Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation | Gwangbeen Park, Woobin Im | GAN | 8 | 25 | 0 | 26 Dec 2016 |
| Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task | Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut | VLM | 11 | 9 | 0 | 22 Dec 2016 |
| An Empirical Study of Language CNN for Image Captioning | Jiuxiang Gu, G. Wang, Jianfei Cai, Tsuhan Chen | | 17 | 132 | 0 | 21 Dec 2016 |
| Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering | Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen | | 30 | 9 | 0 | 15 Dec 2016 |
| Text-guided Attention Model for Image Captioning | Jonghwan Mun, Minsu Cho, Bohyung Han | VLM | 10 | 92 | 0 | 12 Dec 2016 |
| Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning | Jiasen Lu, Caiming Xiong, Devi Parikh, R. Socher | | 85 | 1,442 | 0 | 06 Dec 2016 |
| Areas of Attention for Image Captioning | M. Pedersoli, Thomas Lucas, Cordelia Schmid, Jakob Verbeek | | 25 | 205 | 0 | 03 Dec 2016 |
| Guided Open Vocabulary Image Captioning with Constrained Beam Search | Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould | | 16 | 232 | 0 | 02 Dec 2016 |
| Video Captioning with Multi-Faceted Attention | Xiang Long, Chuang Gan, Gerard de Melo | | 19 | 88 | 0 | 01 Dec 2016 |
| Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images | Junhua Mao, Jiajing Xu, Yushi Jing, Alan Yuille | | 11 | 48 | 0 | 24 Nov 2016 |
| Semantic Compositional Networks for Visual Captioning | Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng | CoGe | 36 | 425 | 0 | 23 Nov 2016 |
| Learning Generic Sentence Representations Using Convolutional Neural Networks | Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin | SSL | 34 | 98 | 0 | 23 Nov 2016 |
| Adaptive Feature Abstraction for Translating Video to Text | Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin | | 29 | 14 | 0 | 23 Nov 2016 |
| Dense Captioning with Joint Inference and Visual Context | L. Yang, K. Tang, Jianchao Yang, Li-Jia Li | VLM | 19 | 169 | 0 | 21 Nov 2016 |
| A Hierarchical Approach for Generating Descriptive Image Paragraphs | J. Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei | VLM | 19 | 373 | 0 | 20 Nov 2016 |
| SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning | Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua | | 11 | 1,649 | 0 | 17 Nov 2016 |
| Semantic Regularisation for Recurrent Image Annotation | Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang, Changyin Sun | | 23 | 103 | 0 | 16 Nov 2016 |
| Dual Attention Networks for Multimodal Reasoning and Matching | Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim | | 28 | 664 | 0 | 02 Nov 2016 |
| Visual Question Answering: Datasets, Algorithms, and Future Challenges | Kushal Kafle, Christopher Kanan | OOD | 25 | 235 | 0 | 05 Oct 2016 |
| A Survey of Multi-View Representation Learning | Yingming Li, Ming Yang, Zhongfei Zhang | AI4TS, 3DV | 22 | 509 | 0 | 03 Oct 2016 |
| Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge | Oriol Vinyals, Alexander Toshev, Samy Bengio, D. Erhan | | 11 | 848 | 0 | 21 Sep 2016 |
| GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion | Ankit Gandhi, Arjun Sharma, Arijit Biswas, Om Deshmukh | AI4TS | 19 | 12 | 0 | 17 Sep 2016 |
| Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories | Mark Harmon, Abdolghani Ebrahimi, P. Lucey, Diego Klabjan | GAN | 9 | 18 | 0 | 15 Sep 2016 |
| Multimodal Attention for Neural Machine Translation | Ozan Caglayan, Loïc Barrault, Fethi Bougares | | 21 | 75 | 0 | 13 Sep 2016 |
| Measuring Machine Intelligence Through Visual Question Answering | C. L. Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh | | 14 | 37 | 0 | 31 Aug 2016 |
| Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions | Ronghang Hu, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell | VLM | 17 | 18 | 0 | 30 Aug 2016 |
| phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning | Y. Tan, Chee Seng Chan | VLM | 9 | 29 | 0 | 20 Aug 2016 |
| Detecting Sarcasm in Multimodal Social Platforms | Rossano Schifanella, Paloma de Juan, Joel R. Tetreault, Liangliang Cao | | 15 | 167 | 0 | 08 Aug 2016 |
| Modeling Context in Referring Expressions | Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg | | 28 | 1,223 | 0 | 31 Jul 2016 |
| SPICE: Semantic Propositional Image Caption Evaluation | Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould | EGVM | 29 | 1,883 | 0 | 29 Jul 2016 |
| A Comprehensive Survey on Cross-modal Retrieval | K. Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang | | 26 | 294 | 0 | 21 Jul 2016 |
| Visual Question Answering: A Survey of Methods and Datasets | Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, A. Dick, A. Hengel | | 19 | 413 | 0 | 20 Jul 2016 |
| Captioning Images with Diverse Objects | Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond J. Mooney, Trevor Darrell, Kate Saenko | VLM | 22 | 178 | 0 | 24 Jun 2016 |
| Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions | F. Carrara, Andrea Esuli, T. Fagni, Fabrizio Falchi, Alejandro Moreo | DiffM | 14 | 31 | 0 | 23 Jun 2016 |
| Watch What You Just Said: Image Captioning with Text-Conditional Attention | Luowei Zhou, Chenliang Xu, Parker A. Koch, Jason J. Corso | VLM | 8 | 44 | 0 | 15 Jun 2016 |
| Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, W. Xu | AIMat | 19 | 215 | 0 | 14 Jun 2016 |
| Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach | | 149 | 1,465 | 0 | 06 Jun 2016 |
| Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network | Yu Liu, Jianlong Fu, Tao Mei, C. Chen | | 11 | 4 | 0 | 02 Jun 2016 |
| Attention Correctness in Neural Image Captioning | Chenxi Liu, Junhua Mao, Fei Sha, Alan Yuille | 3DV | 27 | 220 | 0 | 31 May 2016 |
| SNN: Stacked Neural Networks | Milad Mohammadi, Subhasis Das | | 11 | 15 | 0 | 27 May 2016 |
| Generative Adversarial Text to Image Synthesis | Scott E. Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee | GAN | 17 | 3,124 | 0 | 17 May 2016 |
| Learning Deep Representations of Fine-grained Visual Descriptions | Scott E. Reed, Zeynep Akata, Bernt Schiele, Honglak Lee | OCL, VLM | 165 | 840 | 0 | 17 May 2016 |
| Movie Description | Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, C. Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele | 3DV, VGen | 30 | 353 | 0 | 12 May 2016 |