ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1412.6632
  4. Cited By
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

20 December 2014
Junhua Mao
W. Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
    VLM
ArXivPDFHTML

Papers citing "Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)"

50 / 417 papers shown
Title
Improving Classification by Improving Labelling: Introducing
  Probabilistic Multi-Label Object Interaction Recognition
Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition
Michael Wray
Davide Moltisanti
W. Mayol-Cuevas
Dima Damen
25
2
0
24 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
36
234
0
23 Mar 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Xiaodan Liang
Zhiting Hu
H. M. Zhang
Chuang Gan
Eric P. Xing
GAN
19
200
0
21 Mar 2017
Person Search with Natural Language Description
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
19
385
0
19 Feb 2017
Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and
  Lipreading
Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading
Chunlin Tian
Weijun Ji
22
7
0
16 Jan 2017
Attention-Based Multimodal Fusion for Video Description
Attention-Based Multimodal Fusion for Video Description
Chiori Hori
Takaaki Hori
Teng-Yok Lee
Kazuhiro Sumi
J. Hershey
Tim K. Marks
27
359
0
11 Jan 2017
Learning Visual N-Grams from Web Data
Learning Visual N-Grams from Web Data
Ang Li
Allan Jabri
Armand Joulin
L. V. D. van der Maaten
VLM
12
136
0
29 Dec 2016
Image-Text Multi-Modal Representation Learning by Adversarial
  Backpropagation
Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation
Gwangbeen Park
Woobin Im
GAN
8
25
0
26 Dec 2016
Understanding Image and Text Simultaneously: a Dual Vision-Language
  Machine Comprehension Task
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
Nan Ding
Sebastian Goodman
Fei Sha
Radu Soricut
VLM
11
9
0
22 Dec 2016
An Empirical Study of Language CNN for Image Captioning
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
17
132
0
21 Dec 2016
Recurrent Image Captioner: Describing Images with Spatial-Invariant
  Transformation and Attention Filtering
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering
Hao Liu
Yang Yang
Fumin Shen
Lixin Duan
Heng Tao Shen
30
9
0
15 Dec 2016
Text-guided Attention Model for Image Captioning
Text-guided Attention Model for Image Captioning
Jonghwan Mun
Minsu Cho
Bohyung Han
VLM
10
92
0
12 Dec 2016
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image
  Captioning
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
85
1,442
0
06 Dec 2016
Areas of Attention for Image Captioning
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
25
205
0
03 Dec 2016
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
16
232
0
02 Dec 2016
Video Captioning with Multi-Faceted Attention
Video Captioning with Multi-Faceted Attention
Xiang Long
Chuang Gan
Gerard de Melo
19
88
0
01 Dec 2016
Training and Evaluating Multimodal Word Embeddings with Large-scale Web
  Annotated Images
Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images
Junhua Mao
Jiajing Xu
Yushi Jing
Alan Yuille
11
48
0
24 Nov 2016
Semantic Compositional Networks for Visual Captioning
Semantic Compositional Networks for Visual Captioning
Zhe Gan
Chuang Gan
Xiaodong He
Yunchen Pu
Kenneth Tran
Jianfeng Gao
Lawrence Carin
Li Deng
CoGe
36
425
0
23 Nov 2016
Learning Generic Sentence Representations Using Convolutional Neural
  Networks
Learning Generic Sentence Representations Using Convolutional Neural Networks
Zhe Gan
Yunchen Pu
Ricardo Henao
Chunyuan Li
Xiaodong He
Lawrence Carin
SSL
34
98
0
23 Nov 2016
Adaptive Feature Abstraction for Translating Video to Text
Adaptive Feature Abstraction for Translating Video to Text
Yunchen Pu
Martin Renqiang Min
Zhe Gan
Lawrence Carin
29
14
0
23 Nov 2016
Dense Captioning with Joint Inference and Visual Context
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
19
169
0
21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
19
373
0
20 Nov 2016
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks
  for Image Captioning
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
Long Chen
Hanwang Zhang
Jun Xiao
Liqiang Nie
Jian Shao
Wei Liu
Tat-Seng Chua
11
1,649
0
17 Nov 2016
Semantic Regularisation for Recurrent Image Annotation
Semantic Regularisation for Recurrent Image Annotation
Feng Liu
Tao Xiang
Timothy M. Hospedales
Wankou Yang
Changyin Sun
23
103
0
16 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
28
664
0
02 Nov 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
25
235
0
05 Oct 2016
A Survey of Multi-View Representation Learning
A Survey of Multi-View Representation Learning
Yingming Li
Ming Yang
Zhongfei Zhang
AI4TS
3DV
22
509
0
03 Oct 2016
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning
  Challenge
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
11
848
0
21 Sep 2016
GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for
  Multimodal Information Fusion
GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion
Ankit Gandhi
Arjun Sharma
Arijit Biswas
Om Deshmukh
AI4TS
19
12
0
17 Sep 2016
Predicting Shot Making in Basketball Learnt from Adversarial Multiagent
  Trajectories
Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories
Mark Harmon
Abdolghani Ebrahimi
P. Lucey
Diego Klabjan
GAN
9
18
0
15 Sep 2016
Multimodal Attention for Neural Machine Translation
Multimodal Attention for Neural Machine Translation
Ozan Caglayan
Loïc Barrault
Fethi Bougares
21
75
0
13 Sep 2016
Measuring Machine Intelligence Through Visual Question Answering
Measuring Machine Intelligence Through Visual Question Answering
C. L. Zitnick
Aishwarya Agrawal
Stanislaw Antol
Margaret Mitchell
Dhruv Batra
Devi Parikh
14
37
0
31 Aug 2016
Utilizing Large Scale Vision and Text Datasets for Image Segmentation
  from Referring Expressions
Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions
Ronghang Hu
Marcus Rohrbach
Subhashini Venugopalan
Trevor Darrell
VLM
17
18
0
30 Aug 2016
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
Y. Tan
Chee Seng Chan
VLM
9
29
0
20 Aug 2016
Detecting Sarcasm in Multimodal Social Platforms
Detecting Sarcasm in Multimodal Social Platforms
Rossano Schifanella
Paloma de Juan
Joel R. Tetreault
Liangliang Cao
15
167
0
08 Aug 2016
Modeling Context in Referring Expressions
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
28
1,223
0
31 Jul 2016
SPICE: Semantic Propositional Image Caption Evaluation
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
29
1,883
0
29 Jul 2016
A Comprehensive Survey on Cross-modal Retrieval
A Comprehensive Survey on Cross-modal Retrieval
K. Wang
Qiyue Yin
Wei Wang
Shu Wu
Liang Wang
26
294
0
21 Jul 2016
Visual Question Answering: A Survey of Methods and Datasets
Visual Question Answering: A Survey of Methods and Datasets
Qi Wu
Damien Teney
Peng Wang
Chunhua Shen
A. Dick
A. Hengel
19
413
0
20 Jul 2016
Captioning Images with Diverse Objects
Captioning Images with Diverse Objects
Subhashini Venugopalan
Lisa Anne Hendricks
Marcus Rohrbach
Raymond J. Mooney
Trevor Darrell
Kate Saenko
VLM
22
178
0
24 Jun 2016
Picture It In Your Mind: Generating High Level Visual Representations
  From Textual Descriptions
Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions
F. Carrara
Andrea Esuli
T. Fagni
Fabrizio Falchi
Alejandro Moreo
DiffM
14
31
0
23 Jun 2016
Watch What You Just Said: Image Captioning with Text-Conditional
  Attention
Watch What You Just Said: Image Captioning with Text-Conditional Attention
Luowei Zhou
Chenliang Xu
Parker A. Koch
Jason J. Corso
VLM
8
44
0
15 Jun 2016
Deep Recurrent Models with Fast-Forward Connections for Neural Machine
  Translation
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
Jie Zhou
Ying Cao
Xuguang Wang
Peng Li
W. Xu
AIMat
19
215
0
14 Jun 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
149
1,465
0
06 Jun 2016
Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent
  Neural Network
Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network
Yu Liu
Jianlong Fu
Tao Mei
C. Chen
11
4
0
02 Jun 2016
Attention Correctness in Neural Image Captioning
Attention Correctness in Neural Image Captioning
Chenxi Liu
Junhua Mao
Fei Sha
Alan Yuille
3DV
27
220
0
31 May 2016
SNN: Stacked Neural Networks
SNN: Stacked Neural Networks
Milad Mohammadi
Subhasis Das
11
15
0
27 May 2016
Generative Adversarial Text to Image Synthesis
Generative Adversarial Text to Image Synthesis
Scott E. Reed
Zeynep Akata
Xinchen Yan
Lajanugen Logeswaran
Bernt Schiele
Honglak Lee
GAN
17
3,124
0
17 May 2016
Learning Deep Representations of Fine-grained Visual Descriptions
Learning Deep Representations of Fine-grained Visual Descriptions
Scott E. Reed
Zeynep Akata
Bernt Schiele
Honglak Lee
OCL
VLM
165
840
0
17 May 2016
Movie Description
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
30
353
0
12 May 2016
Previous
123456789
Next