Translating Videos to Natural Language Using Deep Recurrent Neural Networks

North American Chapter of the Association for Computational Linguistics (NAACL), 2014
15 December 2014
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko

Papers citing "Translating Videos to Natural Language Using Deep Recurrent Neural Networks"

50 / 334 papers shown
Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2019
Junchao Zhang
Yuxin Peng
160
187
0
11 Jun 2019
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Manjot Bilkhu
Siyang Wang
Tushar Dobhal
ViT
94
17
0
06 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
188
13
0
04 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
157
72
0
03 Jun 2019
Memory-Attended Recurrent Network for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2019
Wenjie Pei
Jiyuan Zhang
Xiangrong Wang
Lei Ke
Xiaoyong Shen
Yu-Wing Tai
199
223
0
10 May 2019
Differential Recurrent Neural Network and its Application for Human Activity Recognition
Naifan Zhuang
Guo-Jun Qi
T. Kieu
K. Hua
102
3
0
09 May 2019
Multimodal Semantic Attention Network for Video Captioning
IEEE International Conference on Multimedia and Expo (ICME), 2019
Liang Sun
Bing Li
Chunfen Yuan
Zhengjun Zha
Weiming Hu
162
11
0
08 May 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2019
Jingwen Chen
Yingwei Pan
Yehao Li
Ting Yao
Hongyang Chao
Tao Mei
172
104
0
03 May 2019
End-to-End Spoken Language Translation
Michelle Guo
Albert Haque
Prateek Verma
98
8
0
23 Apr 2019
Replay attack detection with complementary high-resolution information using end-to-end DNN for the ASVspoof 2019 Challenge
Jee-weon Jung
Hye-jin Shim
Hee-Soo Heo
Ha-Jin Yu
200
51
0
23 Apr 2019
Streamlined Dense Video Captioning
Jonghwan Mun
L. Yang
Zhou Ren
N. Xu
Bohyung Han
227
159
1
08 Apr 2019
End-to-End Video Captioning
Silvio Olivastri
Gurkirt Singh
Fabio Cuzzolin
136
20
0
04 Apr 2019
Recurrent Back-Projection Network for Video Super-Resolution
Muhammad Haris
Gregory Shakhnarovich
Norimichi Ukita
SupR
157
472
0
25 Mar 2019
Scene Understanding for Autonomous Manipulation with Deep Learning
A. Nguyen
123
6
0
23 Mar 2019
V2CNet: A Deep Learning Framework to Translate Videos to Commands for Robotic Manipulation
A. Nguyen
Thanh-Toan Do
Ian Reid
D. Caldwell
Nikos G. Tsagarakis
119
21
0
23 Mar 2019
Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems
Network and Distributed System Security Symposium (NDSS), 2019
H. Abdullah
Washington Garcia
Christian Peeters
Patrick Traynor
Kevin R. B. Butler
Joseph N. Wilson
AAML
152
177
0
18 Mar 2019
M-VAD Names: a Dataset for Video Captioning with Naming
Multimedia Tools and Applications (MTA), 2018
S. Pini
Marcella Cornia
Federico Bolelli
Lorenzo Baraldi
Rita Cucchiara
152
29
0
04 Mar 2019
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2019
Nayyer Aafaq
Naveed Akhtar
Wen Liu
Syed Zulqarnain Gilani
Lin Wang
203
220
0
27 Feb 2019
Actions Generation from Captions
Xuan Liang
Yida Xu
62
0
0
14 Feb 2019
Exploring Temporal Dependencies in Multimodal Referring Expressions with Mixed Reality
E. Sibirtseva
Ali Ghadirzadeh
Iolanda Leite
Mårten Björkman
Danica Kragic
107
4
0
04 Feb 2019
Not All Words are Equal: Video-specific Information Loss for Video Captioning
Jiarong Dong
Ke Gao
Xiaokai Chen
Junbo Guo
Juan Cao
Yongdong Zhang
105
8
0
01 Jan 2019
Future semantic segmentation of time-lapsed videos with large temporal displacement
Talha Ahmad Siddiqui
Samarth Bharadwaj
124
1
0
27 Dec 2018
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
Jingkuan Song
Xiangpeng Li
Lianli Gao
Heng Tao Shen
153
231
0
26 Dec 2018
Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog
Shachi H. Kumar
Eda Okur
Saurav Sahay
Juan Jose Alvarado Leanos
Jonathan Huang
L. Nachman
96
10
0
20 Dec 2018
Weakly Supervised Dense Event Captioning in Videos
Xuguang Duan
Wen-bing Huang
Chuang Gan
Jingdong Wang
Wenwu Zhu
Junzhou Huang
159
162
0
10 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
164
21
0
07 Dec 2018
A Coarse-to-fine Deep Convolutional Neural Network Framework for Frame Duplication Detection and Localization in Forged Videos
Chengjiang Long
Arslan Basharat
A. Hoogs
136
7
0
27 Nov 2018
Deep RNN Framework for Visual Sequential Applications
Computer Vision and Pattern Recognition (CVPR), 2018
Bo Pang
Kaiwen Zha
Hanwen Cao
Chen Shi
Cewu Lu
ViT
HAI
262
54
0
25 Nov 2018
A Perceptual Prediction Framework for Self Supervised Event Segmentation
Sathyanarayanan N. Aakur
Sudeep Sarkar
202
78
0
12 Nov 2018
Imitation Learning for Object Manipulation Based on Position/Force Information Using Bilateral Control
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018
Youru Li
Zhenfeng Zhu
S. Sakaino
Yao Zhao
172
53
0
09 Nov 2018
Middle-Out Decoding
Shikib Mehri
Leonid Sigal
140
22
0
28 Oct 2018
A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Shubham Agarwal
Ondrej Dusek
Ioannis Konstas
Verena Rieser
141
22
0
20 Oct 2018
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang
Hexiang Hu
Fei Sha
BDL
AI4TS
173
204
0
16 Oct 2018
Learning to Globally Edit Images with Textual Description
Hai Wang
Jason D. Williams
Sin-Han Kang
DiffM
131
18
0
13 Oct 2018
Semantic Sentence Embeddings for Paraphrasing and Text Summarization
IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2017
Chi Zhang
Shagan Sah
Thang Nguyen
D. Peri
A. Loui
C. Salvaggio
R. Ptucha
124
33
0
26 Sep 2018
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description
Oliver A. Nina
Washington Garcia
Scott Clouse
Alper Yilmaz
149
4
0
19 Sep 2018
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts
Shuming Ma
Lei Cui
Damai Dai
Furu Wei
Xu Sun
VGen
162
64
0
13 Sep 2018
Hierarchical Video Understanding
F. Mahdisoltani
Roland Memisevic
David Fleet
69
2
0
04 Sep 2018
A Survey of the Usages of Deep Learning in Natural Language Processing
Dan Otter
Julian R. Medina
Jugal Kalita
VLM
266
12
0
27 Jul 2018
Video Storytelling: Textual Summaries for Events
Junnan Li
Yongkang Wong
Qi Zhao
Mohan Kankanhalli
DiffM
172
48
0
25 Jul 2018
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
Xiangxi Shi
Jianfei Cai
Jiuxiang Gu
Shafiq Joty
106
19
0
08 Jul 2018
RUDDER: Return Decomposition for Delayed Rewards
Jose A. Arjona-Medina
Michael Gillhofer
Michael Widrich
Thomas Unterthiner
Johannes Brandstetter
Sepp Hochreiter
285
241
0
20 Jun 2018
Mining for meaning: from vision to language through multiple networks consensus
Iulia Duta
Andrei Liviu Nicolicioiu
Simion-Vlad Bogolin
Marius Leordeanu
130
3
0
05 Jun 2018
A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units
Xi Chen
Zhihong Deng
Gehui Shen
Ting Huang
87
1
0
02 Jun 2018
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
Huda AlAmri
Vincent Cartillier
Raphael Gontijo-Lopes
Abhishek Das
Jue Wang
...
Dhruv Batra
Devi Parikh
A. Cherian
Tim K. Marks
Chiori Hori
132
34
0
01 Jun 2018
Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Nayyer Aafaq
Lin Wang
Wen Liu
Syed Zulqarnain Gilani
Mubarak Shah
469
100
0
01 Jun 2018
Deep Reinforcement Learning For Sequence to Sequence Models
Yaser Keneshloo
Tian Shi
Naren Ramakrishnan
Chandan K. Reddy
AIMat
3DV
OffRL
232
236
0
24 May 2018
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
Qiuyuan Huang
Zhe Gan
Asli Celikyilmaz
D. Wu
Jianfeng Wang
Xiaodong He
BDL
207
96
0
21 May 2018
On the effectiveness of task granularity for transfer learning
F. Mahdisoltani
Guillaume Berger
W. Gharbieh
David Fleet
Roland Memisevic
134
67
0
24 Apr 2018
Jointly Localizing and Describing Events for Dense Video Captioning
Yehao Li
Ting Yao
Yingwei Pan
Hongyang Chao
Tao Mei
134
186
0
23 Apr 2018