
Translating Videos to Natural Language Using Deep Recurrent Neural Networks

North American Chapter of the Association for Computational Linguistics (NAACL), 2014
15 December 2014
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko

Papers citing "Translating Videos to Natural Language Using Deep Recurrent Neural Networks"

50 / 334 papers shown
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Leilei Gan
128
36
0
16 Aug 2020
Vision Meets Wireless Positioning: Effective Person Re-identification with Recurrent Context Propagation
ACM Multimedia (ACM MM), 2020
Yiheng Liu
Wen-gang Zhou
Mao Xi
Sanjing Shen
Houqiang Li
182
10
0
10 Aug 2020
Enriching Video Captions With Contextual Text
International Conference on Pattern Recognition (ICPR), 2020
Philipp Rimle
Pelin Dogan
Markus Gross
141
3
0
29 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
European Conference on Computer Vision (ECCV), 2020
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
199
111
0
28 Jul 2020
Fully Convolutional Networks for Continuous Sign Language Recognition
European Conference on Computer Vision (ECCV), 2020
Ka Leong Cheng
Zhaoyang Yang
Qifeng Chen
Yu-Wing Tai
SLR
207
184
0
24 Jul 2020
Deep Learning Techniques for Future Intelligent Cross-Media Retrieval
S. Rehman
M. Waqas
Shanshan Tu
Anis Koubaa
O. Rehman
Jawad Ahmad
Muhammad Hanif
Zhu Han
118
7
0
21 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
167
61
0
05 Jul 2020
Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation
Sergi Perez-Castanos
Javier Naranjo-Alcazar
P. Zuccarello
M. Cobos
152
12
0
27 Jun 2020
SACT: Self-Aware Multi-Space Feature Composition Transformer for Multinomial Attention for Video Captioning
C. Sur
120
7
0
25 Jun 2020
Comprehensive Information Integration Modeling Framework for Video Titling
Knowledge Discovery and Data Mining (KDD), 2020
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Leilei Gan
146
41
0
24 Jun 2020
iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks
Computational Visual Media (CVM), 2020
Vasu Sharma
John Britto
M. Mani Roja
SupR
195
26
0
13 Jun 2020
NITS-VC System for VATEX Video Captioning Challenge 2020
Alok Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
122
16
0
07 Jun 2020
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir E. Iashin
Esa Rahtu
207
128
0
17 May 2020
Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
Fenglin Liu
Xuancheng Ren
Guangxiang Zhao
Chenyu You
Xuewei Ma
Xian Wu
Xu Sun
358
2
0
16 May 2020
Learning from Noisy Labels with Noise Modeling Network
Zhuolin Jiang
J. Silovský
M. Siu
William Hartmann
H. Gish
Sancar Adali
NoLa
71
3
0
01 May 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Computer Vision and Pattern Recognition (CVPR), 2020
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
179
259
0
31 Mar 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
265
198
0
17 Mar 2020
Video Caption Dataset for Describing Human Actions in Japanese
International Conference on Language Resources and Evaluation (LREC), 2020
Yutaro Shigeto
Yuya Yoshikawa
Jiaqing Lin
A. Takeuchi
68
3
0
10 Mar 2020
OVC-Net: Object-Oriented Video Captioning with Temporal Graph and Detail Enhancement
Fangyi Zhu
Lei Li
Zhanyu Ma
Guang Chen
Jun Guo
160
1
0
08 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
AAAI Conference on Artificial Intelligence (AAAI), 2020
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
288
129
0
06 Mar 2020
Hierarchical Memory Decoding for Video Captioning
Aming Wu
Yahong Han
123
2
0
27 Feb 2020
CLARA: Clinical Report Auto-completion
The Web Conference (WWW), 2020
Siddharth Biswal
Cao Xiao
Lucas Glass
M. P. M. Brandon Westover
Jimeng Sun
196
29
0
26 Feb 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
188
302
0
26 Feb 2020
Multimodal Matching Transformer for Live Commenting
European Conference on Artificial Intelligence (ECAI), 2020
Chaoqun Duan
Lei Cui
Shuming Ma
Furu Wei
Conghui Zhu
Tiejun Zhao
85
13
0
07 Feb 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
109
22
0
17 Jan 2020
Delving Deeper into the Decoder for Video Captioning
European Conference on Artificial Intelligence (ECAI), 2020
Haoran Chen
Jianmin Li
Xiaolin Hu
147
38
0
16 Jan 2020
Non-Autoregressive Coarse-to-Fine Video Captioning
Bang-ju Yang
Yuexian Zou
Fenglin Liu
Can Zhang
352
11
0
27 Nov 2019
Zero-Shot Imitating Collaborative Manipulation Plans from YouTube Cooking Videos
Hejia Zhang
Jie Zhong
Stefanos Nikolaidis
LM&Ro
932
2
0
25 Nov 2019
Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models
International Conferences on Pattern Recognition and Artificial Intelligence (ICCPRAI), 2019
Menatallh Hammad
May Hammad
Mohamed Elshenawy
81
2
0
22 Nov 2019
Empirical Autopsy of Deep Video Captioning Frameworks
Nayyer Aafaq
Naveed Akhtar
Wei Liu
Lin Wang
115
6
0
21 Nov 2019
Crowd Video Captioning
Liqi Yan
Mingjian Zhu
Changbin (Brad) Yu
76
4
0
13 Nov 2019
Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning
Pattern Recognition Letters (PR), 2019
Huanhou Xiao
Jinglun Shi
109
26
0
05 Nov 2019
Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Tao Jin
Siyu Huang
Yingming Li
Zhongfei Zhang
148
21
0
01 Nov 2019
Orchestrating the Development Lifecycle of Machine Learning-Based IoT Applications: A Taxonomy and Survey
Bin Qian
Jie Su
Z. Wen
D. N. Jha
Yinhao Li
...
Albert Y. Zomaya
Omer F. Rana
Lizhe Wang
Maciej Koutny
R. Ranjan
185
4
0
11 Oct 2019
Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems
H. Abdullah
Muhammad Sajidur Rahman
Washington Garcia
Logan Blue
Kevin Warren
Anurag Swarnim Yadav
T. Shrimpton
Patrick Traynor
AAML
129
95
0
11 Oct 2019
Explaining and Interpreting LSTMs
L. Arras
Jose A. Arjona-Medina
Michael Widrich
G. Montavon
Michael Gillhofer
K. Müller
Sepp Hochreiter
Wojciech Samek
FAtt
AI4TS
143
83
0
25 Sep 2019
Learning Actions from Human Demonstration Video for Robotic Manipulation
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019
Shuo Yang
Wei Zhang
Weizhi Lu
Hesheng Wang
Yibin Li
90
26
0
10 Sep 2019
Time Series Motion Generation Considering Long Short-Term Motion
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019
K. Fujimoto
S. Sakaino
T. Tsuji
117
15
0
09 Sep 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
IEEE International Conference on Computer Vision (ICCV), 2019
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
206
176
0
27 Aug 2019
Autonomous Learning for Face Recognition in the Wild via Ambient Wireless Cues
The Web Conference (WWW), 2019
Chris Xiaoxuan Lu
Xuan Kan
Bowen Du
Changhao Chen
Hongkai Wen
Andrew Markham
A. Trigoni
John A. Stankovic
CVBM
126
7
0
14 Aug 2019
SF-Net: Structured Feature Network for Continuous Sign Language Recognition
Zhaoyang Yang
Zhenmei Shi
Xiaoyong Shen
Yu-Wing Tai
SLR
111
71
0
04 Aug 2019
Prediction and Description of Near-Future Activities in Video
Computer Vision and Image Understanding (CVIU), 2019
T. Mahmud
Mohammad Billah
Mahmudul Hasan
Amit K. Roy-Chowdhury
283
17
0
02 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
British Machine Vision Conference (BMVC), 2019
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
169
422
0
31 Jul 2019
Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos
IEEE Transactions on Multimedia (IEEE TMM), 2019
Sebastian Agethen
Winston H. Hsu
HAI
125
29
0
30 Jul 2019
Learning Visual Actions Using Multiple Verb-Only Labels
British Machine Vision Conference (BMVC), 2019
Michael Wray
Dima Damen
183
7
0
25 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
332
141
0
22 Jul 2019
Watch It Twice: Video Captioning with a Refocused Video Encoder
ACM Multimedia (ACM MM), 2019
Xiangxi Shi
Jianfei Cai
Shafiq Joty
Jiuxiang Gu
134
28
0
21 Jul 2019
Structured Variational Inference in Unstable Gaussian Process State Space Models
Silvan Melchior
Sebastian Curi
Felix Berkenkamp
Andreas Krause
261
4
0
16 Jul 2019
Video Question Generation via Cross-Modal Self-Attention Networks Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Yu-Siang Wang
Hung-Ting Su
Chen-Hsi Chang
Zhe-Yu Liu
Winston H. Hsu
135
12
0
05 Jul 2019
A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning
A. Asadi
Reza Safabakhsh
66
3
0
26 Jun 2019