Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks

26 October 2015
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
W. Xu

Papers citing "Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks"

50 / 216 papers shown
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Haoran Chen
Ke Lin
A. Maye
Jianmin Li
Xiaoling Hu
12
47
0
31 Aug 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
66
162
0
27 Aug 2019
Transferable Representation Learning in Vision-and-Language Navigation
Haoshuo Huang
Vihan Jain
Harsh Mehta
Alexander Ku
Gabriel Ilharco
Jason Baldridge
Eugene Ie
LM&Ro
15
85
0
09 Aug 2019
Prediction and Description of Near-Future Activities in Video
T. Mahmud
Mohammad Billah
Mahmudul Hasan
A. Roy-Chowdhury
10
16
0
02 Aug 2019
Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation
Yadan Luo
Zi Huang
Zheng-Wei Zhang
Ziwei Wang
Jingjing Li
Yang Yang
22
40
0
01 Aug 2019
Learning Visual Actions Using Multiple Verb-Only Labels
Michael Wray
Dima Damen
15
7
0
25 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
15
132
0
22 Jul 2019
Why Build an Assistant in Minecraft?
Arthur Szlam
Jonathan Gray
Kavya Srinet
Yacine Jernite
Armand Joulin
...
Siddharth Goyal
Demi Guo
Dan Rothermel
C. L. Zitnick
Jason Weston
LLMAG
12
28
0
22 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
25
1,172
0
07 Jun 2019
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Manjot Bilkhu
Siyang Wang
Tushar Dobhal
ViT
11
15
0
06 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
13
66
0
03 Jun 2019
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Vihan Jain
Gabriel Ilharco
Alexander Ku
Ashish Vaswani
Eugene Ie
Jason Baldridge
LM&Ro
6
177
0
29 May 2019
Memory-Attended Recurrent Network for Video Captioning
Wenjie Pei
Jiyuan Zhang
Xiangrong Wang
Lei Ke
Xiaoyong Shen
Yu-Wing Tai
9
200
0
10 May 2019
Multimodal Semantic Attention Network for Video Captioning
Liang Sun
Bing Li
Chunfen Yuan
Zhengjun Zha
Weiming Hu
21
11
0
08 May 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
Jingwen Chen
Yingwei Pan
Yehao Li
Ting Yao
Hongyang Chao
Tao Mei
11
103
0
03 May 2019
A Review of Modularization Techniques in Artificial Neural Networks
Mohammed Amer
Tomás Maul
13
80
0
29 Apr 2019
Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
Chenglong Wang
Rudy Bunel
Krishnamurthy Dvijotham
Po-Sen Huang
Edward Grefenstette
Pushmeet Kohli
17
5
0
26 Apr 2019
Challenges and Prospects in Vision and Language Research
Kushal Kafle
Robik Shrestha
Christopher Kanan
14
41
0
19 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
19
69
0
11 Apr 2019
Streamlined Dense Video Captioning
Jonghwan Mun
L. Yang
Zhou Ren
N. Xu
Bohyung Han
12
136
0
08 Apr 2019
End-to-End Video Captioning
Silvio Olivastri
Gurkirt Singh
Fabio Cuzzolin
11
18
0
04 Apr 2019
Recurrent Back-Projection Network for Video Super-Resolution
Muhammad Haris
Gregory Shakhnarovich
Norimichi Ukita
SupR
20
430
0
25 Mar 2019
Scene Understanding for Autonomous Manipulation with Deep Learning
A. Nguyen
6
6
0
23 Mar 2019
V2CNet: A Deep Learning Framework to Translate Videos to Commands for Robotic Manipulation
A. Nguyen
Thanh-Toan Do
Ian Reid
D. Caldwell
Nikos G. Tsagarakis
19
21
0
23 Mar 2019
Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek
Angela Fan
Siddharth Karamcheti
Saachi Jain
Samuel Humeau
Emily Dinan
Tim Rocktaschel
Douwe Kiela
Arthur Szlam
Jason Weston
LLMAG
18
204
0
07 Mar 2019
Discourse Parsing in Videos: A Multi-modal Appraoch
Arjun Reddy Akula
Song-Chun Zhu
14
1
0
06 Mar 2019
M-VAD Names: a Dataset for Video Captioning with Naming
S. Pini
Marcella Cornia
Federico Bolelli
Lorenzo Baraldi
Rita Cucchiara
9
29
0
04 Mar 2019
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Nayyer Aafaq
Naveed Akhtar
W. Liu
Syed Zulqarnain Gilani
Ajmal Saeed Mian
18
203
0
27 Feb 2019
Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
Youngeun Kwon
Minsoo Rhu
6
56
0
18 Feb 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
17
188
0
25 Jan 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
Jingkuan Song
Xiangpeng Li
Lianli Gao
Heng Tao Shen
13
221
0
26 Dec 2018
Coupled Recurrent Network (CRN)
Lin Sun
K. Jia
Yuejia Shen
Silvio Savarese
Dit-Yan Yeung
Bertram E. Shi
19
4
0
25 Dec 2018
Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog
Shachi H. Kumar
Eda Okur
Saurav Sahay
Juan Jose Alvarado Leanos
Jonathan Huang
L. Nachman
6
10
0
20 Dec 2018
Adversarial Inference for Multi-Sentence Video Description
J. S. Park
Marcus Rohrbach
Trevor Darrell
Anna Rohrbach
14
79
0
13 Dec 2018
Weakly Supervised Dense Event Captioning in Videos
Xuguang Duan
Wen-bing Huang
Chuang Gan
Jingdong Wang
Wenwu Zhu
Junzhou Huang
17
148
0
10 Dec 2018
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Xin Eric Wang
Qiuyuan Huang
Asli Celikyilmaz
Jianfeng Gao
Dinghan Shen
Yuan-fang Wang
William Yang Wang
Lei Zhang
LM&Ro
SSL
12
528
0
25 Nov 2018
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
Yoonchang Sung
Jiawei Wu
Da Zhang
Yu-Chuan Su
Pratap Tokekar
16
39
0
07 Nov 2018
Middle-Out Decoding
Shikib Mehri
Leonid Sigal
8
22
0
28 Oct 2018
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang
Hexiang Hu
Fei Sha
BDL
AI4TS
4
187
0
16 Oct 2018
Semantic Sentence Embeddings for Paraphrasing and Text Summarization
Chi Zhang
Shagan Sah
Thang Nguyen
D. Peri
A. Loui
C. Salvaggio
R. Ptucha
16
31
0
26 Sep 2018
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description
Oliver A. Nina
Washington Garcia
Scott Clouse
Alper Yilmaz
14
4
0
19 Sep 2018
Diverse and Coherent Paragraph Generation from Images
Moitreya Chatterjee
A. Schwing
11
66
0
03 Sep 2018
A Survey of the Usages of Deep Learning in Natural Language Processing
Dan Otter
Julian R. Medina
Jugal Kalita
VLM
19
11
0
27 Jul 2018
Move Forward and Tell: A Progressive Generator of Video Descriptions
Yilei Xiong
Bo Dai
Dahua Lin
13
101
0
26 Jul 2018
Video Storytelling: Textual Summaries for Events
Junnan Li
Yongkang Wong
Qi Zhao
Mohan S. Kankanhalli
DiffM
8
44
0
25 Jul 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
G. Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
16
125
0
21 Jun 2018
Mining for meaning: from vision to language through multiple networks consensus
Iulia Duta
Andrei Liviu Nicolicioiu
Simion-Vlad Bogolin
Marius Leordeanu
8
3
0
05 Jun 2018
Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Nayyer Aafaq
Ajmal Saeed Mian
W. Liu
Syed Zulqarnain Gilani
Mubarak Shah
6
91
0
01 Jun 2018
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
Qiuyuan Huang
Zhe Gan
Asli Celikyilmaz
D. Wu
Jianfeng Wang
Xiaodong He
BDL
11
91
0
21 May 2018
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
119
496
0
24 Apr 2018