1510.07712
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
26 October 2015
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
W. Xu
Papers citing
"Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks"
50 / 216 papers shown
Title
Semantic Grouping Network for Video Captioning
Hobin Ryu
Sunghun Kang
Haeyong Kang
Chang-Dong Yoo
22
134
0
01 Feb 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li
Shan Yang
David A. Ross
Angjoo Kanazawa
ViT
210
479
0
21 Jan 2021
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
15
8
0
13 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
4
5
0
30 Nov 2020
UVid-Net: Enhanced Semantic Segmentation of UAV Aerial Videos by Embedding Temporal Information
S. Girisha
Ujjwal Verma
M. Pai
R. Pai
14
38
0
29 Nov 2020
Video SemNet: Memory-Augmented Video Semantic Network
Prashanth Vijayaraghavan
D. Roy
14
0
0
22 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
24
3
0
18 Nov 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
S. Hoi
38
30
0
20 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumder
Soujanya Poria
Roger Zimmermann
Amir Zadeh
18
6
0
19 Oct 2020
Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
Xiaofei Sun
Zijun Sun
Yuxian Meng
Jiwei Li
Chun Fan
6
18
0
14 Oct 2020
Diagnosing and Preventing Instabilities in Recurrent Video Processing
T. Tanay
Aivar Sootla
Matteo Maggioni
P. Dokania
Philip H. S. Torr
A. Leonardis
Greg Slabaugh
11
7
0
10 Oct 2020
Video Captioning Using Weak Annotation
Jingyi Hou
Yunde Jia
Xinxiao Wu
Yayun Qi
21
2
0
02 Sep 2020
In-Home Daily-Life Captioning Using Radio Signals
Lijie Fan
Tianhong Li
Yuan Yuan
Dina Katabi
22
47
0
25 Aug 2020
Identity-Aware Multi-Sentence Video Description
J. S. Park
Trevor Darrell
Anna Rohrbach
8
17
0
22 Aug 2020
Textual Description for Mathematical Equations
Ajoy Mondal
C. V. Jawahar
9
2
0
07 Aug 2020
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
9
3
0
29 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
13
101
0
28 Jul 2020
SACT: Self-Aware Multi-Space Feature Composition Transformer for Multinomial Attention for Video Captioning
C. Sur
4
7
0
25 Jun 2020
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
Stephen Roller
Y-Lan Boureau
Jason Weston
Antoine Bordes
Emily Dinan
...
Kurt Shuster
Eric Michael Smith
Arthur Szlam
Jack Urbanek
Mary Williamson
LLMAG
AI4CE
15
51
0
22 Jun 2020
Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020
Teng Wang
Huicheng Zheng
Mingjing Yu
14
9
0
21 Jun 2020
iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks
Aman Chadha
John Britto
M. Mani Roja
SupR
6
25
0
13 Jun 2020
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir E. Iashin
Esa Rahtu
9
126
0
17 May 2020
Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
Fenglin Liu
Xuancheng Ren
Guangxiang Zhao
Chenyu You
Xuewei Ma
Xian Wu
Xu Sun
29
2
0
16 May 2020
Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs
Amir Rasouli
Iuliia Kotseruba
John K. Tsotsos
34
109
0
13 May 2020
Compressing Recurrent Neural Networks Using Hierarchical Tucker Tensor Decomposition
Miao Yin
Siyu Liao
Xiao-Yang Liu
Xiaodong Wang
Bo Yuan
24
24
0
09 May 2020
Text Synopsis Generation for Egocentric Videos
Aidean Sharghi
N. Lobo
M. Shah
DiffM
EgoV
6
1
0
08 May 2020
Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-Ray Reports
Baoyu Jing
Zeya Wang
Eric P. Xing
6
139
0
26 Apr 2020
Consistent Multiple Sequence Decoding
Bicheng Xu
Leonid Sigal
18
0
0
02 Apr 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
9
164
0
17 Mar 2020
OVC-Net: Object-Oriented Video Captioning with Temporal Graph and Detail Enhancement
Fangyi Zhu
Jenq-Neng Hwang
Zhanyu Ma
Guang Chen
Jun Guo
9
1
0
08 Mar 2020
Hierarchical Memory Decoding for Video Captioning
Aming Wu
Yahong Han
6
2
0
27 Feb 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfeng Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
16
271
0
26 Feb 2020
Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge
Hung Le
Nancy F. Chen
12
9
0
25 Feb 2020
Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling
Yunjae Jung
Dahun Kim
Sanghyun Woo
Kyungsu Kim
Sungjin Kim
In So Kweon
DiffM
6
31
0
03 Feb 2020
Convolutional Hierarchical Attention Network for Query-Focused Video Summarization
Shuwen Xiao
Zhou Zhao
Zijian Zhang
Ziyu Guan
Deng Cai
13
48
0
31 Jan 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
13
19
0
17 Jan 2020
Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog
Shachi H. Kumar
Eda Okur
Saurav Sahay
Jonathan Huang
L. Nachman
6
1
0
20 Dec 2019
Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog
Shachi H. Kumar
Eda Okur
Saurav Sahay
Jonathan Huang
L. Nachman
6
7
0
20 Dec 2019
Extending Machine Language Models toward Human-Level Language Understanding
James L. McClelland
Felix Hill
Maja R. Rudolph
Jason Baldridge
Hinrich Schütze
LRM
23
34
0
12 Dec 2019
Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting
Yan Bin Ng
Basura Fernando
AI4TS
9
34
0
10 Dec 2019
Assessing the Robustness of Visual Question Answering Models
Jia-Hong Huang
Modar Alfadly
Bernard Ghanem
M. Worring
AAML
OOD
15
23
0
30 Nov 2019
Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models
Menatallh Hammad
May Hammad
Mohamed Elshenawy
17
2
0
22 Nov 2019
Empirical Autopsy of Deep Video Captioning Frameworks
Nayyer Aafaq
Naveed Akhtar
Wei Liu
Ajmal Saeed Mian
13
6
0
21 Nov 2019
A Deep Reinforcement Learning Approach to First-Order Logic Theorem Proving
M. Crouse
Ibrahim Abdelaziz
B. Makni
Spencer Whitehead
Cristina Cornelio
Pavan Kapanipathi
Kavitha Srinivas
Veronika Thost
Michael Witbrock
Achille Fokoue
LRM
17
10
0
05 Nov 2019
Diverse Video Captioning Through Latent Variable Expansion
Huanhou Xiao
Jinglun Shi
DiffM
16
15
0
26 Oct 2019
Human Action Sequence Classification
Yan Bin Ng
Basura Fernando
20
4
0
07 Oct 2019
Spatiotemporal Co-attention Recurrent Neural Networks for Human-Skeleton Motion Prediction
Xiangbo Shu
Liyan Zhang
Guo-Jun Qi
Wei Liu
Jinhui Tang
3DH
HAI
37
203
0
29 Sep 2019
Pose-aware Multi-level Feature Network for Human Object Interaction Detection
Bo Wan
Desen Zhou
Yongfei Liu
Rongjie Li
Xuming He
16
196
0
18 Sep 2019
Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes
Jonas Kemp
A. Rajkomar
Andrew M. Dai
10
10
0
06 Sep 2019
A Better Way to Attend: Attention with Trees for Video Question Answering
Hongyang Xue
Wenqing Chu
Zhou Zhao
Deng Cai
17
33
0
05 Sep 2019