arXiv:1910.02930
A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions
7 October 2019
Jack Hessel
Bo Pang
Zhenhai Zhu
Radu Soricut
Papers citing "A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions"
13 / 13 papers shown
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
27
47
0
22 Mar 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
24
4
0
05 Jan 2023
Multimodal Frame-Scoring Transformer for Video Summarization
Jeiyoon Park
Kiho Kwoun
Chanhee Lee
Heuiseok Lim
ViT
30
6
0
05 Jul 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
31
207
0
07 Jan 2022
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
40
4
0
19 Nov 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari
Yi Zhu
Peter V. Gehler
Thomas Brox
135
127
0
30 Sep 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
22
372
0
04 Jun 2021
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
18
81
0
10 Nov 2020
A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela S. Lin
Sudha Rao
Asli Celikyilmaz
E. Nouri
Chris Brockett
Debadeepta Dey
Bill Dolan
18
24
0
19 May 2020
Multimodal Categorization of Crisis Events in Social Media
Mahdi Abavisani
Liwei Wu
Shengli Hu
Joel R. Tetreault
A. Jaimes
29
87
0
10 Apr 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
22
164
0
17 Mar 2020
A causal framework for explaining the predictions of black-box sequence-to-sequence models
David Alvarez-Melis
Tommi Jaakkola
CML
232
201
0
06 Jul 2017