1703.02521
Cited By
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
7 March 2017
De-An Huang
Joseph J. Lim
Li Fei-Fei
Juan Carlos Niebles
Papers citing
"Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos"
23 / 23 papers shown
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
European Conference on Computer Vision (ECCV), 2024
Koki Maeda
Tosho Hirasawa
Atsushi Hashimoto
Jun Harashima
Leszek Rybicki
Yusuke Fukasawa
Yoshitaka Ushiku
297
3
0
05 Aug 2024
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Yasas Nagasinghe
Honglu Zhou
Malitha Gunawardhana
Martin Renqiang Min
Daniel Harari
Muhammad Haris Khan
304
17
0
05 Mar 2024
Reconstructing and grounding narrated instructional videos in 3D
Dimitri Zhukov
Ignacio Rocco
Ivan Laptev
Josef Sivic
Johannes L. Schönberger
Bugra Tekin
Marc Pollefeys
113
0
0
09 Sep 2021
MERLOT: Multimodal Neural Script Knowledge Models
Neural Information Processing Systems (NeurIPS), 2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
497
437
0
04 Jun 2021
Video Question Answering on Screencast Tutorials
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Wentian Zhao
Seokhwan Kim
N. Xu
Hailin Jin
133
10
0
02 Aug 2020
Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
Neural Information Processing Systems (NeurIPS), 2020
Zhiwei Deng
Karthik Narasimhan
Olga Russakovsky
246
104
0
11 Jul 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
275
142
0
16 Jun 2020
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
Frank F. Xu
Lei Ji
Ding Wang
Junyi Du
Graham Neubig
Yonatan Bisk
Nan Duan
150
22
0
02 May 2020
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Jack Hessel
Zhenhai Zhu
Bo Pang
Radu Soricut
231
4
0
29 Apr 2020
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Yansong Tang
Jiwen Lu
Jie Zhou
217
42
0
20 Mar 2020
Action Modifiers: Learning from Adverbs in Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2020
Hazel Doughty
Ivan Laptev
W. Mayol-Cuevas
Dima Damen
350
40
0
13 Dec 2019
A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions
Conference on Computational Natural Language Learning (CoNLL), 2019
Jack Hessel
Bo Pang
Zhenhai Zhu
Radu Soricut
184
39
0
07 Oct 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
IEEE International Conference on Computer Vision (ICCV), 2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
693
1,412
0
07 Jun 2019
Cross-task weakly supervised learning from instructional videos
Computer Vision and Pattern Recognition (CVPR), 2019
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
SSL
549
298
0
19 Mar 2019
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
381
404
0
07 Mar 2019
How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos
Shaojie Wang
Wentian Zhao
Ziyi Kou
Chenliang Xu
149
5
0
02 Dec 2018
Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
Meera Hahn
Nataniel Ruiz
Jean-Baptiste Alayrac
Ivan Laptev
James M. Rehg
125
6
0
22 Sep 2018
Localizing Moments in Video with Temporal Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
243
174
0
05 Sep 2018
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang
Suraj Nair
Danfei Xu
Yuke Zhu
Animesh Garg
Li Fei-Fei
Silvio Savarese
Juan Carlos Niebles
204
151
0
10 Jul 2018
Reward Learning from Narrated Demonstrations
H. Tung
Adam W. Harley
Liang-Kang Huang
Katerina Fragkiadaki
LM&Ro
SSL
219
31
0
27 Apr 2018
Automatically Extracting Action Graphs from Materials Science Synthesis Procedures
Sheshera Mysore
Edward J. Kim
Emma Strubell
Ao Liu
Haw-Shiuan Chang
Srikrishna Kompella
Kevin Huang
Andrew McCallum
E. Olivetti
189
39
0
18 Nov 2017
Visual Reference Resolution using Attention Memory for Visual Dialog
Paul Hongsuck Seo
Andreas M. Lehrmann
Bohyung Han
Leonid Sigal
283
125
0
23 Sep 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
441
1,143
0
04 Aug 2017