arXiv:1903.02874
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
7 March 2019
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
Papers citing
"COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"
17 / 267 papers shown
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
251
142
0
16 Jun 2020
Uncertainty-aware Score Distribution Learning for Action Quality Assessment
Computer Vision and Pattern Recognition (CVPR), 2020
Yansong Tang
Zanlin Ni
Jiahuan Zhou
Danyang Zhang
Jiwen Lu
Ying Nian Wu
Jie Zhou
EDL
315
164
0
13 Jun 2020
Intra- and Inter-Action Understanding via Temporal Action Parsing
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
130
84
0
20 May 2020
A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela S. Lin
Sudha Rao
Asli Celikyilmaz
E. Nouri
Chris Brockett
Debadeepta Dey
Bill Dolan
147
28
0
19 May 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
389
110
0
08 May 2020
Learning to Segment Actions from Observation and Narration
Daniel Fried
Jean-Baptiste Alayrac
Phil Blunsom
Chris Dyer
S. Clark
Aida Nematzadeh
274
41
0
07 May 2020
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
Frank F. Xu
Lei Ji
Ding Wang
Junyi Du
Graham Neubig
Yonatan Bisk
Nan Duan
132
22
0
02 May 2020
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Jack Hessel
Zhenhai Zhu
Bo Pang
Radu Soricut
225
4
0
29 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
166
59
0
30 Mar 2020
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Yansong Tang
Jiwen Lu
Jie Zhou
186
42
0
20 Mar 2020
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo
Lei Ji
Ding Wang
Haoyang Huang
Nan Duan
Tianrui Li
Jason Li
Xilin Chen
Ming Zhou
VLM
444
416
0
15 Feb 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2020
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
626
756
0
13 Dec 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2021
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
416
143
0
22 Jul 2019
Procedure Planning in Instructional Videos
European Conference on Computer Vision (ECCV), 2020
C. Chang
De-An Huang
Danfei Xu
Ehsan Adeli
Li Fei-Fei
Juan Carlos Niebles
269
115
0
02 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
IEEE International Conference on Computer Vision (ICCV), 2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
540
1,370
0
07 Jun 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
339
1,359
0
03 Apr 2019
Human Action Recognition and Prediction: A Survey
International Journal of Computer Vision (IJCV), 2022
Yu Kong
Y. Fu
413
741
0
28 Jun 2018