COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

7 March 2019

Jie Zhou

Papers citing "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"

17 / 267 papers shown

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

...

Antonio Torralba

251

142

16 Jun 2020

Uncertainty-aware Score Distribution Learning for Action Quality Assessment

Computer Vision and Pattern Recognition (CVPR), 2020

Jie Zhou

315

164

13 Jun 2020

Intra- and Inter-Action Understanding via Temporal Action Parsing

130

20 May 2020

A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks

147

19 May 2020

Condensed Movies: Story Based Retrieval with Contextual Embeddings

389

110

08 May 2020

Learning to Segment Actions from Observation and Narration

Daniel Fried

Jean-Baptiste Alayrac

274

07 May 2020

A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos

Graham Neubig

132

02 May 2020

Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

225

29 Apr 2020

Speech2Action: Cross-modal Supervision for Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2020

166

30 Mar 2020

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

Yansong Tang

Jiwen Lu

Jie Zhou

186

20 Mar 2020

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Tianrui Li

444

416

15 Feb 2020

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2019

Antoine Miech

Jean-Baptiste Alayrac

626

756

13 Dec 2019

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Journal of Artificial Intelligence Research (JAIR), 2019

416

143

22 Jul 2019

Procedure Planning in Instructional Videos

European Conference on Computer Vision (ECCV), 2019

De-An Huang

Li Fei-Fei

269

115

02 Jul 2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

IEEE International Conference on Computer Vision (ICCV), 2019

Antoine Miech

Dimitri Zhukov

Jean-Baptiste Alayrac

540

1,370

07 Jun 2019

VideoBERT: A Joint Model for Video and Language Representation Learning

Carl Vondrick

339

1,359

03 Apr 2019

Human Action Recognition and Prediction: A Survey

International Journal of Computer Vision (IJCV), 2018

Yu Kong

Y. Fu

413

741

28 Jun 2018