COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

7 March 2019
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou

Papers citing "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"

17 / 267 papers shown
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
251
142
0
16 Jun 2020
Uncertainty-aware Score Distribution Learning for Action Quality Assessment
Computer Vision and Pattern Recognition (CVPR), 2020
Yansong Tang
Zanlin Ni
Jiahuan Zhou
Danyang Zhang
Jiwen Lu
Ying Nian Wu
Jie Zhou
EDL
315
164
0
13 Jun 2020
Intra- and Inter-Action Understanding via Temporal Action Parsing
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
130
84
0
20 May 2020
A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela S. Lin
Sudha Rao
Asli Celikyilmaz
E. Nouri
Chris Brockett
Debadeepta Dey
Bill Dolan
147
28
0
19 May 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
389
110
0
08 May 2020
Learning to Segment Actions from Observation and Narration
Daniel Fried
Jean-Baptiste Alayrac
Phil Blunsom
Chris Dyer
S. Clark
Aida Nematzadeh
274
41
0
07 May 2020
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
Frank F. Xu
Lei Ji
Ding Wang
Junyi Du
Graham Neubig
Yonatan Bisk
Nan Duan
132
22
0
02 May 2020
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Jack Hessel
Zhenhai Zhu
Bo Pang
Radu Soricut
225
4
0
29 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
166
59
0
30 Mar 2020
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Yansong Tang
Jiwen Lu
Jie Zhou
186
42
0
20 Mar 2020
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo
Lei Ji
Ding Wang
Haoyang Huang
Nan Duan
Tianrui Li
Jason Li
Xilin Chen
Ming Zhou
VLM
444
416
0
15 Feb 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2019
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
626
756
0
13 Dec 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
416
143
0
22 Jul 2019
Procedure Planning in Instructional Videos
European Conference on Computer Vision (ECCV), 2019
C. Chang
De-An Huang
Danfei Xu
Ehsan Adeli
Li Fei-Fei
Juan Carlos Niebles
269
115
0
02 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
IEEE International Conference on Computer Vision (ICCV), 2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
540
1,370
0
07 Jun 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
339
1,359
0
03 Apr 2019
Human Action Recognition and Prediction: A Survey
International Journal of Computer Vision (IJCV), 2018
Yu Kong
Y. Fu
413
741
0
28 Jun 2018