arXiv:1903.02874
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
7 March 2019
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
Papers citing "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"
50 / 267 papers shown
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Neural Information Processing Systems (NeurIPS), 2022
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
273
84
0
12 Oct 2022
Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization
Nikita Dvornik
Isma Hadji
Hai X. Pham
Dhaivat Bhatt
Brais Martínez
Afsaneh Fazly
Allan D. Jepson
234
6
0
10 Oct 2022
Turbo Training with Token Dropout
British Machine Vision Conference (BMVC), 2022
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
214
14
0
10 Oct 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Neural Information Processing Systems (NeurIPS), 2022
Baoxiong Jia
Ting Lei
Song-Chun Zhu
Siyuan Huang
EgoV
173
95
0
08 Oct 2022
Compressed Vision for Efficient Video Understanding
Asian Conference on Computer Vision (ACCV), 2022
Olivia Wiles
João Carreira
Iain Barr
Andrew Zisserman
Mateusz Malinowski
119
10
0
06 Oct 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
British Machine Vision Conference (BMVC), 2022
Anil Batra
Shreyank N. Gowda
Frank Keller
Laura Sevilla-Lara
201
5
0
30 Sep 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
European Conference on Computer Vision (ECCV), 2022
Medhini Narasimhan
Arsha Nagrani
Chen Sun
Michael Rubinstein
Trevor Darrell
Anna Rohrbach
Cordelia Schmid
217
42
0
14 Aug 2022
My View is the Best View: Procedure Learning from Egocentric Videos
European Conference on Computer Vision (ECCV), 2022
Siddhant Bansal
Chetan Arora
C. V. Jawahar
EgoV
184
76
0
22 Jul 2022
LocVTP: Video-Text Pre-training for Temporal Localization
European Conference on Computer Vision (ECCV), 2022
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
207
70
0
21 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
IEEE International Conference on Multimedia Big Data (ICMBD), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
238
5
0
16 Jul 2022
Self-Supervised Learning for Videos: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
478
167
0
18 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
272
54
0
04 May 2022
MHMS: Multimodal Hierarchical Multimedia Summarization
Jielin Qiu
Jiacheng Zhu
Mengdi Xu
Franck Dernoncourt
Trung Bui
Zhaowen Wang
Yue Liu
Ding Zhao
Hailin Jin
173
14
0
07 Apr 2022
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Computer Vision and Pattern Recognition (CVPR), 2022
Jinglin Xu
Yongming Rao
Xumin Yu
Guangyi Chen
Jie Zhou
Jiwen Lu
231
141
0
07 Apr 2022
Long Movie Clip Classification with State-Space Video Models
European Conference on Computer Vision (ECCV), 2022
Md. Mohaiminul Islam
Gedas Bertasius
VLM
420
139
0
04 Apr 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Computer Vision and Pattern Recognition (CVPR), 2022
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
350
295
0
28 Mar 2022
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
161
92
0
26 Mar 2022
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Reza Ghoddoosian
Isht Dwivedi
Nakul Agarwal
Chiho Choi
Behzad Dariush
161
22
0
24 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
230
43
0
22 Mar 2022
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Shuyan Zhou
Li Zhang
Yue Yang
Qing Lyu
Pengcheng Yin
Chris Callison-Burch
Graham Neubig
178
32
0
14 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Amélie Reymond
217
3
0
16 Feb 2022
Learning To Recognize Procedural Activities with Distant Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
249
96
0
26 Jan 2022
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment
International Conference on Language Resources and Evaluation (LREC), 2022
Luis Lebron
Yvette Graham
Kevin McGuinness
K. Kouramas
Noel E. O'Connor
186
4
0
25 Jan 2022
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
327
23
0
13 Dec 2021
Learning to Align Sequential Actions in the Wild
Weizhe Liu
Bugra Tekin
Huseyin Coskun
Vibhav Vineet
Pascal Fua
Marc Pollefeys
226
32
0
17 Nov 2021
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval
Yue Yang
Joongwon Kim
Artemis Panagopoulou
Mark Yatskar
Chris Callison-Burch
LM&Ro
261
14
0
17 Nov 2021
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Reza Ghoddoosian
S. Sayed
V. Athitsos
167
16
0
12 Oct 2021
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani
Yingli Tian
357
84
0
30 Sep 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
805
689
0
28 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
205
49
0
21 Sep 2021
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Zhenzhi Wang
Liyu Wu
Zhimin Li
Jiangfeng Xiong
Qinglin Lu
144
5
0
16 Sep 2021
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
Neural Information Processing Systems (NeurIPS), 2021
Nikita Dvornik
Isma Hadji
Konstantinos G. Derpanis
Animesh Garg
Allan D. Jepson
159
62
0
26 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
IEEE International Conference on Computer Vision (ICCV), 2021
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
220
154
0
23 Aug 2021
Group-aware Contrastive Regression for Action Quality Assessment
Xumin Yu
Yongming Rao
Wenliang Zhao
Jiwen Lu
Jie Zhou
AI4TS
172
134
0
17 Aug 2021
Unsupervised Discovery of Actions in Instructional Videos
British Machine Vision Conference (BMVC), 2021
A. Piergiovanni
A. Angelova
Michael S. Ryoo
Irfan Essa
170
4
0
28 Jun 2021
JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection
Mahsa Ehsanpour
F. Saleh
Silvio Savarese
Ian Reid
Hamid Rezatofighi
236
59
0
16 Jun 2021
Transferring Knowledge from Text to Video: Zero-Shot Anticipation for Procedural Actions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fadime Sener
Rishabh Saraf
Angela Yao
LM&Ro
183
17
0
06 Jun 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Findings of the Association for Computational Linguistics (Findings of ACL), 2021
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
327
146
0
20 May 2021
Home Action Genome: Cooperative Compositional Action Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Nishant Rai
Haofeng Chen
Jingwei Ji
Rishi Desai
Kazuki Kozuka
Shun Ishizaka
Ehsan Adeli
Juan Carlos Niebles
115
86
0
11 May 2021
Visual Goal-Step Inference using wikiHow
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Yue Yang
Artemis Panagopoulou
Qing Lyu
Li Zhang
Mark Yatskar
Chris Callison-Burch
251
50
0
12 Apr 2021
Visual Semantic Role Labeling for Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Arka Sadhu
Tanmay Gupta
Mark Yatskar
Ram Nevatia
Aniruddha Kembhavi
VLM
290
88
0
02 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
193
7
0
01 Apr 2021
Machine-Generated Hierarchical Structure of Human Activities to Reveal How Machines Think
IEEE Access (IEEE Access), 2021
Mahsun Altin
Furkan Gursoy
Lina Xu
HAI
AI4CE
87
3
0
19 Jan 2021
Look Before you Speak: Visually Contextualized Utterances
Computer Vision and Pattern Recognition (CVPR), 2020
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
311
71
0
10 Dec 2020
ActBERT: Learning Global-Local Video-Text Representations
Computer Vision and Pattern Recognition (CVPR), 2020
Linchao Zhu
Yi Yang
ViT
324
451
0
14 Nov 2020
Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Rui Dai
Srijan Das
Saurav Sharma
Luca Minciullo
Lorenzo Garattoni
Francois Bremond
Gianpiero Francesca
258
64
0
28 Oct 2020
Equivalent Classification Mapping for Weakly Supervised Temporal Action Localization
Tao Zhao
Junwei Han
Le Yang
Dingwen Zhang
174
20
0
18 Aug 2020
LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities
European Conference on Computer Vision (ECCV), 2020
Baoxiong Jia
Yixin Chen
Siyuan Huang
Yixin Zhu
Song-Chun Zhu
146
64
0
31 Jul 2020
Weakly Supervised Temporal Action Localization with Segment-Level Labels
Xinpeng Ding
Nannan Wang
Xinbo Gao
Jie Li
Xiaoyu Wang
Tongliang Liu
140
12
0
03 Jul 2020
The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
Yizhak Ben-Shabat
Xin Yu
F. Saleh
Dylan Campbell
Cristian Rodriguez-Opazo
Hongdong Li
Stephen Gould
208
143
0
01 Jul 2020