COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

7 March 2019

Jie Zhou

Papers citing "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"

50 / 267 papers shown

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding

Neural Information Processing Systems (NeurIPS), 2024

Ye Liu

Zongyang Ma

Chen Ma

Yang Wu

Ying Shan

Chang Wen Chen

273

26 Sep 2024

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

European Conference on Computer Vision (ECCV), 2024

Yu Kong

Martin Renqiang Min

Dimitris N. Metaxas

DiffM

297

22 Sep 2024

Enhancing Long Video Understanding via Hierarchical Event-Based Memory

Jingyu Liu

Xi Chen

285

10 Sep 2024

HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

429

30 Aug 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Neural Information Processing Systems (NeurIPS), 2024

Shiwei Wu

Joya Chen

Kevin Qinghong Lin

Enhong Chen

Mike Zheng Shou

VLM

249

29 Aug 2024

Diffusion Model for Planning: A Systematic Literature Review

288

16 Aug 2024

Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

European Conference on Computer Vision (ECCV), 2024

200

05 Aug 2024

COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark

European Conference on Computer Vision (ECCV), 2024

279

05 Aug 2024

ExpertAF: Expert Actionable Feedback from Video

Computer Vision and Pattern Recognition (CVPR), 2024

457

01 Aug 2024

Temporally Grounding Instructional Diagrams in Unconstrained Videos

Yizhak Ben-Shabat

294

16 Jul 2024

Open-Event Procedure Planning in Instructional Videos

Yilu Wu

Hanlin Wang

Jing Wang

Limin Wang

275

06 Jul 2024

Tarsier: Recipes for Training and Evaluating Large Video Description Models

Jiawei Wang

Liping Yuan

Yuchen Zhang

306

115

30 Jun 2024

GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

Zekun Wang

Bing Qin

184

26 Jun 2024

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Yuxuan Wang

Yueqian Wang

Dongyan Zhao

Cihang Xie

Zilong Zheng

MLLM VLM

268

24 Jun 2024

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

472

20 Jun 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen

Kevin Qinghong Lin

Difei Gao

Mike Zheng Shou

314

109

17 Jun 2024

A Survey of Video Datasets for Grounded Event Understanding

Kate Sanders

Benjamin Van Durme

247

14 Jun 2024

EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

331

13 Jun 2024

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

596

09 Jun 2024

Step Differences in Instructional Video

Tushar Nagarajan

Lorenzo Torresani

VGen

432

24 Apr 2024

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Badri N. Patro

Vijay Srinivas Agneeswaran

Mamba

368

24 Apr 2024

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Ser-Nam Lim

362

184

08 Apr 2024

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment

Jie Zhou

226

07 Apr 2024

PREGO: online mistake detection in PRocedural EGOcentric videos

Computer Vision and Pattern Recognition (CVPR), 2024

Alessandro Flaborea

Guido Maria D'Amely di Melendugno

297

02 Apr 2024

LITA: Language Instructed Temporal-Localization Assistant

De-An Huang

Shijia Liao

Subhashree Radhakrishnan

241

104

27 Mar 2024

RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos

222

27 Mar 2024

ActionDiffusion: An Action-aware Diffusion Model for Procedure Planning in Instructional Videos

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

Lei Shi

Paul-Christian Bürkner

Andreas Bulling

DiffM VGen

238

13 Mar 2024

VideoMamba: State Space Model for Efficient Video Understanding

European Conference on Computer Vision (ECCV), 2024

Yu Qiao

286

398

11 Mar 2024

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

Kumaranage Ravindu Yasas Nagasinghe

Honglu Zhou

Malitha Gunawardhana

Martin Renqiang Min

Daniel Harari

Muhammad Haris Khan

256

05 Mar 2024

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

285

03 Mar 2024

CI w/o TN: Context Injection without Task Name for Procedure Planning

Xinjie Li

209

23 Feb 2024

Video ReCap: Recursive Captioning of Hour-Long Videos

Gedas Bertasius

670

20 Feb 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

...

391

20 Feb 2024

FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation

Yifei Huang

Yoichi Sato

235

01 Feb 2024

Multi-granularity Correspondence Learning from Long-term Noisy Videos

355

30 Jan 2024

Zero Shot Open-ended Video Inference

146

23 Jan 2024

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition

305

22 Jan 2024

Learning to Visually Connect Actions and their Effects

Eric Peh

Paritosh Parmar

Basura Fernando

424

19 Jan 2024

Detours for Navigating Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2024

491

03 Jan 2024

CaptainCook4D: A dataset for understanding errors in procedural activities

Rohith Peddi

Shivvrat Arya

B. Challa

Likhitha Pallapothula

Akshay Vyas

...

Vasundhara Komaragiri

269

22 Dec 2023

Implicit Affordance Acquisition via Causal Action-Effect Modeling in the Video Domain

Hsiu-yu Yang

Carina Silberer

162

18 Dec 2023

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis

313

18 Dec 2023

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2023

Dima Damen

257

12 Dec 2023

EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning

Mingyu Ding

Ying Shan

371

11 Dec 2023

Generating Illustrated Instructions

286

07 Dec 2023

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Computer Vision and Pattern Recognition (CVPR), 2023

Shicheng Li

372

356

04 Dec 2023

Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains

507

30 Nov 2023

Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition

331

28 Nov 2023

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Yoichi Sato

299

28 Nov 2023

Efficient Pre-training for Localized Instruction Generation of Videos

397

27 Nov 2023