Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations

Computer Vision and Pattern Recognition (CVPR), 2023

31 March 2023

Papers citing "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"

33 papers

Learning Skill-Attributes for Transferable Assessment in Video

Kumar Ashutosh

Kristen Grauman

17 Nov 2025

Learning to Recognize Correctly Completed Procedure Steps in Egocentric Assembly Videos through Spatio-Temporal Modeling

Computer Vision and Image Understanding (CVIU), 2025

14 Oct 2025

LEGO Co-builder: Exploring Fine-Grained Vision-Language Modeling for Multimodal LEGO Assembly Assistants

07 Jul 2025

Vision Generalist Model: A Survey

International Journal of Computer Vision (IJCV), 2025

...

11 Jun 2025

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

EgoVIS@CVPR

30 May 2025

HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos

Simone Alberto Peirone

Francesca Pistilli

Giuseppe Averta

19 May 2025

Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding

...

10 Apr 2025

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

27 Mar 2025

Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering

19 Mar 2025

Stitch-a-Demo: Video Demonstrations from Multistep Descriptions

18 Mar 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Computer Vision and Pattern Recognition (CVPR), 2025

Kevin Qinghong Lin

Mike Zheng Shou

VGen

12 Mar 2025

Learning to Generate Long-term Future Narrations Describing Activities of Daily Living

Ramanathan Rajendiran

Debaditya Roy

Basura Fernando

VGen

03 Mar 2025

Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos

Luigi Seminara

G. Farinella

Antonino Furnari

25 Feb 2025

Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos

Neural Information Processing Systems (NeurIPS), 2024

Luigi Seminara

G. Farinella

Antonino Furnari

10 Jan 2025

ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

23 Nov 2024

TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos

Leonardo Plini

Luca Scofano

Edoardo De Matteis

Guido Maria D'Amely di Melendugno

04 Nov 2024

Human Action Anticipation: A Survey

17 Oct 2024

Enhancing Temporal Modeling of Video LLMs via Time Gating

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Liwei Wang

08 Oct 2024

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

International Conference on Learning Representations (ICLR), 2024

04 Oct 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Neural Information Processing Systems (NeurIPS), 2024

Shiwei Wu

Joya Chen

Kevin Qinghong Lin

Enhong Chen

Mike Zheng Shou

VLM

29 Aug 2024

ExpertAF: Expert Actionable Feedback from Video

Computer Vision and Pattern Recognition (CVPR), 2024

01 Aug 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen

Kevin Qinghong Lin

Difei Gao

Mike Zheng Shou

17 Jun 2024

Learning Object States from Actions via Large Language Models

Masatoshi Tateno

Takuma Yagi

Ryosuke Furuta

Yoichi Sato

02 May 2024

PREGO: online mistake detection in PRocedural EGOcentric videos

Computer Vision and Pattern Recognition (CVPR), 2024

Alessandro Flaborea

Guido Maria D'Amely di Melendugno

02 Apr 2024

Beyond Embeddings: The Promise of Visual Table in Visual Reasoning

Yiwu Zhong

Zi-Yuan Hu

Michael R. Lyu

Liwei Wang

27 Mar 2024

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

Kumaranage Ravindu Yasas Nagasinghe

Honglu Zhou

Malitha Gunawardhana

Martin Renqiang Min

Daniel Harari

Muhammad Haris Khan

05 Mar 2024

OSCaR: Object State Captioning and State Change Representation

27 Feb 2024

CI w/o TN: Context Injection without Task Name for Procedure Planning

Xinjie Li

23 Feb 2024

Detours for Navigating Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2024

03 Jan 2024

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2023

Dima Damen

12 Dec 2023

Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos

14 Sep 2023

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

Neural Information Processing Systems (NeurIPS), 2023

Kumar Ashutosh

Santhosh Kumar Ramakrishnan

Triantafyllos Afouras

Kristen Grauman

17 Jul 2023

Learning to Ground Instructional Articles in Videos through Narrations

IEEE International Conference on Computer Vision (ICCV), 2023

E. Mavroudi

Triantafyllos Afouras

Lorenzo Torresani

DiffM

06 Jun 2023