Procedure-Aware Pretraining for Instructional Video Understanding

31 March 2023
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles

Papers citing "Procedure-Aware Pretraining for Instructional Video Understanding"

33 / 33 papers shown
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
35
0
0
10 Apr 2025
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
43
0
0
27 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
58
0
0
18 Mar 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin
Mike Zheng Shou
VGen
68
1
0
12 Mar 2025
Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos
Soumya Jahagirdar
Jayasree Saha
C. V. Jawahar
56
0
0
11 Mar 2025
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
72
0
0
25 Feb 2025
Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training
Karan Samel
Nitish Sontakke
Irfan Essa
42
0
0
24 Feb 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
46
7
0
10 Jan 2025
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
Yuxiao Chen
K. Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
26
1
0
22 Sep 2024
Box2Flow: Instance-based Action Flow Graphs from Videos
Jiatong Li
Kalliopi Basioti
Vladimir Pavlovic
27
0
0
30 Aug 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Bill Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
37
12
0
29 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris M. Kitani
Kristen Grauman
VGen
42
2
0
01 Aug 2024
WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
Jiefu Ou
Arda Uzunoglu
Benjamin Van Durme
Daniel Khashabi
LM&Ro
VGen
25
3
0
10 Jul 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
40
47
0
17 Jun 2024
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Yasas Nagasinghe
Honglu Zhou
Malitha Gunawardhana
Martin Renqiang Min
Daniel Harari
Muhammad Haris Khan
32
2
0
05 Mar 2024
CI w/o TN: Context Injection without Task Name for Procedure Planning
CI w/o TN: Context Injection without Task Name for Procedure Planning
Xinjie Li
29
0
0
23 Feb 2024
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
13
6
0
03 Jan 2024
Implicit Affordance Acquisition via Causal Action-Effect Modeling in the Video Domain
Hsiu-yu Yang
Carina Silberer
11
1
0
18 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
35
1
0
30 Nov 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
12
0
0
27 Nov 2023
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
Fen Fang
Yun Liu
Ali Koksal
Qianli Xu
Joo-Hwee Lim
VGen
DiffM
21
5
0
14 Sep 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
21
23
0
17 Jul 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
14
30
0
08 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
25
21
0
06 Jun 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
Yu Zhou
Sha Li
Manling Li
Xudong Lin
Shih-Fu Chang
Mohit Bansal
Heng Ji
17
8
0
27 May 2023
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
61
48
0
05 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
61
44
0
21 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
360
0
17 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021