Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

22 March 2022

Tomáš Souček

Jean-Baptiste Alayrac

Papers citing "Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos"


What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning. Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha, Yi-Ting Chen, David J. Crandall, Yi-Hsuan Tsai. 40 0 0. 27 Mar 2025
SPOC: Spatially-Progressing Object State Change Segmentation in Video. Priyanka Mandikal, Tushar Nagarajan, Alex Stoken, Zihui Xue, Kristen Grauman. 39 0 0. 15 Mar 2025
Learning Human Skill Generators at Key-Step Levels. Yilu Wu, Chenhui Zhu, Shuai Wang, Hanlin Wang, Jing Wang, Zhaoxiang Zhang, Limin Wang. VGen 112 0 0. 12 Feb 2025
ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction. Hyungjin Chung, Dohun Lee, Jong Chul Ye. VGen DiffM 16 2 0. 07 Oct 2024
Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL. Naoaki Kanazawa, Kento Kawaharazuka, Yoshiki Obinata, Kei Okada, Masayuki Inaba. LM&Ro 16 0 0. 03 Oct 2024
Do Pre-trained Vision-Language Models Encode Object States? Kaleb Newman, Shijie Wang, Yuan Zang, David Heffren, Chen Sun. CoGe 19 1 0. 16 Sep 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective. Rui Qian, Shuangrui Ding, Dahua Lin. OCL 41 1 0. 09 Jul 2024
Active Object Detection with Knowledge Aggregation and Distillation from Large Models. Dejie Yang, Yang Liu. 32 3 0. 21 May 2024
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos. Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan. 30 2 0. 05 Mar 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos. Yulei Niu, Wenliang Guo, Long Chen, Xudong Lin, Shih-Fu Chang. 39 6 0. 03 Mar 2024
OSCaR: Object State Captioning and State Change Representation. Nguyen Nguyen, Jing Bi, A. Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu. 40 8 0. 27 Feb 2024
Learning to Visually Connect Actions and their Effects. Eric Peh, Paritosh Parmar, Basura Fernando. 22 2 0. 19 Jan 2024
Learning Object State Changes in Videos: An Open-World Perspective. Zihui Xue, Kumar Ashutosh, Kristen Grauman. VGen 17 18 0. 19 Dec 2023
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos. Tomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic. VGen 15 19 0. 12 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains. Rohan Myer Krishnan, Zitian Tang, Zhiqiu Yu, Chen Sun. 33 1 0. 30 Nov 2023
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos. Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato. EgoV 23 7 0. 28 Nov 2023
Chop & Learn: Recognizing and Generating Object-State Compositions. Nirat Saini, Hanyu Wang, Archana Swaminathan, Vinoj Jayasundara, Bo He, Kamal Gupta, Abhinav Shrivastava. CoGe 23 12 0. 25 Sep 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun. LM&Ro 37 48 0. 31 Jul 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective. Thanh-Dat Truong, Khoa Luu. EgoV 27 9 0. 25 May 2023
Visual Transformation Telling. Wanqing Cui, Mustafa Nasir-Moin, Yanyan Lan, Viola J. Chen, J. Guo, Xueqi Cheng. LRM 51 1 0. 03 May 2023
Procedure-Aware Pretraining for Instructional Video Understanding. Honglu Zhou, Roberto Martín-Martín, Mubbasir Kapadia, Silvio Savarese, Juan Carlos Niebles. 17 38 0. 31 Mar 2023
Multi-Task Learning of Object State Changes from Uncurated Videos. Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic. 23 11 0. 24 Nov 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video. Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik. EgoV 218 682 0. 13 Oct 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius, Heng Wang, Lorenzo Torresani. ViT 278 1,939 0. 09 Feb 2021
Learning Temporal Dynamics from Cycles in Narrated Video. Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun. AI4TS 20 14 0. 07 Jan 2021
Transformation-based Adversarial Video Prediction on Large-Scale Data. Pauline Luc, Aidan Clark, Sander Dieleman, Diego de Las Casas, Yotam Doron, Albin Cassirer, Karen Simonyan. VGen 212 86 0. 09 Mar 2020
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics. Yunchao Gong, Qifa Ke, Michael Isard, Svetlana Lazebnik. 3DV 58 583 0. 18 Dec 2012