Learning Object State Changes in Videos: An Open-World Perspective

Learning Object State Changes in Videos: An Open-World Perspective

19 December 2023

Kristen Grauman

Papers citing "Learning Object State Changes in Videos: An Open-World Perspective"

13 / 13 papers shown

Title
Improving Physical Object State Representation in Text-to-Image Generative Systems Tianle Chen Chaitanya Chakka Deepti Ghadiyaram 27 0 0 04 May 2025
Progress-Aware Video Frame Captioning Zihui Xue Joungbin An Xitong Yang Kristen Grauman 98 1 0 03 Dec 2024
RMem: Restricted Memory Banks Improve Video Object Segmentation Junbao Zhou Ziqi Pang Yu-xiong Wang VOS 55 7 0 12 Jun 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Junnan Li Dongxu Li Silvio Savarese Steven C. H. Hoi VLM MLLM 244 4,186 0 30 Jan 2023
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory Santhosh Kumar Ramakrishnan Ziad Al-Halah Kristen Grauman 77 39 0 02 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 385 4,010 0 28 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 224 1,017 0 13 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning Jing Bi Jiebo Luo Chenliang Xu 61 48 0 05 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding Hu Xu Gargi Ghosh Po-Yao (Bernie) Huang Dmytro Okhonko Armen Aghajanyan Florian Metze Luke Zettlemoyer Florian Metze Luke Zettlemoyer Christoph Feichtenhofer CLIP VLM 245 554 0 28 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition Mengmeng Wang Jiazheng Xing Yong Liu VLM 149 360 0 17 Sep 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Xiuye Gu Tsung-Yi Lin Weicheng Kuo Yin Cui VLM ObjD 223 897 0 28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision Chao Jia Yinfei Yang Ye Xia Yi-Ting Chen Zarana Parekh Hieu H. Pham Quoc V. Le Yun-hsuan Sung Zhen Li Tom Duerig VLM CLIP 293 3,683 0 11 Feb 2021
Learning Temporal Dynamics from Cycles in Narrated Video Dave Epstein Jiajun Wu Cordelia Schmid Chen Sun AI4TS 22 14 0 07 Jan 2021