HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data

25 July 2024

Papers citing "HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data"


| Title | Authors | Tags | Counts | Date |
| --- | --- | --- | --- | --- |
| Human Gaze Boosts Object-Centered Representation Learning | Timothy Schaumlöffel, A. Aubret, Gemma Roig, Jochen Triesch | | 33 / 0 / 0 | 06 Jan 2025 |
| Self-supervised learning of video representations from a child's perspective | A. Orhan, Wentao Wang, Alex N. Wang, Mengye Ren, Brenden Lake | | 17 / 4 / 0 | 01 Feb 2024 |
| Masked Autoencoders Are Scalable Vision Learners | Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick | ViT, TPM | 258 / 7,412 / 0 | 11 Nov 2021 |
| Ego4D: Around the World in 3,000 Hours of Egocentric Video | Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik | EgoV | 224 / 1,018 / 0 | 13 Oct 2021 |
| ImageNet Large Scale Visual Recognition Challenge | Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei | VLM, ObjD | 282 / 39,190 / 0 | 01 Sep 2014 |