Is ImageNet worth 1 video? Learning strong image encoders from 1 long
unlabelled video

Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video

12 October 2023

Shashanka Venkataramanan

Mamshad Nayeem Rizve

Yannis Avrithis

Papers citing "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video"

19 / 19 papers shown

Title
Direct Motion Models for Assessing Generated Videos Kelsey R. Allen Carl Doersch Guangyao Zhou Mohammed Suhail Danny Driess ... Thomas Kipf Mehdi S. M. Sajjadi Kevin P. Murphy João Carreira Sjoerd van Steenkiste EGVM DiffM VGen 71 0 0 30 Apr 2025
Learning from Streaming Video with Orthogonal Gradients Tengda Han Dilara Gokay Joseph Heyward Chuhan Zhang Daniel Zoran Viorica Patraucean João Carreira Dima Damen Andrew Zisserman 38 0 0 02 Apr 2025
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos Felix Wimbauer Weirong Chen Dominik Muhle Christian Rupprecht Daniel Cremers VGen 60 0 0 30 Mar 2025
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs Carlos Plou Cesar Borja Ruben Martinez-Cantin Ana C. Murillo 54 0 0 25 Mar 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Sreyan Ghosh Zhifeng Kong Sonal Kumar S. Sakshi Jaehyeon Kim Wei Ping Rafael Valle Dinesh Manocha Bryan Catanzaro MLLM AuLLM LRM 46 4 0 06 Mar 2025
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging Maximilian R. Rokuss Yannick Kirchhoff Seval Akbal Balint Kovacs Saikat Roy Constantin Ulrich Tassilo Wald Lukas T. Rotkopf H. Schlemmer Klaus H. Maier-Hein 3DV MedIm 37 1 0 28 Feb 2025
Audio-Language Datasets of Scenes and Events: A Survey Gijs Wijngaard Elia Formisano Michele Esposito M. Dumontier 61 2 0 10 Jan 2025
Human Gaze Boosts Object-Centered Representation Learning Timothy Schaumlöffel A. Aubret Gemma Roig Jochen Triesch 28 0 0 06 Jan 2025
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations Marcin Przewiȩźlikowski Randall Balestriero Wojciech Jasiński Marek 'Smieja Bartosz Zieliñski 55 0 0 04 Dec 2024
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark Joseph Heyward João Carreira Dima Damen Andrew Zisserman Viorica Patraucean 59 2 0 29 Nov 2024
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models Shenghao Fu Junkai Yan Q. Yang Xihan Wei Xiaohua Xie Wei-Shi Zheng VLM 18 3 0 25 Oct 2024
Language-based Audio Moment Retrieval Hokuto Munakata Taichi Nishimura Shota Nakada Tatsuya Komatsu 25 1 0 24 Sep 2024
PooDLe: Pooled and dense self-supervised learning from naturalistic videos Alex N. Wang Christopher Hoang Yuwen Xiong Yann LeCun Mengye Ren 56 0 0 20 Aug 2024
Scaling Backwards: Minimal Synthetic Pre-training? Ryo Nakamura Ryu Tadokoro Ryosuke Yamada Tim Puhlfürß Iro Laina Christian Rupprecht Walid Maalej Rio Yokota Hirokatsu Kataoka DD 14 2 0 01 Aug 2024
Learning from One Continuous Video Stream João Carreira Michael King Viorica Patraucean Dilara Gokay Catalin Ionescu ... Joseph Heyward Carl Doersch Y. Aytar Dima Damen Andrew Zisserman CLL 11 0 0 01 Dec 2023
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain Francesco Ragusa Antonino Furnari G. Farinella EgoV 25 23 0 19 Sep 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 212 682 0 13 Oct 2021
Localizing Objects with Self-Supervised Transformers and no Labels Oriane Siméoni Gilles Puy Huy V. Vo Simon Roburin Spyros Gidaris Andrei Bursuc P. Pérez Renaud Marlet Jean Ponce ViT 159 195 0 29 Sep 2021
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 283 5,723 0 29 Apr 2021