Masked Feature Prediction for Self-Supervised Visual Pre-Training

16 December 2021

Christoph Feichtenhofer

ViT

Papers citing "Masked Feature Prediction for Self-Supervised Visual Pre-Training"

50 / 498 papers shown

DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance

04 Dec 2025

Enhancing next token prediction based pre-training for jet foundation models

03 Dec 2025

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

170

01 Dec 2025

PowerCLIP: Powerset Alignment for Contrastive Pre-Training

200

28 Nov 2025

Rethinking Cross-Generator Image Forgery Detection through DINOv3

27 Nov 2025

MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning

121

16 Nov 2025

Learning from the Right Patches: A Two-Stage Wavelet-Driven Masked Autoencoder for Histopathology Representation Learning

206

10 Nov 2025

MiVID: Multi-Strategic Self-Supervision for Video Frame Interpolation using Diffusion Model

151

08 Nov 2025

ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology

144

04 Nov 2025

From Masks to Worlds: A Hitchhiker's Guide to World Models

185

23 Oct 2025

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

...

158

08 Oct 2025

Conditional Representation Learning for Customized Tasks

156

06 Oct 2025

UniVid: The Open-Source Unified Video Model

283

29 Sep 2025

UNIV: Unified Foundation Model for Infrared and Visible Modalities

111

19 Sep 2025

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

241

19 Sep 2025

UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation

137

19 Sep 2025

Masked Feature Modeling Enhances Adaptive Segmentation

124

17 Sep 2025

Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models

IEEE Journal of Biomedical and Health Informatics (JBHI), 2025

141

11 Sep 2025

Video Understanding by Design: How Datasets Shape Architectures and Insights

238

11 Sep 2025

Diffusion-Based Action Recognition Generalizes to Untrained Domains

270

10 Sep 2025

From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations

123

21 Aug 2025

Self-Supervised Sparse Sensor Fusion for Long Range Perception

145

19 Aug 2025

S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything without Supervision

...

200

09 Aug 2025

MINR: Implicit Neural Representations with Masked Image Modelling

Sua Lee

Joonhun Lee

Myungjoo Kang

144

30 Jul 2025

TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras

152

29 Jul 2025

Self-Guided Masked Autoencoder

Neural Information Processing Systems (NeurIPS), 2025

166

26 Jul 2025

Video Self-Distillation for Single-Image Encoders: A Step Toward Physically Plausible Perception

159

25 Jul 2025

Improving Joint Embedding Predictive Architecture with Diffusion Noise

204

21 Jul 2025

Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

Shashanka Venkataramanan

237

18 Jul 2025

HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space

188

13 Jul 2025

Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

305

08 Jul 2025

Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency

Bill Psomas

Dionysis Christopoulos

Konstantinos Karantzalos

Yannis Avrithis

Giorgos Tolias

333

11 Jun 2025

MaskAdapt: Unsupervised Geometry-Aware Domain Adaptation Using Multimodal Contextual Learning and RGB-Depth Masking

199

29 May 2025

Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection

350

13 May 2025

Joint Low-level and High-level Textual Representation Learning with Multiple Masking Strategies

301

11 May 2025

Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving

295

17 Apr 2025

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya

Po-Yao (Bernie) Huang

...

Christoph Feichtenhofer

ObjD VOS

670

107

17 Apr 2025

Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos

344

07 Apr 2025

Towards Generalizing Temporal Action Segmentation to Unseen Views

229

03 Apr 2025

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

372

02 Apr 2025

Scaling Language-Free Visual Representation Learning

...

445

01 Apr 2025

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Guoyizhe Wei

Rama Chellappa

301

30 Mar 2025

Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos

1.0K

26 Mar 2025

Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition

Computer Vision and Pattern Recognition (CVPR), 2025

300

24 Mar 2025

Structured-Noise Masked Modeling for Video, Audio and Beyond

317

20 Mar 2025

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

...

544

13 Mar 2025

Towards All-in-One Medical Image Re-Identification

Computer Vision and Pattern Recognition (CVPR), 2025

261

11 Mar 2025

V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation

269

10 Mar 2025

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

711

09 Mar 2025

Small Vision-Language Models: A Survey on Compact Architectures and Techniques

Nitesh Patnaik

Navdeep Nayak

Himani Bansal Agrawal

Moinak Chinmoy Khamaru

Gourav Bal

Saishree Smaranika Panda

268

09 Mar 2025