Masked Feature Prediction for Self-Supervised Visual Pre-Training

16 December 2021
Chen Wei
Haoqi Fan
Saining Xie
Chao-Yuan Wu
Alan Yuille
Christoph Feichtenhofer
    ViT
arXiv: 2112.09133

Papers citing "Masked Feature Prediction for Self-Supervised Visual Pre-Training"

50 / 494 papers shown
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Chenting Wang
Yuhan Zhu
Yicheng Xu
Jiange Yang
Ziang Yan
Yali Wang
Yi Wang
Limin Wang
VGen
85
0
0
01 Dec 2025
PowerCLIP: Powerset Alignment for Contrastive Pre-Training
Masaki Kawamura
Nakamasa Inoue
Rintaro Yanagi
Hirokatsu Kataoka
Rio Yokota
CLIP, VLM
89
0
0
28 Nov 2025
Rethinking Cross-Generator Image Forgery Detection through DINOv3
Zhenglin Huang
Jason Li
Haiquan Wen
Tianxiao Li
Xi Yang
Lu Qi
Bei Peng
Xiaowei Huang
Ming-Hsuan Yang
Guangliang Cheng
16
0
0
27 Nov 2025
MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning
Jingshan Hong
Haigen Hu
Huihuang Zhang
Q. Zhou
Zhao Li
64
0
0
16 Nov 2025
Learning from the Right Patches: A Two-Stage Wavelet-Driven Masked Autoencoder for Histopathology Representation Learning
Raneen Younis
Louay Hamdi
Lukas Chavez
Zahra Ahmadi
MedIm
175
0
0
10 Nov 2025
MiVID: Multi-Strategic Self-Supervision for Video Frame Interpolation using Diffusion Model
Priyansh Srivastava
Romit Chatterjee
A. Sen
Aradhana Behura
Ratnakar Dash
DiffM, VGen
103
0
0
08 Nov 2025
ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology
Srikumar Sastry
Subash Khanal
Aayush Dhakal
Jiayu Lin
Dan Cher
Phoenix Jarosz
Nathan Jacobs
116
0
0
04 Nov 2025
From Masks to Worlds: A Hitchhiker's Guide to World Models
Jinbin Bai
Yu Lei
H. Wu
Yuchen Zhu
Shufan Li
Yi Xin
Xiangtai Li
Molei Tao
Aditya Grover
Ming-Hsuan Yang
VGen, SyDa
164
2
0
23 Oct 2025
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
Ziyuan Huang
Dandan Zheng
Cheng Zou
Rui Liu
Xiaolong Wang
...
Jiajia Liu
Qingpei Guo
Ming-Hsuan Yang
Jingdong Chen
Jun Zhou
136
8
0
08 Oct 2025
Conditional Representation Learning for Customized Tasks
Honglin Liu
Chao Sun
Peng Hu
Yunfan Li
Xi Peng
141
0
0
06 Oct 2025
UniVid: The Open-Source Unified Video Model
Jiabin Luo
Junhui Lin
Zeyu Zhang
Biao Wu
Meng Fang
Ling-Hao Chen
Hao Tang
VGen
234
6
0
29 Sep 2025
UNIV: Unified Foundation Model for Infrared and Visible Modalities
Fangyuan Mao
Shuo Wang
Jilin Mei
Chen Min
Shun Lu
Fuyang Liu
Xiaokun Feng
Meiqi Wu
Yu Hu
64
0
0
19 Sep 2025
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
Zinan Lin
Enshu Liu
Xuefei Ning
Junyi Zhu
Wenyu Wang
Sergey Yekhanin
AI4CE
213
0
0
19 Sep 2025
UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation
Xiaoqi Zhao
Youwei Pang
Chenyang Yu
Lihe Zhang
Huchuan Lu
Shijian Lu
Georges El Fakhri
Xiaofeng Liu
110
2
0
19 Sep 2025
Masked Feature Modeling Enhances Adaptive Segmentation
Wenlve Zhou
Zhiheng Zhou
Tiantao Xian
Yikui Zhai
Weibin Wu
Biyun Ma
96
0
0
17 Sep 2025
Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models
IEEE Journal of Biomedical and Health Informatics (JBHI), 2025
Qiuhui Chen
Xuancheng Yao
Huping Ye
Yi Hong
MedIm
108
1
0
11 Sep 2025
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV, VGen, AI4TS
213
0
0
11 Sep 2025
Diffusion-Based Action Recognition Generalizes to Untrained Domains
Rogério Guimarães
Frank Xiao
Pietro Perona
Markus Marks
241
0
0
10 Sep 2025
From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco
Rahul Ramesh
Randall Balestriero
Pratik Chaudhari
110
0
0
21 Aug 2025
Self-Supervised Sparse Sensor Fusion for Long Range Perception
Edoardo Palladin
Samuel Brucker
Filippo Ghilotti
Praveen Narayanan
Mario Bijelic
Felix Heide
SSL
113
0
0
19 Aug 2025
S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything without Supervision
Huihui Xu
Jin Ye
Hongqiu Wang
Changkai Ji
Jiashi Lin
...
Chenglong Ma
Tianbin Li
Lihao Liu
Junjun He
Lei Zhu
158
0
0
09 Aug 2025
MINR: Implicit Neural Representations with Masked Image Modelling
Sua Lee
Joonhun Lee
Myungjoo Kang
119
1
0
30 Jul 2025
TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras
Mohammad Mohammadi
Ziyi Wu
Igor Gilitschenski
ViT
112
0
0
29 Jul 2025
Self-Guided Masked Autoencoder
Neural Information Processing Systems (NeurIPS), 2025
Jeongwoo Shin
Inseo Lee
Junho Lee
Joonseok Lee
SSL
129
9
0
26 Jul 2025
Video Self-Distillation for Single-Image Encoders: A Step Toward Physically Plausible Perception
Marcel Simon
Tae-Ho Kim
Seul-Ki Yeom
VGen, MDE
111
0
0
25 Jul 2025
Improving Joint Embedding Predictive Architecture with Diffusion Noise
Yuping Qiu
Rui Zhu
Ying-cong Chen
DiffM
159
0
0
21 Jul 2025
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
Shashanka Venkataramanan
Valentinos Pariza
Mohammadreza Salehi
Lukas Knobel
Spyros Gidaris
Elias Ramzi
Andrei Bursuc
Yuki M. Asano
195
7
0
18 Jul 2025
HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space
Changli Wang
Fang Yin
Jiafeng Liu
Rui Wu
159
0
0
13 Jul 2025
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić
Christoph Reich
Felix Wimbauer
Oliver Hahn
Christian Rupprecht
Stefan Roth
Daniel Cremers
260
2
0
08 Jul 2025
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
Bill Psomas
Dionysis Christopoulos
Eirini Baltzi
Ioannis Kakogeorgiou
Tilemachos Aravanis
N. Komodakis
Konstantinos Karantzalos
Yannis Avrithis
Giorgos Tolias
265
1
0
11 Jun 2025
MaskAdapt: Unsupervised Geometry-Aware Domain Adaptation Using Multimodal Contextual Learning and RGB-Depth Masking
Numair Nadeem
Muhammad Asad
Saeed Anwar
Abdul Bais
153
2
0
29 May 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
306
0
0
13 May 2025
Joint Low-level and High-level Textual Representation Learning with Multiple Masking Strategies
Zhengmi Tang
Yuto Mitsui
Tomo Miyazaki
S. Omachi
252
0
0
11 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD, VOS
588
96
0
17 Apr 2025
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
Shumin Wang
Zhuoran Yang
Liwen Wang
ZhiPeng Tang
Heng Li
Lehan Pan
Sha Zhang
Jie Peng
Jianmin Ji
Y. Zhang
3DPC
263
0
0
17 Apr 2025
Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos
Zhi Zuo
Chenyi Zhuang
Zhiqiang Shen
Pan Gao
Jie Qin
Nicu Sebe
3DPC
301
1
0
07 Apr 2025
Towards Generalizing Temporal Action Segmentation to Unseen Views
Emad Bahrami
Olga Zatsarynna
Gianpiero Francesca
Juergen Gall
EgoV
200
0
0
03 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
Xinming Zhang
Rundong Wang
339
28
0
02 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP, VLM
407
37
0
01 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
273
2
0
30 Mar 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
954
0
0
26 Mar 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Computer Vision and Pattern Recognition (CVPR), 2025
Yifei Zhang
Yu Xie
Jin Wei
Xiaomeng Yang
Yu Zhou
Can Ma
Xiangyang Ji
260
7
0
24 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
250
0
0
20 Mar 2025
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang
Hongru Wang
Longji Xu
Haiyan Zhao
Mingshuo Chen
...
Yangang Sun
Shuo Wang
L. Lan
Wenjing Yang
Jing Zhang
Mamba
451
10
0
13 Mar 2025
Towards All-in-One Medical Image Re-Identification
Computer Vision and Pattern Recognition (CVPR), 2025
Yuan Tian
Kaiyuan Ji
Rongzhao Zhang
Yankai Jiang
Chunyi Li
Xiaosong Wang
Guoquan Zheng
VLM
211
3
0
11 Mar 2025
V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation
Guiwei Zhang
Tianyu Zhang
Mohan Zhou
Yalong Bai
Biye Li
244
5
0
10 Mar 2025
Small Vision-Language Models: A Survey on Compact Architectures and Techniques
Nitesh Patnaik
Navdeep Nayak
Himani Bansal Agrawal
Moinak Chinmoy Khamaru
Gourav Bal
Saishree Smaranika Panda
Rishi Raj
Vishal Meena
Kartheek Vadlamani
VLM
237
3
0
09 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Sihao Lin
Chunwei Wang
Xiuwei Chen
Hongbin Xu
Jiawei Han
Xiandan Liang
J. N. Han
Hang Xu
Xiaodan Liang
VLM
659
14
0
09 Mar 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
486
14
0
08 Mar 2025
Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Wenzhao Xiang
Chang Liu
Hongyang Yu
Xilin Chen
201
1
0
02 Mar 2025