2308.00261
Cited By
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
1 August 2023
Yuan Liu
Songyang Zhang
Jiacheng Chen
Zhaohui Yu
Kai-xiang Chen
Dahua Lin
Papers citing
"Improving Pixel-based MIM by Reducing Wasted Modeling Capability"
26 / 26 papers shown
Title
Thoughts on Objectives of Sparse and Hierarchical Masked Image Model
Asahi Miyazaki
Tsuyoshi Okita
22
0
0
12 May 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
53
1
0
21 Mar 2025
Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation
Wenzhao Xiang
Chang Liu
Hongyang Yu
Xilin Chen
31
0
0
02 Mar 2025
SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning
Mingsheng Cai
Jiuming Jiang
Wenhao Huang
Che Liu
Rossella Arcucci
AI4TS
44
0
0
27 Feb 2025
RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba
Andong Lu
Wanyu Wang
Chenglong Li
Jin Tang
B. Luo
Mamba
49
2
0
31 Dec 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
73
0
0
24 Nov 2024
Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li
B. Hou
Siteng Ma
Zitong Wu
Xianpeng Guo
Bo Ren
Licheng Jiao
43
11
0
04 Aug 2024
Rethinking Overlooked Aspects in Vision-Language Models
Yuan Liu
Le Tian
Xiao Zhou
Jie Zhou
VLM
30
2
0
20 May 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo
Zhineng Chen
Peng Zhou
Zuxuan Wu
Xieping Gao
Yu-Gang Jiang
SSL
21
1
0
31 Mar 2024
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Che Liu
Zhongwei Wan
Ouyang Cheng
Anand Shah
Wenjia Bai
Rossella Arcucci
35
28
0
11 Mar 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu
Jianlin Su
Bo-Wen Zhang
Chunhua Shen
MLLM
35
10
0
01 Mar 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei
Tao Chen
XiRuo Jiang
Huafeng Liu
Zeren Sun
Yazhou Yao
VGen
36
9
0
29 Feb 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
19
7
0
25 Jan 2024
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
Licai Sun
Zheng Lian
Bin Liu
Jianhua Tao
51
29
0
11 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
33
14
0
31 Dec 2023
G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training
Che Liu
Ouyang Cheng
Sibo Cheng
Anand Shah
Wenjia Bai
Rossella Arcucci
VLM
MedIm
23
8
0
03 Dec 2023
ESTformer: Transformer Utilizing Spatiotemporal Dependencies for Electroencephalogram Super-resolution
Dongdong Li
Zhongliang Zeng
Zhe Wang
Hai Yang
23
1
0
03 Dec 2023
Adversarial Purification of Information Masking
Sitong Liu
Z. Lian
Shuangquan Zhang
Liang Xiao
AAML
20
0
0
26 Nov 2023
DeepMIM: Deep Supervision for Masked Image Modeling
Sucheng Ren
Fangyun Wei
Samuel Albanie
Zheng-Wei Zhang
Han Hu
VLM
60
14
0
15 Mar 2023
PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
Yuan Liu
Songyang Zhang
Jiacheng Chen
Kai-xiang Chen
Dahua Lin
72
28
0
04 Mar 2023
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
52
35
0
19 Oct 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
314
5,775
0
29 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,777
0
24 Feb 2021
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
183
21,804
0
09 Dec 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,827
0
18 Aug 2016