2104.03602
SiT: Self-supervised vIsion Transformer
8 April 2021
Sara Atito Ali Ahmed
Muhammad Awais
J. Kittler
ViT
Papers citing
"SiT: Self-supervised vIsion Transformer"
40 / 40 papers shown
Title
The Moon's Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction
Tom Sander
Moritz Tenthoff
Kay Wohlfarth
Christian Wöhler
19
0
0
08 May 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
66
0
0
13 Mar 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
76
6
0
27 Feb 2025
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu
Peijin Wang
Hanbo Bi
Boyuan Tong
Z. Wang
...
Ziqi Zhang
QiXiang Ye
Kun Fu
Xian Sun
98
0
0
27 Nov 2024
Behavioral Cloning Models Reality Check for Autonomous Driving
M. Yildirim
Barkin Dagda
Vinal Asodia
Saber Fallah
OffRL
19
1
0
11 Sep 2024
Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification
Peng Gao
Yujian Lee
Hui Zhang
Xubo Liu
Yiyang Hu
Guquan Jing
27
1
0
21 May 2024
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
39
62
0
11 Dec 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
32
0
0
24 Aug 2023
Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding
Jiantao Wu
Shentong Mo
Muhammad Awais
Sara Atito
Zhenhua Feng
J. Kittler
VLM
23
4
0
22 Aug 2023
DPPMask: Masked Image Modeling with Determinantal Point Processes
Junde Xu
Zikai Lin
Donghao Zhou
Yao-Cheng Yang
Xiangyun Liao
Bian Wu
Guangyong Chen
Pheng-Ann Heng
18
1
0
13 Mar 2023
Knowledge Graph Completion Method Combined With Adaptive Enhanced Semantic Information
Weidong Ji
Zengxiang Yin
Guohui Zhou
Yuqi Yue
Xinru Zhang
Chenghong Sun
8
0
0
04 Feb 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
21
2
0
15 Jan 2023
A New Perspective to Boost Vision Transformer for Medical Image Classification
Yuexiang Li
Yawen Huang
Nanjun He
Kai Ma
Yefeng Zheng
ViT
MedIm
19
3
0
03 Jan 2023
UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
18
3
0
21 Dec 2022
SPCXR: Self-supervised Pretraining using Chest X-rays Towards a Domain Specific Foundation Model
Syed Muhammad Anwar
Abhijeet Parida
Sara Atito
Muhammad Awais
G. Nino
Josef Kittler
M. Linguraru
ViT
SSL
OOD
21
6
0
23 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
15
80
0
18 Nov 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
26
64
0
19 Oct 2022
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
H. Yao
Yi-Xin Jiang
Xiatian Zhu
Zehuan Yuan
24
19
0
09 Oct 2022
Transformer based Fingerprint Feature Extraction
Saraansh Tandon
A. Namboodiri
ViT
28
8
0
08 Sep 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
28
32
0
19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
25
97
0
16 Jun 2022
Rethinking Generalization in Few-Shot Classification
Markus Hiller
Rongkai Ma
Mehrtash Harandi
Tom Drummond
OCL
VLM
17
55
0
15 Jun 2022
GMML is All you Need
Sara Atito
Muhammad Awais
J. Kittler
ViT
VLM
34
18
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
110
17
0
30 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
Aishik Konwer
Xuan Xu
Joseph Bae
Chaoyu Chen
Prateek Prasanna
MedIm
28
15
0
02 Mar 2022
Unsupervised Anomaly Detection from Time-of-Flight Depth Images
Pascal Schneider
J. Rambach
B. Mirbach
D. Stricker
17
7
0
02 Mar 2022
Training Vision Transformers with Only 2040 Images
Yunhao Cao
Hao Yu
Jianxin Wu
ViT
90
42
0
26 Jan 2022
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
Sara Atito
Muhammad Awais
Ammarah Farooq
Zhenhua Feng
J. Kittler
15
17
0
30 Nov 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
30
268
0
19 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
16
36
0
11 Oct 2021
PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion
Yu Fu
Tianyang Xu
Xiaojun Wu
J. Kittler
ViT
17
37
0
29 Jul 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
29
610
0
18 Jun 2021
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Jiangning Zhang
Chao Xu
Jian Li
Wenzhou Chen
Yabiao Wang
Ying Tai
Shuo Chen
Chengjie Wang
Feiyue Huang
Yong Liu
25
22
0
31 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
298
5,761
0
29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
240
577
0
22 Apr 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
225
2,427
0
04 Jan 2021
Self-Supervised Feature Learning by Learning to Spot Artifacts
Simon Jenni
Paolo Favaro
SSL
137
127
0
13 Jun 2018