ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.03602
  4. Cited By
SiT: Self-supervised vIsion Transformer

SiT: Self-supervised vIsion Transformer

8 April 2021
Sara Atito Ali Ahmed
Muhammad Awais
J. Kittler
    ViT
ArXivPDFHTML

Papers citing "SiT: Self-supervised vIsion Transformer"

40 / 40 papers shown
Title
The Moon's Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction
The Moon's Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction
Tom Sander
Moritz Tenthoff
Kay Wohlfarth
Christian Wöhler
19
0
0
08 May 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
66
0
0
13 Mar 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
76
6
0
27 Feb 2025
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu
Peijin Wang
Hanbo Bi
Boyuan Tong
Z. Wang
...
Ziqi Zhang
QiXiang Ye
Kun Fu
Xian Sun
Xian Sun
98
0
0
27 Nov 2024
Behavioral Cloning Models Reality Check for Autonomous Driving
Behavioral Cloning Models Reality Check for Autonomous Driving
M. Yildirim
Barkin Dagda
Vinal Asodia
Saber Fallah
OffRL
19
1
0
11 Sep 2024
Dynamic Identity-Guided Attention Network for Visible-Infrared Person
  Re-identification
Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification
Peng Gao
Yujian Lee
Hui Zhang
Xubo Liu
Yiyang Hu
Guquan Jing
27
1
0
21 May 2024
4M: Massively Multimodal Masked Modeling
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
39
62
0
11 Dec 2023
Masked Feature Modelling: Feature Masking for the Unsupervised
  Pre-training of a Graph Attention Network Block for Bottom-up Video Event
  Recognition
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
32
0
0
24 Aug 2023
Masked Momentum Contrastive Learning for Zero-shot Semantic
  Understanding
Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding
Jiantao Wu
Shentong Mo
Muhammad Awais
Sara Atito
Zhenhua Feng
J. Kittler
VLM
23
4
0
22 Aug 2023
DPPMask: Masked Image Modeling with Determinantal Point Processes
DPPMask: Masked Image Modeling with Determinantal Point Processes
Junde Xu
Zikai Lin
Donghao Zhou
Yao-Cheng Yang
Xiangyun Liao
Bian Wu
Guangyong Chen
Pheng-Ann Heng
18
1
0
13 Mar 2023
Knowledge Graph Completion Method Combined With Adaptive Enhanced
  Semantic Information
Knowledge Graph Completion Method Combined With Adaptive Enhanced Semantic Information
Weidong Ji
Zengxiang Yin
Guohui Zhou
Yuqi Yue
Xinru Zhang
Chenghong Sun
8
0
0
04 Feb 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance
  Industry
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
21
2
0
15 Jan 2023
A New Perspective to Boost Vision Transformer for Medical Image
  Classification
A New Perspective to Boost Vision Transformer for Medical Image Classification
Yuexiang Li
Yawen Huang
Nanjun He
Kai Ma
Yefeng Zheng
ViT
MedIm
19
3
0
03 Jan 2023
UnICLAM:Contrastive Representation Learning with Adversarial Masking for
  Unified and Interpretable Medical Vision Question Answering
UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
18
3
0
21 Dec 2022
SPCXR: Self-supervised Pretraining using Chest X-rays Towards a Domain
  Specific Foundation Model
SPCXR: Self-supervised Pretraining using Chest X-rays Towards a Domain Specific Foundation Model
Syed Muhammad Anwar
Abhijeet Parida
Sara Atito
Muhammad Awais
G. Nino
Josef Kitler
M. Linguraru
ViT
SSL
OOD
21
6
0
23 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo
  Matching and Optical Flow
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
15
80
0
18 Nov 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View
  Completion
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
26
64
0
19 Oct 2022
Self-supervised Video Representation Learning with Motion-Aware Masked
  Autoencoders
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
H. Yao
Yi-Xin Jiang
Xiatian Zhu
Zehuan Yuan
24
19
0
09 Oct 2022
Transformer based Fingerprint Feature Extraction
Transformer based Fingerprint Feature Extraction
Saraansh Tandon
A. Namboodiri
ViT
28
8
0
08 Sep 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary
  Algorithm
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
28
32
0
19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
25
97
0
16 Jun 2022
Rethinking Generalization in Few-Shot Classification
Rethinking Generalization in Few-Shot Classification
Markus Hiller
Rongkai Ma
Mehrtash Harandi
Tom Drummond
OCL
VLM
17
55
0
15 Jun 2022
GMML is All you Need
GMML is All you Need
Sara Atito
Muhammad Awais
J. Kittler
ViT
VLM
34
18
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing
  Mechanisms in Sequence Learning
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
110
17
0
30 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised
  Semantic Segmentation and Localization
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
Temporal Context Matters: Enhancing Single Image Prediction with Disease
  Progression Representations
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
Aishik Konwer
Xuan Xu
Joseph Bae
Chaoyu Chen
Prateek Prasanna
MedIm
28
15
0
02 Mar 2022
Unsupervised Anomaly Detection from Time-of-Flight Depth Images
Unsupervised Anomaly Detection from Time-of-Flight Depth Images
Pascal Schneider
J. Rambach
B. Mirbach
D. Stricker
17
7
0
02 Mar 2022
Training Vision Transformers with Only 2040 Images
Training Vision Transformers with Only 2040 Images
Yunhao Cao
Hao Yu
Jianxin Wu
ViT
90
42
0
26 Jan 2022
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
Sara Atito
Muhammad Awais
Ammarah Farooq
Zhenhua Feng
J. Kittler
15
17
0
30 Nov 2021
Sparse Fusion for Multimodal Transformers
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
30
268
0
19 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual
  Representation Learning
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
16
36
0
11 Oct 2021
PPT Fusion: Pyramid Patch Transformerfor a Case Study in Image Fusion
PPT Fusion: Pyramid Patch Transformerfor a Case Study in Image Fusion
Yu Fu
Tianyang Xu
Xiaojun Wu
J. Kittler
ViT
17
37
0
29 Jul 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision
  Transformers
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
29
610
0
18 Jun 2021
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Jiangning Zhang
Chao Xu
Jian Li
Wenzhou Chen
Yabiao Wang
Ying Tai
Shuo Chen
Chengjie Wang
Feiyue Huang
Yong Liu
25
22
0
31 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
298
5,761
0
29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
240
577
0
22 Apr 2021
Transformers in Vision: A Survey
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
225
2,427
0
04 Jan 2021
Self-Supervised Feature Learning by Learning to Spot Artifacts
Self-Supervised Feature Learning by Learning to Spot Artifacts
Simon Jenni
Paolo Favaro
SSL
137
127
0
13 Jun 2018
1