ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2101.11986
  4. Cited By
Tokens-to-Token ViT: Training Vision Transformers from Scratch on
  ImageNet

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

28 January 2021
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
    ViT
ArXivPDFHTML

Papers citing "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"

50 / 368 papers shown
Title
Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance
  Segmenter
Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance Segmenter
T. Ngo
Khoi Duc Minh Nguyen
3DPC
19
4
0
22 Jul 2022
Locality Guidance for Improving Vision Transformers on Tiny Datasets
Locality Guidance for Improving Vision Transformers on Tiny Datasets
Kehan Li
Runyi Yu
Zhennan Wang
Li-ming Yuan
Guoli Song
Jie Chen
ViT
24
43
0
20 Jul 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Chenyu Yang
W. He
Yingqing Xu
Yang Gao
DiffM
16
26
0
20 Jul 2022
HiFormer: Hierarchical Multi-scale Representations Using Transformers
  for Medical Image Segmentation
HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
Moein Heidari
A. Kazerouni
Milad Soltany Kadarvish
Reza Azad
Ehsan Khodapanah Aghdam
Julien Cohen-Adad
Dorit Merhof
MedIm
ViT
25
178
0
18 Jul 2022
ESFPNet: efficient deep learning architecture for real-time lesion
  segmentation in autofluorescence bronchoscopic video
ESFPNet: efficient deep learning architecture for real-time lesion segmentation in autofluorescence bronchoscopic video
Qi Chang
Danish Ahmad
J.W. Toth
R. Bascom
W. Higgins
MedIm
21
49
0
15 Jul 2022
Weakly Supervised Video Salient Object Detection via Point Supervision
Weakly Supervised Video Salient Object Detection via Point Supervision
Shuyong Gao
Hao Xing
Wei Zhang
Yan Wang
Qianyu Guo
Wenqiang Zhang
33
24
0
15 Jul 2022
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in
  Realistic Industrial Scenarios
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Jiashi Li
Xin Xia
W. Li
Huixia Li
Xing Wang
Xuefeng Xiao
Rui Wang
Min Zheng
Xin Pan
ViT
17
149
0
12 Jul 2022
Dual Vision Transformer
Dual Vision Transformer
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
141
75
0
11 Jul 2022
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection
  with Depth Image Classification
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification
Xin Jia
Changlei Dongye
Yan-Tsung Peng
ViT
27
18
0
09 Jul 2022
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
  Mobile Vision Applications
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
Muhammad Maaz
Abdelrahman M. Shaker
Hisham Cholakkal
Salman Khan
Syed Waqas Zamir
Rao Muhammad Anwer
F. Khan
ViT
27
184
0
21 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary
  Algorithm
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
34
32
0
19 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
16
25
0
17 Jun 2022
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering
  in Indoor Scenes
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
Rui Zhu
Zhengqin Li
J. Matai
Fatih Porikli
Manmohan Chandraker
ViT
38
45
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
C. Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
29
15
0
15 Jun 2022
Transformers are Meta-Reinforcement Learners
Transformers are Meta-Reinforcement Learners
L. Melo
OffRL
28
50
0
14 Jun 2022
IL-MCAM: An interactive learning and multi-channel attention
  mechanism-based weakly supervised colorectal histopathology image
  classification approach
IL-MCAM: An interactive learning and multi-channel attention mechanism-based weakly supervised colorectal histopathology image classification approach
Hao Chen
Chen Li
Xirong Li
M. Rahaman
Weiming Hu
...
Wanli Liu
Changhao Sun
Hongzan Sun
Xinyu Huang
M. Grzegorzek
HAI
29
99
0
07 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
18
251
0
06 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
18
347
0
02 Jun 2022
CVM-Cervix: A Hybrid Cervical Pap-Smear Image Classification Framework
  Using CNN, Visual Transformer and Multilayer Perceptron
CVM-Cervix: A Hybrid Cervical Pap-Smear Image Classification Framework Using CNN, Visual Transformer and Multilayer Perceptron
Wanli Liu
Chen Li
N. Xu
Tao Jiang
M. Rahaman
...
Weiming Hu
Hao Chen
Changhao Sun
Yudong Yao
M. Grzegorzek
9
132
0
02 Jun 2022
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet
Nan Wang
Shaohui Lin
Xiaoxiao Li
Ke Li
Yunhang Shen
Yue Gao
Lizhuang Ma
ViT
MedIm
41
33
0
02 Jun 2022
XBound-Former: Toward Cross-scale Boundary Modeling in Transformers
XBound-Former: Toward Cross-scale Boundary Modeling in Transformers
Jiacheng Wang
Fei Chen
Yuxi Ma
Liansheng Wang
Zhaodong Fei
Jia Shuai
Xiangdong Tang
Qichao Zhou
Jing Qin
ViT
MedIm
19
63
0
02 Jun 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
64
26
0
30 May 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
15
20
0
28 May 2022
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Yangyang Xu
Xiangtai Li
Haobo Yuan
Yibo Yang
Lefei Zhang
ViT
23
45
0
28 May 2022
Object-wise Masked Autoencoders for Fast Pre-training
Object-wise Masked Autoencoders for Fast Pre-training
Jiantao Wu
Shentong Mo
ViT
OCL
19
15
0
28 May 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
60
2,023
0
27 May 2022
Inception Transformer
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
26
187
0
25 May 2022
Super Vision Transformer
Super Vision Transformer
Mingbao Lin
Mengzhao Chen
Yu-xin Zhang
Yunhang Shen
Rongrong Ji
Liujuan Cao
ViT
33
20
0
23 May 2022
SelfReformer: Self-Refined Network with Transformer for Salient Object
  Detection
SelfReformer: Self-Refined Network with Transformer for Salient Object Detection
Y. Yun
Weisi Lin
ViT
60
28
0
23 May 2022
Boosting Camouflaged Object Detection with Dual-Task Interactive
  Transformer
Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer
Zheng Liu
Zhili Zhang
Wei Yu Wu
32
46
0
21 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised
  Semantic Segmentation and Localization
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
Automatic segmentation of meniscus based on MAE self-supervision and
  point-line weak supervision paradigm
Automatic segmentation of meniscus based on MAE self-supervision and point-line weak supervision paradigm
Yuhan Xie
Kexin Jiang
Zhiyong Zhang
Shaolong Chen
Xiaodong Zhang
Changzhen Qiu
21
1
0
07 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision
  Transformers
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
31
180
0
06 May 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision
  Transformers
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Xianing Chen
Qiong Cao
Yujie Zhong
Jing Zhang
Shenghua Gao
Dacheng Tao
ViT
32
76
0
27 Apr 2022
Improving the Transferability of Adversarial Examples with Restructure
  Embedded Patches
Improving the Transferability of Adversarial Examples with Restructure Embedded Patches
Huipeng Zhou
Yu-an Tan
Yajie Wang
Haoran Lyu
Shan-Hung Wu
Yuan-zhang Li
ViT
19
4
0
27 Apr 2022
Boosting Adversarial Transferability of MLP-Mixer
Boosting Adversarial Transferability of MLP-Mixer
Haoran Lyu
Yajie Wang
Yu-an Tan
Huipeng Zhou
Yuhang Zhao
Quan-xin Zhang
AAML
19
1
0
26 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
21
4
0
26 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
VSA: Learning Varied-Size Window Attention in Vision Transformers
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
22
53
0
18 Apr 2022
Application of Transfer Learning and Ensemble Learning in Image-level
  Classification for Breast Histopathology
Application of Transfer Learning and Ensemble Learning in Image-level Classification for Breast Histopathology
Yuchao Zheng
Chen Li
Xiaomin Zhou
Hao Chen
Hao Xu
...
Haiqing Zhang
Xirong Li
Hongzan Sun
Xinyu Huang
M. Grzegorzek
28
55
0
18 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
23
123
0
14 Apr 2022
Neighborhood Attention Transformer
Neighborhood Attention Transformer
Ali Hassani
Steven Walton
Jiacheng Li
Shengjia Li
Humphrey Shi
ViT
AI4TS
36
252
0
14 Apr 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang
Zilong Huang
Guozhong Luo
Tao Chen
Xinggang Wang
Wenyu Liu
Gang Yu
Chunhua Shen
ViT
22
198
0
12 Apr 2022
DaViT: Dual Attention Vision Transformers
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
39
240
0
07 Apr 2022
Learning Local and Global Temporal Contexts for Video Semantic
  Segmentation
Learning Local and Global Temporal Contexts for Video Semantic Segmentation
Guolei Sun
Yun Liu
Henghui Ding
Min Wu
Luc Van Gool
30
32
0
07 Apr 2022
An Empirical Study of Remote Sensing Pretraining
An Empirical Study of Remote Sensing Pretraining
Di Wang
Jing Zhang
Bo Du
Guisong Xia
Dacheng Tao
EDL
31
190
0
06 Apr 2022
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Z. Li
Cheng Lu
Jia Qin
Chunle Guo
Mingg-Ming Cheng
41
149
0
06 Apr 2022
MixFormer: Mixing Features across Windows and Dimensions
MixFormer: Mixing Features across Windows and Dimensions
Qiang Chen
Qiman Wu
Jian Wang
Qinghao Hu
T. Hu
Errui Ding
Jian Cheng
Jingdong Wang
MDE
ViT
26
101
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video
  Segmentation
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
BatchFormerV2: Exploring Sample Relationships for Dense Representation
  Learning
BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning
Zhi Hou
Baosheng Yu
Chaoyue Wang
Yibing Zhan
Dacheng Tao
ViT
26
11
0
04 Apr 2022
Previous
12345678
Next