Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

28 January 2021
Li Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
    ViT

Papers citing "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"

50 / 352 papers shown
Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective
Songsong Duan
Xi Yang
Nannan Wang
Xinbo Gao
55
0
0
07 May 2025
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
142
0
0
06 May 2025
Optimal Hyperspectral Undersampling Strategy for Satellite Imaging
Vita V. Vlasova
Vladimir G. Kuzmin
Maria S. Varetsa
Natalia A. Ibragimova
Oleg Y. Rogov
Elena V. Lyapuntsova
19
0
0
27 Apr 2025
TimeCapsule: Solving the Jigsaw Puzzle of Long-Term Time Series Forecasting with Compressed Predictive Representations
Yihang Lu
Yangyang Xu
Qitao Qing
Xianwei Meng
AI4TS
44
0
0
17 Apr 2025
Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification
Zhenyu Yang
Haiming Zhu
Rihui Zhang
Haipeng Zhang
Jianliang Wang
Chunhao Wang
Minbin Chen
F. Yin
MedIm
38
0
0
15 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
82
0
0
03 Apr 2025
Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations
Junsong Zhang
Chunyu Lin
Zhijie Shen
Lang Nie
K. Liao
Yao Zhao
35
0
0
03 Mar 2025
VRM: Knowledge Distillation via Virtual Relation Matching
W. Zhang
Fei Xie
Weidong Cai
Chao Ma
76
0
0
28 Feb 2025
Low-Rank Thinning
Annabelle Michael Carrell
Albert Gong
Abhishek Shetty
Raaz Dwivedi
Lester W. Mackey
58
0
0
17 Feb 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
145
0
0
25 Jan 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Jinwei Gu
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
149
0
0
21 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
152
611
0
31 Dec 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
31
13
0
15 Oct 2024
FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries
Yuqi Jiang
Xudong Lu
Qian Jin
Qi Sun
Hanming Wu
Cheng Zhuo
36
5
0
15 Jul 2024
Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking
You Wu
Xucheng Wang
Dan Zeng
Hengzhou Ye
Xiaolan Xie
Qijun Zhao
Shuiwang Li
35
3
0
07 Jul 2024
Improving robustness to corruptions with multiplicative weight perturbations
Trung Trinh
Markus Heinonen
Luigi Acerbi
Samuel Kaski
41
0
0
24 Jun 2024
Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE
Florence Regol
Joud Chataoui
Bertrand Charpentier
Mark J. Coates
Pablo Piantanida
Stephan Günnemann
39
0
0
20 Jun 2024
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Xiangyang Yang
Dan Zeng
Xucheng Wang
You Wu
Hengzhou Ye
Qijun Zhao
Shuiwang Li
59
3
0
12 Jun 2024
A DeNoising FPN With Transformer R-CNN for Tiny Object Detection
Hou-I Liu
Yu-Wen Tseng
Kai-Cheng Chang
Pin-Jyun Wang
Hong-Han Shuai
Wen-Huang Cheng
ViT
ObjD
40
24
0
09 Jun 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
43
7
0
28 Mar 2024
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao
Xiaojun Jia
Xuhong Ren
Ivor Tsang
Qing-Wu Guo
AAML
38
14
0
19 Mar 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
25
15
0
18 Mar 2024
Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Jingyun Xue
Tao Wang
Jun Wang
Kaihao Zhang
ViT
43
2
0
09 Mar 2024
LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition
Lingfeng Liu
Dong Ni
Hangjie Yuan
ViT
29
0
0
03 Mar 2024
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami
Ali Ghodsi
VLM
42
6
0
28 Feb 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
52
4
0
17 Feb 2024
Learning Low-Rank Feature for Thorax Disease Classification
Rajeev Goel
Utkarsh Nath
Yancheng Wang
Alvin C. Silva
Teresa Wu
Yingzhen Yang
22
0
0
14 Feb 2024
DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms in Vision Transformers
Oryan Yehezkel
Alon Zolfi
Amit Baras
Yuval Elovici
A. Shabtai
AAML
29
0
0
04 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Yue Fan
Qing Li
Yuntao Du
VLM
70
75
0
03 Feb 2024
CascadedGaze: Efficiency in Global Context Extraction for Image Restoration
Amirhosein Ghasemabadi
Muhammad Kamran Janjua
Mohammad Salameh
Chunhua Zhou
Fengyu Sun
Di Niu
32
11
0
26 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
22
5
0
09 Jan 2024
360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception
Zhijie Shen
Chunyu Lin
Junsong Zhang
Lang Nie
K. Liao
Yao Zhao
28
5
0
26 Dec 2023
A Survey on Open-Set Image Recognition
Jiaying Sun
Qiulei Dong
BDL
ObjD
32
3
0
25 Dec 2023
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
30
3
0
21 Dec 2023
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi
Hyowon Wi
Jayoung Kim
Yehjin Shin
Kookjin Lee
Nathaniel Trask
Noseong Park
25
4
0
07 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
33
0
0
01 Dec 2023
QuadraNet: Improving High-Order Neural Interaction Efficiency with Hardware-Aware Quadratic Neural Networks
Chenhui Xu
Fuxun Yu
Zirui Xu
Chenchen Liu
Jinjun Xiong
Xiang Chen
33
4
0
29 Nov 2023
Improved TokenPose with Sparsity
Anning Li
ViT
34
0
0
16 Nov 2023
Rotation Invariant Transformer for Recognizing Object in UAVs
Shuo Chen
Mang Ye
Bo Du
ViT
32
18
0
05 Nov 2023
Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation
Shashank Kotyan
Danilo Vasconcellos Vargas
ViT
22
2
0
01 Nov 2023
Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers
Yuanduo Hong
Jue Wang
Weichao Sun
Huihui Pan
VLM
ViT
37
7
0
19 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
31
4
0
10 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
40
3
0
08 Oct 2023
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao
Pichao Wang
Yuyang Zhao
Hao Luo
F. Wang
Mike Zheng Shou
ViT
34
14
0
15 Sep 2023
Interpretability-Aware Vision Transformer
Yao Qiang
Chengyin Li
Prashant Khanduri
D. Zhu
ViT
82
7
0
14 Sep 2023
SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation
Lixiong Qin
Mei Wang
Chao Deng
K. Wang
Xiangshan Chen
Jiani Hu
Weihong Deng
CVBM
ViT
29
38
0
22 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
28
30
0
21 Aug 2023
A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Junyu Chen
Yihao Liu
Shuwen Wei
Zhangxing Bian
Shalini Subramanian
A. Carass
Jerry L. Prince
Yong Du
OOD
39
36
0
28 Jul 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
51
65
0
26 Jul 2023