ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

7 June 2021
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
    ViT

Papers citing "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias"

50 / 197 papers shown
TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation
Yunliang Jiang
Li Yan
Xiongtao Zhang
Yong-Jin Liu
Da-Song Sun
ViT
21
5
0
16 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
19
136
0
03 Feb 2023
SCCAM: Supervised Contrastive Convolutional Attention Mechanism for Ante-hoc Interpretable Fault Diagnosis with Limited Fault Samples
Mengxuan Li
Peng Peng
Jingxin Zhang
Hongwei Wang
Weiming Shen
14
16
0
03 Feb 2023
Rethinking Mobile Block for Efficient Attention-based Models
Jiangning Zhang
Xiangtai Li
Jian Li
Liang Liu
Zhucun Xue
Boshen Zhang
Zhe Jiang
Tianxin Huang
Yabiao Wang
Chengjie Wang
MQ
44
89
0
03 Jan 2023
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
X. Wang
ViT
30
21
0
13 Dec 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
34
40
0
07 Dec 2022
Learning to Learn Better for Video Object Segmentation
Meng Lan
Jing Zhang
Lefei Zhang
Dacheng Tao
VOS
27
17
0
05 Dec 2022
GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds
Jiahao Nie
Zhiwei He
Yuxiang Yang
Mingyu Gao
Jing Zhang
3DPC
15
39
0
20 Nov 2022
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Maoyuan Ye
Jing Zhang
Shanshan Zhao
Juhua Liu
Tongliang Liu
Bo Du
Dacheng Tao
31
70
0
19 Nov 2022
Unifying Flow, Stereo and Depth Estimation
Haofei Xu
Jing Zhang
Jianfei Cai
Hamid Rezatofighi
F. I. F. Richard Yu
Dacheng Tao
Andreas Geiger
MDE
19
191
0
10 Nov 2022
Rethinking Hierarchies in Pre-trained Plain Vision Transformer
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
13
1
0
03 Nov 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
10
56
0
12 Oct 2022
Towards Theoretically Inspired Neural Initialization Optimization
Yibo Yang
Hong Wang
Haobo Yuan
Zhouchen Lin
8
9
0
12 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
24
58
0
04 Oct 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
98
87
0
30 Sep 2022
Exploring the Relationship between Architecture and Adversarially Robust Generalization
Aishan Liu
Shiyu Tang
Siyuan Liang
Ruihao Gong
Boxi Wu
Xianglong Liu
Dacheng Tao
AAML
21
18
0
28 Sep 2022
HiFuse: Hierarchical Multi-Scale Feature Fusion Network for Medical Image Classification
Xiangzuo Huo
Gang Sun
Sheng Tian
Yan Wang
Long Yu
Jun Long
Wendong Zhang
Aolun Li
28
100
0
21 Sep 2022
A Simple and Powerful Global Optimization for Unsupervised Video Object Segmentation
Georgy Ponimatkin
Nermin Samet
Yanghua Xiao
Yuming Du
Renaud Marlet
Vincent Lepetit
VOS
72
20
0
19 Sep 2022
Robust Ensemble Morph Detection with Domain Generalization
Hossein Kashiani
S. Sami
Sobhan Soleymani
Nasser M. Nasrabadi
OOD
AAML
13
8
0
16 Sep 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
14
2
0
30 Aug 2022
Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification
Xixi Wang
Xiao Wang
Bo Jiang
Bin Luo
21
42
0
26 Aug 2022
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
Rui Wang
Zuxuan Wu
Dongdong Chen
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Luowei Zhou
Lu Yuan
Yu-Gang Jiang
ViT
35
4
0
25 Aug 2022
FocusFormer: Focusing on What We Need via Architecture Sampler
Jing Liu
Jianfei Cai
Bohan Zhuang
27
7
0
23 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
L. Zhang
23
242
0
08 Aug 2022
MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth
Chenjie Cao
Xinlin Ren
Yanwei Fu
22
44
0
04 Aug 2022
Local Perception-Aware Transformer for Aerial Tracking
Changhong Fu
Wei Peng
Sihang Li
Junjie Ye
Ziang Cao
23
5
0
01 Aug 2022
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
Tan Wang
Qianru Sun
Sugiri Pranata
J. Karlekar
Hanwang Zhang
SSL
23
19
0
25 Jul 2022
Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection
Gaoang Wang
Yibing Zhan
Xinchao Wang
Min-Gyoo Song
K. Nahrstedt
14
11
0
24 Jul 2022
Learning Graph Neural Networks for Image Style Transfer
Yongcheng Jing
Yining Mao
Yiding Yang
Yibing Zhan
Mingli Song
Xinchao Wang
Dacheng Tao
30
55
0
24 Jul 2022
Multi Resolution Analysis (MRA) for Approximate Self-Attention
Zhanpeng Zeng
Sourav Pal
Jeffery Kline
G. Fung
Vikas Singh
13
6
0
21 Jul 2022
MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis
Yaqian Liang
Shanshan Zhao
Baosheng Yu
Jing Zhang
Fazhi He
ViT
24
37
0
20 Jul 2022
Multi-manifold Attention for Vision Transformers
D. Konstantinidis
Ilias Papastratis
K. Dimitropoulos
P. Daras
ViT
14
16
0
18 Jul 2022
Defect Transformer: An Efficient Hybrid Transformer Architecture for Surface Defect Detection
Junpu Wang
Guili Xu
Fuju Yan
Jinjin Wang
Zhengsheng Wang
ViT
MedIm
21
65
0
17 Jul 2022
JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes
Haimei Zhao
Jing Zhang
Sen Zhang
Dacheng Tao
19
15
0
16 Jul 2022
ReAct: Temporal Action Detection with Relational Queries
Ding Shi
Yujie Zhong
Qiong Cao
Jing Zhang
Lin Ma
Jia Li
Dacheng Tao
ViT
21
68
0
14 Jul 2022
Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
14
11
0
14 Jul 2022
DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer
Maoyuan Ye
Jing Zhang
Shanshan Zhao
Juhua Liu
Bo Du
Dacheng Tao
ViT
32
73
0
10 Jul 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang
Jiannan Guo
Meng Li
Haochen Shi
Shengyu Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
44
6
0
09 Jul 2022
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Runsheng Xu
Zhengzhong Tu
Hao Xiang
Wei Shao
Bolei Zhou
Jiaqi Ma
42
218
0
05 Jul 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
30
32
0
19 Jun 2022
Maximum Class Separation as Inductive Bias in One Matrix
Tejaswi Kasarla
Gertjan J. Burghouts
Max van Spengler
Elise van der Pol
Rita Cucchiara
Pascal Mettes
19
22
0
17 Jun 2022
APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking
Yuxiang Yang
Junjie Yang
Yufei Xu
Jing Zhang
Long Lan
Dacheng Tao
13
38
0
12 Jun 2022
Referring Image Matting
Jizhizi Li
Jing Zhang
Dacheng Tao
ObjD
VLM
18
22
0
10 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT
OOD
MedIm
19
19
0
02 Jun 2022
Modeling Image Composition for Complex Scene Generation
Zuopeng Yang
Daqing Liu
Chaoyue Wang
J. Yang
Dacheng Tao
ViT
34
49
0
02 Jun 2022
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Yangyang Xu
Xiangtai Li
Haobo Yuan
Yibo Yang
Lefei Zhang
ViT
15
45
0
28 May 2022
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
26
187
0
25 May 2022
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
Bin Ren
Yahui Liu
Yue Song
Wei Bi
Rita Cucchiara
N. Sebe
Wei Wang
46
21
0
25 May 2022
Learning Localization-aware Target Confidence for Siamese Visual Tracking
Jiahao Nie
Han Wu
Zhiwei He
Yuxiang Yang
Mingyu Gao
Zhekang Dong
22
25
0
29 Apr 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Xianing Chen
Qiong Cao
Yujie Zhong
Jing Zhang
Shenghua Gao
Dacheng Tao
ViT
19
76
0
27 Apr 2022