2111.13677
SWAT: Spatial Structure Within and Among Tokens
26 November 2021
Kumara Kahatapitiya
Michael S. Ryoo
Papers citing "SWAT: Spatial Structure Within and Among Tokens"
20 / 20 papers shown
Title
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
17
25
0
03 Oct 2022
Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space
Jinghuan Shang
Srijan Das
Michael S. Ryoo
36
13
0
23 Jun 2022
Patches Are All You Need?
Asher Trockman
J. Zico Kolter
ViT
214
395
0
24 Jan 2022
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
F. Brémond
ViT
36
72
0
07 Dec 2021
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
Jinghuan Shang
Kumara Kahatapitiya
Xiang Li
Michael S. Ryoo
OffRL
27
33
0
12 Oct 2021
ConvMLP: Hierarchical Convolutional MLPs for Vision
Jiachen Li
Ali Hassani
Steven Walton
Humphrey Shi
33
55
0
09 Sep 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
172
462
0
12 Aug 2021
Intriguing Properties of Vision Transformers
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Munawar Hayat
F. Khan
Ming-Hsuan Yang
ViT
248
618
0
21 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
239
2,554
0
04 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
136
193
0
23 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
240
573
0
22 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
282
1,490
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
260
178
0
17 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
267
955
0
27 Jan 2021
TransTrack: Multiple Object Tracking with Transformer
Pei Sun
Jinkun Cao
Yi-Xin Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo
ViT
VOT
241
555
0
31 Dec 2020
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
261
10,106
0
16 Nov 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
249
1,817
0
18 Aug 2016