Stand-Alone Self-Attention in Vision Models
arXiv: 1906.05909
13 June 2019
Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
Topics: VLM, SLR, ViT
Papers citing "Stand-Alone Self-Attention in Vision Models" (50 of 234 papers shown)
Title | Authors | Topics | Counts | Date
UniFormer: Unifying Convolution and Self-attention for Visual Recognition | Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao | ViT | 150 / 361 / 0 | 24 Jan 2022
Video Transformers: A Survey | Javier Selva, A. S. Johansen, Sergio Escalera, Kamal Nasrollahi, T. Moeslund, Albert Clapés | ViT | 22 / 103 / 0 | 16 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning | Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao | ViT | 38 / 238 / 0 | 12 Jan 2022
A ConvNet for the 2020s | Zhuang Liu, Hanzi Mao, Chaozheng Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie | ViT | 42 / 4,972 / 0 | 10 Jan 2022
Augmenting Convolutional networks with attention-based aggregation | Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou | ViT | 35 / 47 / 0 | 27 Dec 2021
Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence | Wenchi Ma, Tianxiao Zhang, Guanghui Wang | ViT | 33 / 14 / 0 | 26 Dec 2021
Assessing the Impact of Attention and Self-Attention Mechanisms on the Classification of Skin Lesions | Rafael Pedro, Arlindo L. Oliveira | - | 26 / 14 / 0 | 23 Dec 2021
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation | Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, ..., Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou | ViT | 26 / 20 / 0 | 17 Dec 2021
Efficient Visual Tracking with Exemplar Transformers | Philippe Blatter, Menelaos Kanakis, Martin Danelljan, Luc Van Gool | ViT | 21 / 79 / 0 | 17 Dec 2021
Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction | Guang-Sheng Chen, Meiling Wang, Yufeng Yue, Qingxiang Zhang, Li-xin Yuan | ViT | 37 / 17 / 0 | 17 Dec 2021
Embracing Single Stride 3D Object Detector with Sparse Transformer | Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang | ViT | 27 / 255 / 0 | 13 Dec 2021
Couplformer: Rethinking Vision Transformer with Coupling Attention Map | Hai Lan, Xihao Wang, Xian Wei | ViT | 28 / 3 / 0 | 10 Dec 2021
Spatio-temporal Relation Modeling for Few-shot Action Recognition | Anirudh Thatipelli, Sanath Narayan, Salman Khan, Rao Muhammad Anwer, F. Khan, Bernard Ghanem | ViT | 25 / 88 / 0 | 09 Dec 2021
3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis | Jianhui Yu, Chaoyi Zhang, Heng Wang, Dingxin Zhang, Yang Song, Tiange Xiang, Dongnan Liu, Weidong (Tom) Cai | ViT, MedIm | 21 / 32 / 0 | 09 Dec 2021
Fast Point Transformer | Chunghyun Park, Yoonwoo Jeong, Minsu Cho, Jaesik Park | 3DPC, ViT | 30 / 168 / 0 | 09 Dec 2021
Recurrent Glimpse-based Decoder for Detection with Transformer | Zhe Chen, Jing Zhang, Dacheng Tao | ViT | 22 / 30 / 0 | 09 Dec 2021
On the Integration of Self-Attention and Convolution | Xuran Pan, Chunjiang Ge, Rui Lu, S. Song, Guanfu Chen, Zeyi Huang, Gao Huang | SSL | 41 / 287 / 0 | 29 Nov 2021
Video Frame Interpolation Transformer | Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang | ViT | 17 / 157 / 0 | 27 Nov 2021
BoxeR: Box-Attention for 2D and 3D Transformers | Duy-Kien Nguyen, Jihong Ju, Olaf Booji, Martin R. Oswald, Cees G. M. Snoek | ViT | 28 / 36 / 0 | 25 Nov 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu, Baining Guo | ViT | 42 / 238 / 0 | 24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan | ViT, VGen | 18 / 292 / 0 | 24 Nov 2021
PointMixer: MLP-Mixer for Point Cloud Understanding | Jaesung Choe, Chunghyun Park, François Rameau, Jaesik Park, In So Kweon | 3DPC | 39 / 98 / 0 | 22 Nov 2021
Searching for TrioNet: Combining Convolution with Local and Global Self-Attention | Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille | ViT | 21 / 3 / 0 | 15 Nov 2021
Full-attention based Neural Architecture Search using Context Auto-regression | Yuan Zhou, Haiyang Wang, Shuwei Huo, Boyu Wang | - | 27 / 3 / 0 | 13 Nov 2021
A Survey of Visual Transformers | Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, Zhiqiang He | 3DGS, ViT | 71 / 330 / 0 | 11 Nov 2021
Are Transformers More Robust Than CNNs? | Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie | ViT, AAML | 192 / 257 / 0 | 10 Nov 2021
Relational Self-Attention: What's Missing in Attention for Video Understanding | Manjin Kim, Heeseung Kwon, Chunyu Wang, Suha Kwak, Minsu Cho | ViT | 27 / 28 / 0 | 02 Nov 2021
Gabor filter incorporated CNN for compression | Akihiro Imamura, N. Arizumi | CVBM | 22 / 2 / 0 | 29 Oct 2021
Dispensed Transformer Network for Unsupervised Domain Adaptation | Yunxiang Li, Jingxiong Li, Ruilong Dan, Shuai Wang, Kai Jin, ..., Qianni Zhang, Huiyu Zhou, Qun Jin, Li Wang, Yaqi Wang | OOD, MedIm | 20 / 4 / 0 | 28 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction | Yuhui Yuan, Rao Fu, Lang Huang, Weihong Lin, Chao Zhang, Xilin Chen, Jingdong Wang | ViT | 32 / 227 / 0 | 18 Oct 2021
Finding Strong Gravitational Lenses Through Self-Attention | H. Thuruthipilly, A. Zadrożny, Agnieszka Pollo, Marek Biesiada | - | 16 / 6 / 0 | 18 Oct 2021
Multi-View Stereo Network with attention thin volume | Zihang Wan | 3DV | 23 / 1 / 0 | 16 Oct 2021
MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis | Hossein Aboutalebi, Maya Pavlova, Hayden Gunraj, M. Shafiee, A. Sabri, Amer Alaref, Alexander Wong | - | 20 / 17 / 0 | 12 Oct 2021
Context-LGM: Leveraging Object-Context Relation for Context-Aware Object Recognition | Mingzhou Liu, Xinwei Sun, Fandong Zhang, Yizhou Yu, Yizhou Wang | - | 27 / 0 / 0 | 08 Oct 2021
Token Pooling in Vision Transformers | D. Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish K. Prabhu, Mohammad Rastegari, Oncel Tuzel | ViT | 76 / 66 / 0 | 08 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs | Philipp Benz, Soomin Ham, Chaoning Zhang, Adil Karjauv, In So Kweon | AAML, ViT | 41 / 78 / 0 | 06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | Sachin Mehta, Mohammad Rastegari | ViT | 215 / 1,213 / 0 | 05 Oct 2021
VTAMIQ: Transformers for Attention Modulated Image Quality Assessment | Andrei Chubarau, James Clark | ViT | 32 / 9 / 0 | 04 Oct 2021
GT U-Net: A U-Net Like Group Transformer Network for Tooth Root Segmentation | Yunxiang Li, Shuai Wang, Jun Wang, G. Zeng, Wenjun Liu, Qianni Zhang, Qun Jin, Yaqi Wang | ViT, MedIm | 28 / 47 / 0 | 30 Sep 2021
Is Attention Better Than Matrix Decomposition? | Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin | - | 56 / 137 / 0 | 09 Sep 2021
Learning the Physics of Particle Transport via Transformers | O. Pastor-Serrano, Zoltán Perkó | MedIm | 21 / 13 / 0 | 08 Sep 2021
Ultra-high Resolution Image Segmentation via Locality-aware Context Fusion and Alternating Local Enhancement | Wenxi Liu, Qi Li, Xin Lin, Weixiang Yang, Shengfeng He, Yuanlong Yu | - | 29 / 7 / 0 | 06 Sep 2021
Revisiting 3D ResNets for Video Recognition | Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello | - | 51 / 17 / 0 | 03 Sep 2021
Learning Inner-Group Relations on Point Clouds | Haoxi Ran, Wei Zhuo, J. Liu, Li Lu | 3DPC | 37 / 59 / 0 | 27 Aug 2021
Memory-Augmented Non-Local Attention for Video Super-Resolution | Ji-yang Yu, Jingen Liu, Liefeng Bo, Tao Mei | SupR | 25 / 32 / 0 | 25 Aug 2021
SwinIR: Image Restoration Using Swin Transformer | Jingyun Liang, Jie Cao, Guolei Sun, K. Zhang, Luc Van Gool, Radu Timofte | ViT | 45 / 2,806 / 0 | 23 Aug 2021
Relational Embedding for Few-Shot Classification | Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho | - | 34 / 185 / 0 | 22 Aug 2021
Group-based Distinctive Image Captioning with Memory Attention | Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan | - | 10 / 18 / 0 | 20 Aug 2021
Do Vision Transformers See Like Convolutional Neural Networks? | M. Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy | ViT | 52 / 924 / 0 | 19 Aug 2021
PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds | Jiayao Shan, Sifan Zhou, Zheng Fang, Yubo Cui | ViT | 25 / 79 / 0 | 14 Aug 2021