ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Stand-Alone Self-Attention in Vision Models
arXiv:1906.05909 · 13 June 2019
Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
Topics: VLM, SLR, ViT

Papers citing "Stand-Alone Self-Attention in Vision Models"

50 of 234 citing papers shown.

Learning Fair Face Representation With Progressive Cross Transformer
  Yong Li, Yufei Sun, Zhen Cui, Shiguang Shan, Jian Yang
  11 Aug 2021 · 27 / 12 / 0

RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
  Yuki Tatsunami, Masato Taki
  09 Aug 2021 · 24 / 12 / 0

Understanding the computational demands underlying visual reasoning
  Mohit Vaishnav, Rémi Cadène, A. Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre
  GNN, CoGe · 08 Aug 2021 · 34 / 16 / 0

Global Self-Attention as a Replacement for Graph Convolution
  Md Shamim Hussain, Mohammed J. Zaki, D. Subramanian
  ViT · 07 Aug 2021 · 37 / 122 / 0

Rethinking and Improving Relative Position Encoding for Vision Transformer
  Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao
  ViT · 29 Jul 2021 · 42 / 329 / 0

A3GC-IP: Attention-Oriented Adjacency Adaptive Recurrent Graph Convolutions for Human Pose Estimation from Sparse Inertial Measurements
  Patrik Puchert, Timo Ropinski
  3DH · 23 Jul 2021 · 17 / 3 / 0

Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark
  Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyuan Gao, Bin Zhao, Rui Zhang, Jun Hou
  19 Jul 2021 · 44 / 14 / 0

Visual Parser: Representing Part-whole Hierarchies with Transformers
  Shuyang Sun, Xiaoyu Yue, S. Bai, Philip H. S. Torr
  13 Jul 2021 · 50 / 27 / 0

Test-Time Personalization with a Transformer for Human Pose Estimation
  Yizhuo Li, Miao Hao, Zonglin Di, N. B. Gundavarapu, Xiaolong Wang
  ViT · 05 Jul 2021 · 25 / 45 / 0

Polarized Self-Attention: Towards High-quality Pixel-wise Regression
  Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang
  02 Jul 2021 · 77 / 211 / 0

AutoFormer: Searching Transformers for Visual Recognition
  Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
  ViT · 01 Jul 2021 · 36 / 259 / 0

Focal Self-attention for Local-Global Interactions in Vision Transformers
  Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao
  ViT · 01 Jul 2021 · 42 / 428 / 0

Probabilistic Attention for Interactive Segmentation
  Prasad Gabbur, Manjot Bilkhu, J. Movellan
  23 Jun 2021 · 26 / 13 / 0

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
  Michael S. Ryoo, A. Piergiovanni, Anurag Arnab, Mostafa Dehghani, A. Angelova
  ViT · 21 Jun 2021 · 32 / 127 / 0

Multi-head or Single-head? An Empirical Comparison for Transformer Training
  Liyuan Liu, Jialu Liu, Jiawei Han
  17 Jun 2021 · 21 / 32 / 0

Attention-based Domain Adaptation for Single Stage Detectors
  Vidit Vidit, Mathieu Salzmann
  ObjD · 14 Jun 2021 · 27 / 13 / 0

Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
  Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, L. Fei-Fei, D. Rubin
  FedML, AI4CE · 10 Jun 2021 · 19 / 174 / 0

Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
  Shao-Wei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, Xiaolong Wang
  3DH · 09 Jun 2021 · 35 / 160 / 0

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
  Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang
  VOS · 09 Jun 2021 · 35 / 279 / 0

CoAtNet: Marrying Convolution and Attention for All Data Sizes
  Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan
  ViT · 09 Jun 2021 · 49 / 1,167 / 0

DLA-Net: Learning Dual Local Attention Features for Semantic Segmentation of Large-Scale Building Facade Point Clouds
  Yanfei Su, Weiquan Liu, Zhimin Yuan, Ming Cheng, Zhihong Zhang, Xuelun Shen, Cheng-Yu Wang
  3DPC · 01 Jun 2021 · 17 / 38 / 0

ResT: An Efficient Transformer for Visual Recognition
  Qing-Long Zhang, Yubin Yang
  ViT · 28 May 2021 · 29 / 229 / 0

Interpretable UAV Collision Avoidance using Deep Reinforcement Learning
  Deep Thomas, Daniil Olshanskyi, Karter Krueger, Tichakorn Wongpiromsarn, Ali Jannesari
  25 May 2021 · 16 / 5 / 0

Intriguing Properties of Vision Transformers
  Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, F. Khan, Ming-Hsuan Yang
  ViT · 21 May 2021 · 256 / 621 / 0

A Multi-Branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation
  Yinglin Zhang, Risa Higashita, H. Fu, Yanwu Xu, Yang Zhang, Haofeng Liu, Jian Zhang, Jiang-Dong Liu
  ViT, MedIm · 21 May 2021 · 15 / 51 / 0

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
  Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shimin Hu
  05 May 2021 · 20 / 473 / 0

Attention-based Stylisation for Exemplar Image Colourisation
  Marc Górriz Blanch, Issa Khalifeh, A. Smeaton, Noel E. O'Connor, M. Mrak
  04 May 2021 · 25 / 4 / 0

MLP-Mixer: An all-MLP Architecture for Vision
  Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
  04 May 2021 · 271 / 2,603 / 0

AGMB-Transformer: Anatomy-Guided Multi-Branch Transformer Network for Automated Evaluation of Root Canal Therapy
  Yunxiang Li, G. Zeng, Yifan Zhang, Jun Wang, Qianni Zhang, ..., Neng Xia, Ruizi Peng, Kai Tang, Yaqi Wang, Shuai Wang
  MedIm, AI4CE · 02 May 2021 · 92 / 28 / 0

ConTNet: Why not use convolution and transformer at the same time?
  Haotian Yan, Zhe Li, Weijian Li, Changhu Wang, Ming Wu, Chuang Zhang
  ViT · 27 Apr 2021 · 14 / 76 / 0

CAGAN: Text-To-Image Generation with Combined Attention GANs
  Henning Schulze, Dogucan Yaman, Alexander Waibel
  GAN · 26 Apr 2021 · 27 / 3 / 0

Visformer: The Vision-friendly Transformer
  Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, Qi Tian
  ViT · 26 Apr 2021 · 120 / 209 / 0

Multiscale Vision Transformers
  Haoqi Fan, Bo Xiong, K. Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer
  ViT · 22 Apr 2021 · 54 / 1,222 / 0

Higher-Order Attribute-Enhancing Heterogeneous Graph Neural Networks
  Jianxin Li, Hao Peng, Yuwei Cao, Yingtong Dou, Hekai Zhang, Philip S. Yu, Lifang He
  16 Apr 2021 · 22 / 79 / 0

Escaping the Big Data Paradigm with Compact Transformers
  Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, Humphrey Shi
  12 Apr 2021 · 54 / 462 / 0

Going deeper with Image Transformers
  Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
  ViT · 31 Mar 2021 · 25 / 986 / 0

Rethinking Spatial Dimensions of Vision Transformers
  Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh
  ViT · 30 Mar 2021 · 336 / 564 / 0

ViViT: A Video Vision Transformer
  Anurag Arnab, Mostafa Dehghani, G. Heigold, Chen Sun, Mario Lucic, Cordelia Schmid
  ViT · 29 Mar 2021 · 30 / 2,087 / 0

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
  Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, QiXiang Ye
  ViT, WSOL · 27 Mar 2021 · 27 / 198 / 0

Vision Transformers for Dense Prediction
  René Ranftl, Alexey Bochkovskiy, V. Koltun
  ViT, MDE · 24 Mar 2021 · 42 / 1,659 / 0

SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing
  Brevin Tilmon, S. Koppal
  MDE · 24 Mar 2021 · 22 / 5 / 0

Scaling Local Self-Attention for Parameter Efficient Visual Backbones
  Ashish Vaswani, Prajit Ramachandran, A. Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens
  23 Mar 2021 · 16 / 395 / 0

Instance-level Image Retrieval using Reranking Transformers
  Fuwen Tan, Jiangbo Yuan, Vicente Ordonez
  ViT · 22 Mar 2021 · 26 / 89 / 0

DeepViT: Towards Deeper Vision Transformer
  Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
  ViT · 22 Mar 2021 · 42 / 510 / 0

Incorporating Convolution Designs into Visual Transformers
  Kun Yuan, Shaopeng Guo, Ziwei Liu, Aojun Zhou, F. Yu, Wei Wu
  ViT · 22 Mar 2021 · 47 / 467 / 0

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
  Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun
  ViT · 19 Mar 2021 · 46 / 804 / 0

Scalable Vision Transformers with Hierarchical Pooling
  Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai
  ViT · 19 Mar 2021 · 25 / 126 / 0

UNETR: Transformers for 3D Medical Image Segmentation
  Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, H. Roth, Daguang Xu
  ViT, MedIm · 18 Mar 2021 · 57 / 1,533 / 0

Revisiting ResNets: Improved Training and Scaling Strategies
  Irwan Bello, W. Fedus, Xianzhi Du, E. D. Cubuk, A. Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph
  13 Mar 2021 · 29 / 297 / 0

Involution: Inverting the Inherence of Convolution for Visual Recognition
  Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen
  BDL · 10 Mar 2021 · 19 / 304 / 0