ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2107.00641
  4. Cited By
Focal Self-attention for Local-Global Interactions in Vision
  Transformers

Focal Self-attention for Local-Global Interactions in Vision Transformers

1 July 2021
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
    ViT
ArXivPDFHTML

Papers citing "Focal Self-attention for Local-Global Interactions in Vision Transformers"

50 / 259 papers shown
Title
RSTT: Real-time Spatial Temporal Transformer for Space-Time Video
  Super-Resolution
RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution
Z. Geng
Luming Liang
Tianyu Ding
Ilya Zharkov
17
68
0
27 Mar 2022
Towards Exemplar-Free Continual Learning in Vision Transformers: an
  Account of Attention, Functional and Weight Regularization
Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization
Francesco Pelosin
Saurav Jha
A. Torsello
Bogdan Raducanu
Joost van de Weijer
CLL
13
28
0
24 Mar 2022
Beyond Fixation: Dynamic Window Visual Transformer
Beyond Fixation: Dynamic Window Visual Transformer
Pengzhen Ren
Changlin Li
Guangrun Wang
Yun Xiao
Qing Du
Xiaodan Liang
Qing Du Xiaodan Liang Xiaojun Chang
ViT
18
32
0
24 Mar 2022
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
Ryan Grainger
Thomas Paniagua
Xi Song
Naresh P. Cuntoor
Mun Wai Lee
Tianfu Wu
ViT
10
7
0
22 Mar 2022
Focal Modulation Networks
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
22
263
0
22 Mar 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text
  Detection and Text Recognition
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Mingxin Huang
Yuliang Liu
Zhenghao Peng
Chongyu Liu
Dahua Lin
Shenggao Zhu
N. Yuan
Kai Ding
Lianwen Jin
ViT
11
98
0
19 Mar 2022
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
Y. Fu
Shunyao Zhang
Shan-Hung Wu
Cheng Wan
Yingyan Lin
AAML
23
64
0
16 Mar 2022
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene
  Understanding
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
Hanrong Ye
Dan Xu
ViT
11
84
0
15 Mar 2022
Active Token Mixer
Active Token Mixer
Guoqiang Wei
Zhizheng Zhang
Cuiling Lan
Yan Lu
Zhibo Chen
10
15
0
11 Mar 2022
Lane Detection with Versatile AtrousFormer and Local Semantic Guidance
Lane Detection with Versatile AtrousFormer and Local Semantic Guidance
Jiaxing Yang
Lihe Zhang
Huchuan Lu
ViT
8
19
0
08 Mar 2022
Boosting Crowd Counting via Multifaceted Attention
Boosting Crowd Counting via Multifaceted Attention
Hui Lin
Zhiheng Ma
Rongrong Ji
Yaowei Wang
Xiaopeng Hong
23
145
0
05 Mar 2022
A Unified Query-based Paradigm for Point Cloud Understanding
A Unified Query-based Paradigm for Point Cloud Understanding
Zetong Yang
Li Jiang
Yanan Sun
Bernt Schiele
Jiaya Jia
3DPC
19
38
0
02 Mar 2022
Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for
  Visual Discrimination
Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for Visual Discrimination
Qingsong Zhao
Shuguang Dou
Zhipeng Zhou
Yangguang Li
Yin Wang
Yu Qiao
Cairong Zhao
10
3
0
21 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
  Image Recognition and Beyond
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
14
229
0
21 Feb 2022
Visual Attention Network
Visual Attention Network
Meng-Hao Guo
Chengrou Lu
Zheng-Ning Liu
Ming-Ming Cheng
Shiyong Hu
ViT
VLM
17
635
0
20 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers
ActionFormer: Localizing Moments of Actions with Transformers
Chen-Da Liu-Zhang
Jianxin Wu
Yin Li
ViT
23
328
0
16 Feb 2022
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
Seokju Cho
Sunghwan Hong
Seung Wook Kim
ViT
19
34
0
14 Feb 2022
UniFormer: Unifying Convolution and Self-attention for Visual
  Recognition
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
142
361
0
24 Jan 2022
Dual-Flattening Transformers through Decomposed Row and Column Queries
  for Semantic Segmentation
Dual-Flattening Transformers through Decomposed Row and Column Queries for Semantic Segmentation
Ying Wang
C. Ho
Wenju Xu
Ziwei Xuan
Xudong Liu
Guo-Jun Qi
ViT
14
5
0
22 Jan 2022
QuadTree Attention for Vision Transformers
QuadTree Attention for Vision Transformers
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
157
156
0
08 Jan 2022
Vision Transformer with Deformable Attention
Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
S. Song
Li Erran Li
Gao Huang
ViT
22
452
0
03 Jan 2022
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped
  Attention
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
23
70
0
28 Dec 2021
ELSA: Enhanced Local Self-Attention for Vision Transformer
ELSA: Enhanced Local Self-Attention for Vision Transformer
Jingkai Zhou
Pichao Wang
Fan Wang
Qiong Liu
Hao Li
Rong Jin
ViT
21
37
0
23 Dec 2021
Learned Queries for Efficient Local Attention
Learned Queries for Efficient Local Attention
Moab Arar
Ariel Shamir
Amit H. Bermano
ViT
36
29
0
21 Dec 2021
Contrastive Object Detection Using Knowledge Graph Embeddings
Contrastive Object Detection Using Knowledge Graph Embeddings
Christopher Lang
Alexander Braun
Abhinav Valada
8
8
0
21 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
13
243
0
21 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video
  Recognition
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
21
21
0
09 Dec 2021
Grounded Language-Image Pre-training
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Jenq-Neng Hwang
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
24
1,013
0
07 Dec 2021
Shunted Self-Attention via Multi-Scale Token Aggregation
Shunted Self-Attention via Multi-Scale Token Aggregation
Sucheng Ren
Daquan Zhou
Shengfeng He
Jiashi Feng
Xinchao Wang
ViT
25
222
0
30 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
20
6
0
26 Nov 2021
NomMer: Nominate Synergistic Context in Vision Transformer for Visual
  Recognition
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
Hao Liu
Xinghua Jiang
Xin Li
Zhimin Bao
Deqiang Jiang
Bo Ren
ViT
22
16
0
25 Nov 2021
Self-slimmed Vision Transformer
Self-slimmed Vision Transformer
Zhuofan Zong
Kunchang Li
Guanglu Song
Yali Wang
Yu Qiao
B. Leng
Yu Liu
ViT
16
30
0
24 Nov 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal
  Representation Learning
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
19
30
0
24 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
14
63
0
23 Nov 2021
Florence: A New Foundation Model for Computer Vision
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
24
875
0
22 Nov 2021
Rethinking Query, Key, and Value Embedding in Vision Transformer under
  Tiny Model Constraints
Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints
Jaesin Ahn
Jiuk Hong
Jeongwoo Ju
Heechul Jung
ViT
19
3
0
19 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
41
1,738
0
18 Nov 2021
A Survey of Visual Transformers
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
66
330
0
11 Nov 2021
StARformer: Transformer with State-Action-Reward Representations for
  Visual Reinforcement Learning
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
Jinghuan Shang
Kumara Kahatapitiya
Xiang Li
Michael S. Ryoo
OffRL
30
33
0
12 Oct 2021
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
Chao-Hong Tan
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Zhenhua Ling
ViT
17
11
0
06 Oct 2021
LGD: Label-guided Self-distillation for Object Detection
LGD: Label-guided Self-distillation for Object Detection
Peizhen Zhang
Zijian Kang
Tong Yang
X. Zhang
N. Zheng
Jian-jun Sun
ObjD
92
30
0
23 Sep 2021
Complementary Feature Enhanced Network with Vision Transformer for Image
  Dehazing
Complementary Feature Enhanced Network with Vision Transformer for Image Dehazing
Dong Zhao
Jia Li
Hongyu Li
Longhao Xu
ViT
11
16
0
15 Sep 2021
S$^2$-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
S2^22-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
37
50
0
02 Aug 2021
VOLO: Vision Outlooker for Visual Recognition
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
41
312
0
24 Jun 2021
More than Encoder: Introducing Transformer Decoder to Upsample
More than Encoder: Introducing Transformer Decoder to Upsample
Yijiang Li
Wentian Cai
Ying Gao
Chengming Li
Xiping Hu
ViT
MedIm
19
49
0
20 Jun 2021
Uformer: A General U-Shaped Transformer for Image Restoration
Uformer: A General U-Shaped Transformer for Image Restoration
Zhendong Wang
Xiaodong Cun
Jianmin Bao
Wengang Zhou
Jianzhuang Liu
Houqiang Li
ViT
34
1,356
0
06 Jun 2021
KVT: k-NN Attention for Boosting Vision Transformers
KVT: k-NN Attention for Boosting Vision Transformers
Pichao Wang
Xue Wang
F. Wang
Ming Lin
Shuning Chang
Hao Li
R. L. Jin
ViT
32
105
0
28 May 2021
Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial
  Transformers
Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial Transformers
Yilmaz Korkmaz
S. Dar
Mahmut Yurt
Muzaffer Özbey
Tolga Çukur
ViT
MedIm
11
189
0
15 May 2021
A State-of-the-art Survey of Object Detection Techniques in
  Microorganism Image Analysis: From Classical Methods to Deep Learning
  Approaches
A State-of-the-art Survey of Object Detection Techniques in Microorganism Image Analysis: From Classical Methods to Deep Learning Approaches
Pingli Ma
Chen Li
M. Rahaman
Yudong Yao
Jiawei Zhang
Shuojia Zou
Xin Zhao
M. Grzegorzek
24
60
0
07 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
298
5,761
0
29 Apr 2021
Previous
123456
Next