Focal Self-attention for Local-Global Interactions in Vision Transformers

1 July 2021
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
    ViT

Papers citing "Focal Self-attention for Local-Global Interactions in Vision Transformers"

50 / 263 papers shown
Context-Enhanced Stereo Transformer
European Conference on Computer Vision (ECCV), 2022
Weiyu Guo
Zhaoshuo Li
Yongkui Yang
Liang Luo
Russell H. Taylor
Mathias Unberath
Alan Yuille
Yingwei Li
171
50
0
21 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Neural Information Processing Systems (NeurIPS), 2022
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
258
84
0
12 Oct 2022
Curved Representation Space of Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2022
Juyeop Kim
Junha Park
Songkuk Kim
Jongseok Lee
ViT
281
9
0
11 Oct 2022
Hierarchical Graph Transformer with Adaptive Node Sampling
Neural Information Processing Systems (NeurIPS), 2022
Zaixin Zhang
Qi Liu
Qingyong Hu
Cheekong Lee
311
122
0
08 Oct 2022
FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Chengyin Li
Yao Qiang
Vikram Goddla
H. Bagher-Ebadian
Prashant Khanduri
I. Chetty
D. Zhu
ViT, MedIm
146
16
0
06 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
International Conference on Learning Representations (ICLR), 2022
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT, MoE
320
78
0
04 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
256
39
0
03 Oct 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
295
139
0
30 Sep 2022
Graph Reasoning Transformer for Image Parsing
ACM Multimedia (ACM MM), 2022
Dong Zhang
Jinhui Tang
Kwang-Ting Cheng
ViT
140
20
0
20 Sep 2022
Axially Expanded Windows for Local-Global Interaction in Vision Transformers
Zhemin Zhang
Xun Gong
ViT
146
1
0
19 Sep 2022
SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
Neural Information Processing Systems (NeurIPS), 2022
Meng-Hao Guo
Chenggang Lu
Qibin Hou
Zheng Liu
Ming-Ming Cheng
Shiyong Hu
SSeg, ViT, VLM
318
981
0
18 Sep 2022
DMFormer: Closing the Gap Between CNN and Vision Transformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zimian Wei
H. Pan
Lujun Li
Menglong Lu
Xin-Yi Niu
Peijie Dong
Dongsheng Li
ViT
317
7
0
16 Sep 2022
CenterFormer: Center-based Transformer for 3D Object Detection
European Conference on Computer Vision (ECCV), 2022
Zixiang Zhou
Xian Zhao
Yu Wang
Panqu Wang
H. Foroosh
3DPC, ViT
187
180
0
12 Sep 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Neurocomputing, 2022
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
159
21
0
31 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP, VLM
281
221
0
25 Aug 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
211
1
0
23 Aug 2022
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
British Machine Vision Conference (BMVC), 2022
Bolin Lai
Miao Liu
Fiona Ryan
James M. Rehg
ViT
236
50
0
08 Aug 2022
TransMatting: Enhancing Transparent Objects Matting with Transformers
European Conference on Computer Vision (ECCV), 2022
Huanqia Cai
Fanglei Xue
Lele Xu
Lili Guo
ViT
164
30
0
05 Aug 2022
TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zhipeng Luo
Gongjie Zhang
Changqing Zhou
Ti Liu
Shijian Lu
Liang Pan
3DPC, ViT
208
11
0
04 Aug 2022
giMLPs: Gate with Inhibition Mechanism in MLPs
Cheng Kang
Jindich Prokop
Lei Tong
Huiyu Zhou
Yong Hu
Daneil Novak
163
0
0
01 Aug 2022
Global-Local Self-Distillation for Visual Representation Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tim Lebailly
Tinne Tuytelaars
SSL
121
6
0
29 Jul 2022
COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2022
Idil Aytekin
Onat Dalmaz
Kaan Gonc
H. Ankishan
E. Saritas
Ulas Bagci
H. Celik
Tolga Çukur
171
20
0
19 Jul 2022
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
Neural Information Processing Systems (NeurIPS), 2022
Zhihan Gao
Xingjian Shi
Hao Wang
Yi Zhu
Yuyang Wang
Mu Li
Dit-Yan Yeung
AI4TS
315
245
0
12 Jul 2022
LightViT: Towards Light-Weight Convolution-Free Vision Transformers
Tao Huang
Lang Huang
Shan You
Fei Wang
Chao Qian
Chang Xu
ViT
182
76
0
12 Jul 2022
Compound Prototype Matching for Few-shot Action Recognition
European Conference on Computer Vision (ECCV), 2022
Yifei Huang
Lijin Yang
Yoichi Sato
361
59
0
12 Jul 2022
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning
European Conference on Computer Vision (ECCV), 2022
Ting Yao
Yingwei Pan
Yehao Li
Chong-Wah Ngo
Tao Mei
ViT
462
192
0
11 Jul 2022
Dual Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
358
112
0
11 Jul 2022
Self-attention on Multi-Shifted Windows for Scene Segmentation
Litao Yu
Zhibin Li
Jian Zhang
Qiang Wu
SSeg
155
1
0
10 Jul 2022
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Conference on Robot Learning (CoRL), 2022
Runsheng Xu
Zhengzhong Tu
Hao Xiang
Wei Shao
Bolei Zhou
Jiaqi Ma
416
307
0
05 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
190
3
0
05 Jul 2022
Polarized Color Image Denoising using Pocoformer
Zhuoxiao Li
Hai-bo Jiang
Yinqiang Zheng
219
4
0
01 Jul 2022
Rethinking Query-Key Pairwise Interactions in Vision Transformers
Cheng-rong Li
Yangxin Liu
210
0
0
01 Jul 2022
Deformable Graph Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jinyoung Park
Seongjun Yun
Hyeon-ju Park
Jaewoo Kang
Jisu Jeong
KyungHyun Kim
Jung-Woo Ha
Hyunwoo J. Kim
244
11
0
29 Jun 2022
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
Computer Vision and Pattern Recognition (CVPR), 2022
Yukang Chen
Jianhui Liu
Xinming Zhang
Xiaojuan Qi
Jiaya Jia
244
121
0
21 Jun 2022
Vicinity Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weixuan Sun
Zhen Qin
Huiyuan Deng
Jianyuan Wang
Yi Zhang
Kaihao Zhang
Nick Barnes
Stan Birchfield
Lingpeng Kong
Yiran Zhong
ViT
206
45
0
21 Jun 2022
Learning Multiscale Transformer Models for Sequence Generation
International Conference on Machine Learning (ICML), 2022
Bei Li
Tong Zheng
Yi Jing
Chengbo Jiao
Tong Xiao
Jingbo Zhu
202
13
0
19 Jun 2022
Efficient Decoder-free Object Detection with Transformers
European Conference on Computer Vision (ECCV), 2022
Peixian Chen
Mengdan Zhang
Chunjiang Ge
Kekai Sheng
Yuting Gao
Xing Sun
Ke Li
Chunhua Shen
ViT
273
20
0
14 Jun 2022
Peripheral Vision Transformer
Neural Information Processing Systems (NeurIPS), 2022
Juhong Min
Yucheng Zhao
Chong Luo
Minsu Cho
ViT, MDE
238
35
0
14 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
338
40
0
01 Jun 2022
Self-Supervised Pre-training of Vision Transformers for Dense Prediction Tasks
Jaonary Rabarisoa
Velentin Belissen
Florian Chabot
Q. C. Pham
VLM, ViT, SSL, MDE
115
3
0
30 May 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Jingdong Sun
Lingxi Xie
Qi Tian
249
39
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Neural Information Processing Systems (NeurIPS), 2022
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
450
21
0
30 May 2022
Fast Vision Transformers with HiLo Attention
Neural Information Processing Systems (NeurIPS), 2022
Zizheng Pan
Jianfei Cai
Bohan Zhuang
444
242
0
26 May 2022
Inception Transformer
Neural Information Processing Systems (NeurIPS), 2022
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
337
256
0
25 May 2022
ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions
ACM Transactions on Graphics (TOG), 2022
Difan Liu
Sandesh Shetty
Tobias Hinz
Matthew Fisher
Richard Y. Zhang
Taesung Park
E. Kalogerakis
ViT
193
42
0
24 May 2022
BolT: Fused Window Transformers for fMRI Time Series Analysis
H. Bedel
Irmak Sivgin
Onat Dalmaz
S. Dar
Tolga Çukur
353
92
0
23 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
299
86
0
20 May 2022
Vision Transformer Adapter for Dense Predictions
International Conference on Learning Representations (ICLR), 2022
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
878
755
0
17 May 2022
MulT: An End-to-End Multitask Learning Transformer
Computer Vision and Pattern Recognition (CVPR), 2022
Deblina Bhattacharjee
Tong Zhang
Sabine Süsstrunk
Mathieu Salzmann
ViT
230
88
0
17 May 2022
Transformers in 3D Point Clouds: A Survey
Dening Lu
Qian Xie
Mingqiang Wei
Kyle Gao
Linlin Xu
Jonathan Li
3DPC, ViT
323
65
0
16 May 2022