2107.00641
Focal Self-attention for Local-Global Interactions in Vision Transformers
1 July 2021
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
Papers citing "Focal Self-attention for Local-Global Interactions in Vision Transformers"
50 / 259 papers shown
Title
MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision
Zhimin Zhu
Jianguo Zhao
Tong Mu
Yuliang Yang
Mengyu Zhu
27
0
0
08 Apr 2023
Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal
Tao Gao
Yuanbo Wen
Kaihao Zhang
Peng Cheng
Ting Chen
ViT
24
5
0
06 Apr 2023
Vision Transformers with Mixed-Resolution Tokenization
Tomer Ronen
Omer Levy
A. Golbert
ViT
11
21
0
01 Apr 2023
Rethinking Local Perception in Lightweight Vision Transformer
Qi Fan
Huaibo Huang
Jiyang Guan
Ran He
ViT
16
29
0
31 Mar 2023
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu
Pan Zhou
Shuicheng Yan
Xinchao Wang
34
117
0
29 Mar 2023
Spherical Transformer for LiDAR-based 3D Recognition
Xin Lai
Yukang Chen
Fanbin Lu
Jianhui Liu
Jiaya Jia
3DPC
37
125
0
22 Mar 2023
Robustifying Token Attention for Vision Transformers
Yong Guo
David Stutz
Bernt Schiele
ViT
14
24
0
20 Mar 2023
Making Vision Transformers Efficient from A Token Sparsification View
Shuning Chang
Pichao Wang
Ming Lin
Fan Wang
David Junhao Zhang
Rong Jin
Mike Zheng Shou
ViT
43
24
0
15 Mar 2023
Pretrained ViTs Yield Versatile Representations For Medical Images
Christos Matsoukas
Johan Fredin Haslum
Magnus P Soderberg
Kevin Smith
MedIm
ViT
11
11
0
13 Mar 2023
TransMatting: Tri-token Equipped Transformer Model for Image Matting
Huanqia Cai
Fanglei Xue
Lele Xu
Lili Guo
ViT
10
3
0
11 Mar 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
24
199
0
20 Feb 2023
Stitchable Neural Networks
Zizheng Pan
Jianfei Cai
Bohan Zhuang
45
22
0
13 Feb 2023
CEDNet: A Cascade Encoder-Decoder Network for Dense Prediction
Gang Zhang
Zi-Hua Li
Chufeng Tang
Jianmin Li
Xiaolin Hu
24
15
0
13 Feb 2023
Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection
Hang Zhou
Junqing Yu
Wei Yang
14
62
0
10 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
17
136
0
03 Feb 2023
Fairness-aware Vision Transformer via Debiased Self-Attention
Yao Qiang
Chengyin Li
Prashant Khanduri
D. Zhu
ViT
64
8
0
31 Jan 2023
A Survey of Advanced Computer Vision Techniques for Sports
Tiago Mendes-Neves
Luís Meireles
João Mendes-Moreira
16
4
0
18 Jan 2023
FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection
Shuailei Ma
Yuefeng Wang
Shanze Wang
Ying-yu Wei
24
33
0
08 Jan 2023
Unsupervised 4D LiDAR Moving Object Segmentation in Stationary Settings with Multivariate Occupancy Time Series
T. Kreutz
M. Mühlhäuser
Alejandro Sánchez Guinea
44
13
0
30 Dec 2022
A Close Look at Spatial Modeling: From Attention to Convolution
Xu Ma
Huan Wang
Can Qin
Kunpeng Li
Xing Zhao
Jie Fu
Yun Fu
ViT
3DPC
17
11
0
23 Dec 2022
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
29
13
0
21 Dec 2022
Focal-UNet: UNet-like Focal Modulation for Medical Image Segmentation
Mohammadreza Naderi
Mohammad H. Givkashi
F. Piri
N. Karimi
S. Samavi
ViT
MedIm
11
12
0
19 Dec 2022
Most Important Person-guided Dual-branch Cross-Patch Attention for Group Affect Recognition
Hongxia Xie
Ming-Xian Lee
Tzu-Jui Chen
Hung-Jen Chen
Hou-I Liu
Hong-Han Shuai
Wen-Huang Cheng
CVBM
27
8
0
14 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
X. Wang
ViT
27
21
0
13 Dec 2022
Mitigation of Spatial Nonstationarity with Vision Transformers
Lei Liu
Javier E. Santos
Maša Prodanović
Michael J. Pyrcz
10
4
0
09 Dec 2022
Asymmetric Cross-Scale Alignment for Text-Based Person Search
Zhong Ji
Junhua Hu
Deyin Liu
Yuan Wu
Ye Zhao
18
42
0
26 Nov 2022
Cross Aggregation Transformer for Image Restoration
Zheng Chen
Yulun Zhang
Jinjin Gu
Yongbing Zhang
L. Kong
X. Yuan
ViT
31
142
0
24 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
24
6
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
23
129
0
22 Nov 2022
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Haram Choi
Jeong-Sik Lee
Jihoon Yang
ViT
19
74
0
21 Nov 2022
Vision Transformer with Super Token Sampling
Huaibo Huang
Xiaoqiang Zhou
Jie Cao
Ran He
T. Tan
ViT
13
55
0
21 Nov 2022
Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao
Yuan Chang
Xuesong Yin
19
0
0
11 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
25
654
0
10 Nov 2022
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass
Shang Wu
Huihong Shi
Chaojian Li
Zhifan Ye
Zhongfeng Wang
Yingyan Lin
15
49
0
09 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
22
0
0
01 Nov 2022
ViT-LSLA: Vision Transformer with Light Self-Limited-Attention
Zhenzhe Hechen
Wei Huang
Yixin Zhao
ViT
22
6
0
31 Oct 2022
Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images
Yan Zhang
Xiyuan Gao
Qingyan Duan
Jiaxu Leng
Xiao Pu
Xinbo Gao
ViT
16
1
0
28 Oct 2022
Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets
Xiangyu Chen
Ying Qin
Wenju Xu
A. Bur
Cuncong Zhong
Guanghui Wang
ViT
33
3
0
25 Oct 2022
Context-Enhanced Stereo Transformer
Weiyu Guo
Zhaoshuo Li
Yongkui Yang
Z. Wang
Russell H. Taylor
Mathias Unberath
Alan Yuille
Yingwei Li
17
35
0
21 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
10
56
0
12 Oct 2022
Curved Representation Space of Vision Transformers
Juyeop Kim
Junha Park
Songkuk Kim
Jongseok Lee
ViT
28
6
0
11 Oct 2022
Hierarchical Graph Transformer with Adaptive Node Sampling
Zaixin Zhang
Qi Liu
Qingyong Hu
Cheekong Lee
67
82
0
08 Oct 2022
FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images
Chengyin Li
Yao Qiang
Vikram Goddla
H. Bagher-Ebadian
Prashant Khanduri
I. Chetty
D. Zhu
ViT
MedIm
37
9
0
06 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
24
58
0
04 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
17
25
0
03 Oct 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
93
87
0
30 Sep 2022
Graph Reasoning Transformer for Image Parsing
Dong Zhang
Jinhui Tang
Kwang-Ting Cheng
ViT
19
16
0
20 Sep 2022
Axially Expanded Windows for Local-Global Interaction in Vision Transformers
Zhemin Zhang
Xun Gong
ViT
13
1
0
19 Sep 2022
SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
Meng-Hao Guo
Chenggang Lu
Qibin Hou
Zheng Liu
Ming-Ming Cheng
Shiyong Hu
SSeg
ViT
VLM
21
600
0
18 Sep 2022
DMFormer: Closing the Gap Between CNN and Vision Transformers
Zimian Wei
H. Pan
Lujun Li
Menglong Lu
Xin-Yi Niu
Peijie Dong
Dongsheng Li
ViT
48
5
0
16 Sep 2022