ResearchTrend.AI
arXiv:2004.13621
Exploring Self-attention for Image Recognition


28 April 2020
Hengshuang Zhao
Jiaya Jia
V. Koltun
    SSL

Papers citing "Exploring Self-attention for Image Recognition"

50 / 316 papers shown
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
248
577
0
22 Apr 2021
Variational Relational Point Completion Network
Liang Pan
Xinyi Chen
Zhongang Cai
Junzhe Zhang
Haiyu Zhao
Shuai Yi
Ziwei Liu
3DPC
195
176
0
20 Apr 2021
HoughNet: Integrating near and long-range evidence for visual detection
Nermin Samet
Samet Hicsonmez
Emre Akbas
ObjD
21
10
0
14 Apr 2021
Co-Scale Conv-Attentional Image Transformers
Weijian Xu
Yifan Xu
Tyler A. Chang
Z. Tu
ViT
11
373
0
13 Apr 2021
GAttANet: Global attention agreement for convolutional neural networks
R. V. Rullen
A. Alamia
ViT
13
2
0
12 Apr 2021
Fine-Grained Attention for Weakly Supervised Object Localization
Junghyo Sohn
Eunjin Jeon
Wonsik Jung
Eunsong Kang
Heung-Il Suk
WSOL
16
3
0
11 Apr 2021
SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras
Varun Ravi Kumar
Marvin Klingner
S. Yogamani
Markus Bach
Stefan Milz
Tim Fingscheidt
Patrick Mäder
MDE
48
37
0
09 Apr 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
Niko Moritz
Takaaki Hori
Jonathan Le Roux
11
7
0
07 Apr 2021
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
37
1,801
0
05 Apr 2021
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
Ben Graham
Alaaeldin El-Nouby
Hugo Touvron
Pierre Stock
Armand Joulin
Hervé Jégou
Matthijs Douze
ViT
11
768
0
02 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
8
26
0
02 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
25
986
0
31 Mar 2021
Dual Contrastive Loss and Attention for GANs
Ning Yu
Guilin Liu
Aysegül Dündar
Andrew Tao
Bryan Catanzaro
Larry S. Davis
Mario Fritz
GAN
22
60
0
31 Mar 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Chun-Fu Chen
Quanfu Fan
Rameswar Panda
ViT
28
1,420
0
27 Mar 2021
TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
Wei Gao
Fang Wan
Xingjia Pan
Zhiliang Peng
Qi Tian
Zhenjun Han
Bolei Zhou
QiXiang Ye
ViT
WSOL
12
198
0
27 Mar 2021
Understanding Robustness of Transformers for Image Classification
Srinadh Bhojanapalli
Ayan Chakrabarti
Daniel Glasner
Daliang Li
Thomas Unterthiner
Andreas Veit
ViT
14
378
0
26 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
B. Guo
ViT
124
20,677
0
25 Mar 2021
Vision Transformers for Dense Prediction
René Ranftl
Alexey Bochkovskiy
V. Koltun
ViT
MDE
36
1,659
0
24 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani
Prajit Ramachandran
A. Srinivas
Niki Parmar
Blake A. Hechtman
Jonathon Shlens
16
395
0
23 Mar 2021
Instance-level Image Retrieval using Reranking Transformers
Fuwen Tan
Jiangbo Yuan
Vicente Ordonez
ViT
21
89
0
22 Mar 2021
DeepViT: Towards Deeper Vision Transformer
Daquan Zhou
Bingyi Kang
Xiaojie Jin
Linjie Yang
Xiaochen Lian
Zihang Jiang
Qibin Hou
Jiashi Feng
ViT
42
510
0
22 Mar 2021
Incorporating Convolution Designs into Visual Transformers
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
24
467
0
22 Mar 2021
Involution: Inverting the Inherence of Convolution for Visual Recognition
Duo Li
Jie Hu
Changhu Wang
Xiangtai Li
Qi She
Lei Zhu
Tong Zhang
Qifeng Chen
BDL
15
304
0
10 Mar 2021
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
Yinan He
Bei Gan
Siyu Chen
Yichun Zhou
Guojun Yin
Luchuan Song
Lu Sheng
Jing Shao
Ziwei Liu
AAML
24
129
0
09 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
48
973
0
04 Mar 2021
Generative Adversarial Transformers
Drew A. Hudson
C. L. Zitnick
ViT
23
179
0
01 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
274
3,622
0
24 Feb 2021
Model-Attentive Ensemble Learning for Sequence Modeling
Victor D. Bourgin
Ioana Bica
M. Schaar
AI4TS
15
0
0
23 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
14
295
0
22 Feb 2021
Hard-Attention for Scalable Image Classification
Athanasios Papadopoulos
Pawel Korus
N. Memon
62
25
0
20 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
267
179
0
17 Feb 2021
OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving
Varun Ravi Kumar
S. Yogamani
Hazem Rashed
Ganesh Sitsu
Christian Witt
Isabelle Leang
Stefan Milz
Patrick Mäder
23
90
0
15 Feb 2021
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition
Heeseung Kwon
Manjin Kim
Suha Kwak
Minsu Cho
TTA
19
39
0
14 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,981
0
09 Feb 2021
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
6
1,904
0
28 Jan 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
290
979
0
27 Jan 2021
Shape or Texture: Understanding Discriminative Features in CNNs
Md. Amirul Islam
M. Kowal
Patrick Esser
Sen Jia
Bjorn Ommer
Konstantinos G. Derpanis
Neil D. B. Bruce
14
75
0
27 Jan 2021
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Yu Tian
Guansong Pang
Yuanhong Chen
Rajvinder Singh
Johan W. Verjans
G. Carneiro
AI4TS
13
291
0
25 Jan 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
14
165
0
21 Jan 2021
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
Ardhendu Behera
Zachary Wharton
Pradeep Ruwan Padmasiri Galbokka Hewage
Asish Bera
59
108
0
17 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
227
2,428
0
04 Jan 2021
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng
Jiachen Lu
Hengshuang Zhao
Xiatian Zhu
Zekun Luo
...
Yanwei Fu
Jianfeng Feng
Tao Xiang
Philip H. S. Torr
Li Zhang
ViT
17
2,837
0
31 Dec 2020
Attention-based Image Upsampling
Souvik Kundu
Hesham Mostafa
S. N. Sridhar
Sairam Sundaresan
SupR
11
10
0
17 Dec 2020
Point Transformer
Hengshuang Zhao
Li Jiang
Jiaya Jia
Philip H. S. Torr
V. Koltun
3DPC
ViT
25
11
0
16 Dec 2020
Responsible Disclosure of Generative Models Using Scalable Fingerprinting
Ning Yu
Vladislav Skripniuk
Dingfan Chen
Larry S. Davis
Mario Fritz
WIGM
35
89
0
16 Dec 2020
Fine-grained Angular Contrastive Learning with Coarse Labels
Guy Bukchin
Eli Schwartz
Kate Saenko
Ori Shahar
Rogerio Feris
Raja Giryes
Leonid Karlinsky
27
52
0
07 Dec 2020
Deep Learning and the Global Workspace Theory
R. V. Rullen
Ryota Kanai
37
65
0
04 Dec 2020
Pre-Trained Image Processing Transformer
Hanting Chen
Yunhe Wang
Tianyu Guo
Chang Xu
Yiping Deng
Zhenhua Liu
Siwei Ma
Chunjing Xu
Chao Xu
Wen Gao
VLM
ViT
37
1,632
0
01 Dec 2020
Deeper or Wider Networks of Point Clouds with Self-attention?
Haoxi Ran
Li Lu
3DPC
19
1
0
29 Nov 2020
Reflective-Net: Learning from Explanations
Johannes Schneider
Michalis Vlachos
FAtt
OffRL
LRM
52
18
0
27 Nov 2020