arXiv: 2106.04803
CoAtNet: Marrying Convolution and Attention for All Data Sizes

Neural Information Processing Systems (NeurIPS), 2021
9 June 2021
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
    ViT

Papers citing "CoAtNet: Marrying Convolution and Attention for All Data Sizes"

50 / 510 papers shown
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Kunchang Li
Yali Wang
Junhao Zhang
Shiyang Feng
Guanglu Song
Yu Liu
Jiaming Song
Yu Qiao
ViT
532
521
0
24 Jan 2022
NAS-VAD: Neural Architecture Search for Voice Activity Detection
Interspeech (Interspeech), 2022
Daniel Rho
Jinhyeok Park
J. Ko
271
8
0
22 Jan 2022
Nearest Class-Center Simplification through Intermediate Layers
Ido Ben-Shaul
S. Dekel
244
28
0
21 Jan 2022
SwinUNet3D -- A Hierarchical Architecture for Deep Traffic Prediction using Shifted Window Transformers
Alabi Bojesomo
Hasan Al Marzouqi
P. Liatsis
ViT
129
6
0
17 Jan 2022
Video Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
450
139
0
16 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
International Conference on Learning Representations (ICLR), 2022
Kunchang Li
Yali Wang
Shiyang Feng
Guanglu Song
Yu Liu
Jiaming Song
Yu Qiao
ViT
440
319
0
12 Jan 2022
VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge
A. Brown
Jaesung Huh
Joon Son Chung
Arsha Nagrani
Daniel Garcia-Romero
Andrew Zisserman
184
45
0
12 Jan 2022
A ConvNet for the 2020s
Computer Vision and Pattern Recognition (CVPR), 2022
Zhuang Liu
Hanzi Mao
Chaozheng Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
585
6,985
0
10 Jan 2022
NeuralMLS: Geometry-Aware Control Point Deformation
Eurographics (EG), 2022
Meitar Shechter
Rana Hanocka
G. Metzer
Raja Giryes
Daniel Cohen-Or
166
7
0
05 Jan 2022
Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs
Nature Communications (Nat Commun), 2021
Inbar Seroussi
Gadi Naveh
Zohar Ringel
347
64
0
31 Dec 2021
CSformer: Bridging Convolution and Transformer for Compressive Sensing
IEEE Transactions on Image Processing (TIP), 2021
Dongjie Ye
Zhangkai Ni
Hanli Wang
Jian Zhang
Shiqi Wang
Sam Kwong
ViT, MedIm
142
81
0
31 Dec 2021
Augmenting Convolutional networks with attention-based aggregation
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Piotr Bojanowski
Armand Joulin
Gabriel Synnaeve
Edouard Grave
ViT
190
59
0
27 Dec 2021
Raw Produce Quality Detection with Shifted Window Self-Attention
Oh Joon Kwon
Byungsoo Kim
Youngduck Choi
ViT
110
0
0
24 Dec 2021
ELSA: Enhanced Local Self-Attention for Vision Transformer
Jingkai Zhou
Pichao Wang
Fan Wang
Qiong Liu
Hao Li
Rong Jin
ViT
238
44
0
23 Dec 2021
Learned Queries for Efficient Local Attention
Computer Vision and Pattern Recognition (CVPR), 2021
Moab Arar
Ariel Shamir
Amit H. Bermano
ViT
223
36
0
21 Dec 2021
Lite Vision Transformer with Enhanced Self-Attention
Computer Vision and Pattern Recognition (CVPR), 2021
Chenglin Yang
Yilin Wang
Jianming Zhang
Chentao Song
Zijun Wei
Zhe Lin
Alan Yuille
ViT
232
148
0
20 Dec 2021
StyleSwin: Transformer-based GAN for High-resolution Image Generation
Computer Vision and Pattern Recognition (CVPR), 2021
Bowen Zhang
Shuyang Gu
Bo Zhang
Jianmin Bao
Dong Chen
Fang Wen
Yong Wang
B. Guo
ViT
450
293
0
20 Dec 2021
E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
Chiara Plizzari
M. Planamente
Gabriele Goletto
Marco Cannici
Emanuele Gusso
Matteo Matteucci
Barbara Caputo
EgoV
239
69
0
07 Dec 2021
GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation
Weixuan Sun
Jing Zhang
Zheyuan Liu
Yiran Zhong
Nick Barnes
ViT
219
15
0
06 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
483
835
0
02 Dec 2021
Residual Pathway Priors for Soft Equivariance Constraints
Marc Finzi
Gregory W. Benton
A. Wilson
BDL, UQCV
194
73
0
02 Dec 2021
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
Liting Lin
Heng Fan
Zhipeng Zhang
Yong-mei Xu
Haibin Ling
ViT
276
450
0
02 Dec 2021
MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction
Balakrishnan Varadarajan
Ahmed S. Hefny
A. Srivastava
Khaled S. Refaat
Nigamaa Nayakanti
...
K. Chen
B. Douillard
C. Lam
Drago Anguelov
Benjamin Sapp
511
375
0
29 Nov 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
199
35
0
24 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
391
1,049
0
22 Nov 2021
XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For Convolutional Neural Networks
Journal of Intelligent and Robotic Systems (JIRS), 2021
Jian Sun
A. P. Fard
Mohammad H. Mahoor
3DPC
267
8
0
21 Nov 2021
Combined Scaling for Zero-shot Transfer Learning
Hieu H. Pham
Zihang Dai
Golnaz Ghiasi
Kenji Kawaguchi
Hanxiao Liu
...
Yi-Ting Chen
Minh-Thang Luong
Yonghui Wu
Mingxing Tan
Quoc V. Le
VLM
387
229
0
19 Nov 2021
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie
Zheng Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Jingdong Sun
Han Hu
422
1,637
0
18 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
ViT
534
2,396
0
18 Nov 2021
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
234
39
0
16 Nov 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Computer Vision and Pattern Recognition (CVPR), 2021
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
636
666
0
15 Nov 2021
Scaling Law for Recommendation Models: Towards General-purpose User Representations
AAAI Conference on Artificial Intelligence (AAAI), 2021
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Max Nihlén Ramström
Jisu Jeong
Jung-Woo Ha
Seon Gyeom Kim
ELM
457
52
0
15 Nov 2021
Attention Mechanisms in Computer Vision: A Survey
Computational Visual Media (CVM), 2021
Meng-Hao Guo
Tianhan Xu
Jiangjiang Liu
Zheng-Ning Liu
Peng-Tao Jiang
Tai-Jiang Mu
Song-Hai Zhang
Ralph Robert Martin
Ming-Ming Cheng
Shimin Hu
286
2,082
0
15 Nov 2021
Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro
Valerio Basile
Viviana Bono
Sara Gallo
ViT
300
62
0
14 Nov 2021
A Survey of Visual Transformers
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Peng Wang
Jianping Fan
Zhiqiang He
3DGS, ViT
457
471
0
11 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Hai-Tao Zheng
Li Tao
Dun Liang
Haitao Zheng
622
112
0
07 Nov 2021
Grafting Transformer on Automatically Designed Convolutional Neural Network for Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2021
Xizhe Xue
Haokui Zhang
Bei Fang
Zongwen Bai
Ying Li
ViT
297
31
0
21 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
312
298
0
18 Oct 2021
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
European Conference on Computer Vision (ECCV), 2021
Jinghuan Shang
Kumara Kahatapitiya
Xiang Li
Michael S. Ryoo
OffRL
397
41
0
12 Oct 2021
Adversarial Token Attacks on Vision Transformers
Ameya Joshi
Gauri Jagatap
Chinmay Hegde
ViT
183
22
0
08 Oct 2021
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
European Conference on Computer Vision (ECCV), 2021
Jihao Liu
Jiaming Song
Guanglu Song
Xin Huang
Yu Liu
ViT
214
38
0
08 Oct 2021
SIRe-Networks: Convolutional Neural Networks Architectural Extension for Information Preservation via Skip/Residual Connections and Interlaced Auto-Encoders
D. Avola
Luigi Cinque
Alessio Fagioli
G. Foresti
259
4
0
06 Oct 2021
Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil
Raphael Gontijo-Lopes
Rebecca Roelofs
269
44
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
644
1,874
0
05 Oct 2021
OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification
Xianing Chen
Chunlin Xu
Qiong Cao
Jialang Xu
Yujie Zhong
Jiale Xu
Zhengxin Li
Jingya Wang
Shenghua Gao
ViT
186
20
0
23 Sep 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
199
5
0
18 Sep 2021
Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021
Li Zhang
Huan Zhao
Qinling Meng
Yanli Chen
Min Liu
Lei Xie
217
11
0
08 Sep 2021
Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge
ACM Transactions on Embedded Computing Systems (TECS), 2021
Vinod Ganesan
Pratyush Kumar
234
2
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
International Conference on Learning Representations (ICLR), 2021
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM, MLLM
723
910
0
24 Aug 2021
Monte Carlo DropBlock for Modelling Uncertainty in Object Detection
Pattern Recognition (Pattern Recogn.), 2021
K. Deepshikha
Sai Harsha Yelleni
P. K. Srijith
C. Krishna Mohan
BDL, UQCV
155
107
0
08 Aug 2021