2106.04803
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Neural Information Processing Systems (NeurIPS), 2021
9 June 2021
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
Papers citing
"CoAtNet: Marrying Convolution and Attention for All Data Sizes"
50 / 510 papers shown
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Computer Vision and Pattern Recognition (CVPR), 2022
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
215
49
0
21 Nov 2022
Vision Transformers in Medical Imaging: A Review
Emerald U. Henry
Onyeka Emebob
C. Omonhinmin
ViT
MedIm
257
56
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Computer Vision and Pattern Recognition (CVPR), 2022
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
244
56
0
17 Nov 2022
AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training
Computer Vision and Pattern Recognition (CVPR), 2022
Lezhi Li
Peter Hedman
B. Mildenhall
Dejia Xu
Jonathan T. Barron
Zinan Lin
Tianfan Xue
AI4CE
203
45
0
17 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
607
898
0
14 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
IEEE International Conference on Computer Vision (ICCV), 2022
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
276
8
0
14 Nov 2022
BiViT: Extremely Compressed Binary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2022
Yefei He
Zhenyu Lou
Luoming Zhang
Jing Liu
Weijia Wu
Hong Zhou
Bohan Zhuang
ViT
MQ
266
41
0
14 Nov 2022
A Comprehensive Survey of Transformers for Computer Vision
Sonain Jamil
Md. Jalil Piran
Oh-Jin Kwon
ViT
135
85
0
11 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Computer Vision and Pattern Recognition (CVPR), 2022
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Jiaming Song
Xiaogang Wang
Yu Qiao
VLM
554
958
0
10 Nov 2022
Demystify Transformers & Convolutions in Modern Image Deep Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jifeng Dai
Min Shi
Weiyun Wang
Sitong Wu
Linjie Xing
...
Lewei Lu
Jie Zhou
Xiaogang Wang
Botian Shi
Xiao-hua Hu
ViT
282
11
0
10 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
International Conference on Learning Representations (ICLR), 2022
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
285
124
0
07 Nov 2022
SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision Transformers
ACM Multimedia Asia (MA), 2022
Alessandro Arezzo
Stefano Berretti
ViT
109
20
0
04 Nov 2022
Boosting Binary Neural Networks via Dynamic Thresholds Learning
Jiehua Zhang
Xueyang Zhang
Z. Su
Zitong Yu
Yanghe Feng
Xin Lu
M. Pietikäinen
Li Liu
MQ
256
0
0
04 Nov 2022
Exploring Effects of Computational Parameter Changes to Image Recognition Systems
Nikolaos Louloudakis
Perry Gibson
José Cano
A. Rajan
217
6
0
01 Nov 2022
Accelerating Certified Robustness Training via Knowledge Transfer
Neural Information Processing Systems (NeurIPS), 2022
Pratik Vaishnavi
Kevin Eykholt
Amir Rahmati
202
8
0
25 Oct 2022
The Curious Case of Benign Memorization
International Conference on Learning Representations (ICLR), 2022
Sotiris Anagnostidis
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
AAML
348
12
0
25 Oct 2022
DialogConv: A Lightweight Fully Convolutional Network for Multi-view Response Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yongkang Liu
Shi Feng
Wei Gao
Daling Wang
Yifei Zhang
149
4
0
25 Oct 2022
Synthetic Data Supervised Salient Object Detection
ACM Multimedia (ACM MM), 2022
Zhenyu Wu
Lin Wang
Wei Wang
Tengfei Shi
Chenglizhao Chen
Aimin Hao
Shuo Li
176
30
0
25 Oct 2022
MetaFormer Baselines for Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
245
270
0
24 Oct 2022
Drastically Reducing the Number of Trainable Parameters in Deep CNNs by Inter-layer Kernel-sharing
Alireza Azadbakht
Saeed Reza Kheradpisheh
Ismail Khalfaoui-Hassani
T. Masquelier
161
1
0
23 Oct 2022
Similarity of Neural Architectures using Adversarial Attack Transferability
European Conference on Computer Vision (ECCV), 2022
Ian Ryu
Dongyoon Han
Byeongho Heo
Song Park
Sanghyuk Chun
Jong-Seok Lee
AAML
538
3
0
20 Oct 2022
A Survey of Computer Vision Technologies In Urban and Controlled-environment Agriculture
ACM Computing Surveys (ACM CSUR), 2022
Jiayun Luo
Boyang Albert Li
Cyril Leung
374
23
0
20 Oct 2022
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
Neural Information Processing Systems (NeurIPS), 2022
Dongze Lian
Daquan Zhou
Jiashi Feng
Xinchao Wang
352
335
0
17 Oct 2022
SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds
European Conference on Computer Vision (ECCV), 2022
Pei Sun
Mingxing Tan
Weiyue Wang
Chenxi Liu
Fei Xia
Zhaoqi Leng
Drago Anguelov
ViT
255
153
0
13 Oct 2022
Vision Transformers provably learn spatial structure
Neural Information Processing Systems (NeurIPS), 2022
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT
MLT
223
101
0
13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Journal of Machine Learning Research (JMLR), 2022
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
317
64
0
13 Oct 2022
Fast-ParC: Capturing Position Aware Global Feature for ConvNets and ViTs
Taojiannan Yang
Haokui Zhang
Wenze Hu
Chen Chen
Xiaoyu Wang
ViT
197
0
0
08 Oct 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
Findings (Findings), 2022
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
320
38
0
07 Oct 2022
The Lie Derivative for Measuring Learned Equivariance
International Conference on Learning Representations (ICLR), 2022
Nate Gruver
Marc Finzi
Micah Goldblum
A. Wilson
284
52
0
06 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
International Conference on Learning Representations (ICLR), 2022
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
323
78
0
04 Oct 2022
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
Yunsung Lee
Gyuseong Lee
Kwang-seok Ryoo
Hyojun Go
Jihye Park
Seung Wook Kim
136
5
0
04 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
259
39
0
03 Oct 2022
Attention Distillation: self-supervised vision transformer students need more guidance
British Machine Vision Conference (BMVC), 2022
Kai Wang
Fei Yang
Joost van de Weijer
ViT
162
21
0
03 Oct 2022
An In-depth Study of Stochastic Backpropagation
Neural Information Processing Systems (NeurIPS), 2022
J. Fang
Ming Xu
Hao Chen
Bing Shuai
Zhuowen Tu
Joseph Tighe
BDL
160
2
0
30 Sep 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Spoken Language Technology Workshop (SLT), 2022
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
403
157
0
30 Sep 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
299
142
0
30 Sep 2022
Exploring the Relationship between Architecture and Adversarially Robust Generalization
Computer Vision and Pattern Recognition (CVPR), 2022
Aishan Liu
Shiyu Tang
Yaning Tan
Yazhe Niu
Boxi Wu
Xianglong Liu
Dacheng Tao
AAML
231
23
0
28 Sep 2022
Attention is All They Need: Exploring the Media Archaeology of the Computer Vision Research Paper
Sam Goree
G. Appleby
David J. Crandall
Norman Su
266
2
0
22 Sep 2022
Mega: Moving Average Equipped Gated Attention
International Conference on Learning Representations (ICLR), 2022
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
332
217
0
21 Sep 2022
Axially Expanded Windows for Local-Global Interaction in Vision Transformers
Zhemin Zhang
Xun Gong
ViT
146
1
0
19 Sep 2022
VINet: Visual and Inertial-based Terrain Classification and Adaptive Navigation over Unknown Terrain
IEEE International Conference on Robotics and Automation (ICRA), 2022
Tianrui Guan
Ruitao Song
Zhixian Ye
Liangjun Zhang
199
16
0
16 Sep 2022
Neural Networks Reduction via Lumping
International Conference of the Italian Association for Artificial Intelligence (AIxIA), 2022
Dalila Ressi
Riccardo Romanello
S. Rossi
Carla Piazza
220
5
0
15 Sep 2022
Joint Debiased Representation and Image Clustering Learning with Self-Supervision
Shun Zheng
JaeEun Nam
Emilio Dorigatti
B. Bischl
Shekoofeh Azizi
Mina Rezaei
SSL
139
0
0
14 Sep 2022
Revisiting Neural Scaling Laws in Language and Vision
Neural Information Processing Systems (NeurIPS), 2022
Ibrahim Alabdulmohsin
Behnam Neyshabur
Xiaohua Zhai
489
143
0
13 Sep 2022
Socially Enhanced Situation Awareness from Microblogs using Artificial Intelligence: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Rabindra Lamsal
Aaron Harwood
M. Read
261
27
0
13 Sep 2022
Communication-Efficient and Privacy-Preserving Feature-based Federated Transfer Learning
Global Communications Conference (GLOBECOM), 2022
Feng Wang
M. C. Gursoy
Senem Velipasalar
245
4
0
12 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM
SSL
141
2
0
06 Sep 2022
AutoPET Challenge: Combining nn-Unet with Swin UNETR Augmented by Maximum Intensity Projection Classifier
Lars Heiliger
Zdravko Marinov
Max Hasin
André Ferreira
Jana Fragemann
...
D. Kersting
Victor Alves
Rainer Stiefelhagen
Jan Egger
Jens Kleesiek
97
11
0
02 Sep 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Neurocomputing (Neurocomputing), 2022
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
162
21
0
31 Aug 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
199
2
0
30 Aug 2022