CoAtNet: Marrying Convolution and Attention for All Data Sizes

Neural Information Processing Systems (NeurIPS), 2021
9 June 2021
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
    ViT

Papers citing "CoAtNet: Marrying Convolution and Attention for All Data Sizes"

50 / 510 papers shown
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
Han Cai
Junyan Li
Muyan Hu
Chuang Gan
Song Han
338
83
0
29 May 2022
How Tempering Fixes Data Augmentation in Bayesian Neural Networks
International Conference on Machine Learning (ICML), 2022
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
BDL, AAML
262
11
0
27 May 2022
Fast Vision Transformers with HiLo Attention
Neural Information Processing Systems (NeurIPS), 2022
Zizheng Pan
Jianfei Cai
Bohan Zhuang
439
242
0
26 May 2022
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Jihao Liu
Xin Huang
Jinliang Zheng
Yu Liu
Jiaming Song
293
79
0
26 May 2022
Inception Transformer
Neural Information Processing Systems (NeurIPS), 2022
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
335
253
0
25 May 2022
MoCoViT: Mobile Convolutional Vision Transformer
Hailong Ma
Xin Xia
Xing Wang
Xuefeng Xiao
Jiashi Li
Min Zheng
ViT
378
21
0
25 May 2022
Visualizing CoAtNet Predictions for Aiding Melanoma Detection
Engineering and Technology Journal (ETJ), 2022
Daniel Kvak
MedIm
186
3
0
21 May 2022
Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors
Neural Information Processing Systems (NeurIPS), 2022
Ravid Shwartz-Ziv
Micah Goldblum
Hossein Souri
Sanyam Kapoor
Chen Zhu
Yann LeCun
A. Wilson
UQCV, BDL
162
46
0
20 May 2022
TRT-ViT: TensorRT-oriented Vision Transformer
Xin Xia
Jiashi Li
Jie Wu
Xing Wang
Xuefeng Xiao
Min Zheng
Rui Wang
ViT
216
34
0
19 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Shiyang Feng
Teli Ma
Jiaming Song
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
256
150
0
08 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM, CLIP, OffRL
667
1,596
0
04 May 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Proceedings of the VLDB Endowment (PVLDB), 2022
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul Chilimbi
Mu Li
Xin Jin
442
47
0
30 Apr 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Xianing Chen
Qiong Cao
Yujie Zhong
Jing Zhang
Shenghua Gao
Dacheng Tao
ViT
236
101
0
27 Apr 2022
TranSiam: Fusing Multimodal Visual Features Using Transformer for Medical Image Segmentation
Xia Li
Shiqiang Ma
Jijun Tang
Fei Guo
ViT, MedIm
79
12
0
26 Apr 2022
Investigating Neural Architectures by Synthetic Dataset Design
Adrien Courtois
Jean-Michel Morel
Pablo Arias
172
4
0
23 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
218
65
0
18 Apr 2022
ResT V2: Simpler, Faster and Stronger
Neural Information Processing Systems (NeurIPS), 2022
Qing-Long Zhang
Yubin Yang
ViT
242
30
0
15 Apr 2022
Localization Distillation for Object Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zhaohui Zheng
Rongguang Ye
Ping Wang
Dongwei Ren
Jun Wang
W. Zuo
Ming-Ming Cheng
215
79
0
12 Apr 2022
DaViT: Dual Attention Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
367
343
0
07 Apr 2022
Few-Shot Forecasting of Time-Series with Heterogeneous Channels
L. Brinkmeyer
Rafael Rêgo Drumond
Johannes Burchert
Lars Schmidt-Thieme
AI4TS
203
12
0
07 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
IEEE International Conference on Computer Vision (ICCV), 2022
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
237
66
0
06 Apr 2022
MaxViT: Multi-Axis Vision Transformer
European Conference on Computer Vision (ECCV), 2022
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
479
881
0
04 Apr 2022
Revisiting a kNN-based Image Classification System with High-capacity Storage
European Conference on Computer Vision (ECCV), 2022
K. Nakata
Youyang Ng
Daisuke Miyashita
A. Maki
Yu Lin
J. Deguchi
242
29
0
03 Apr 2022
InstaFormer: Instance-Aware Image-to-Image Translation with Transformer
Computer Vision and Pattern Recognition (CVPR), 2022
Soohyun Kim
Jongbeom Baek
Jihye Park
Gyeongnyeon Kim
Seung Wook Kim
ViT
301
58
0
30 Mar 2022
Affine Medical Image Registration with Coarse-to-Fine Vision Transformer
Computer Vision and Pattern Recognition (CVPR), 2022
Tony C. W. Mok
Albert C. S. Chung
ViT, MedIm
194
79
0
29 Mar 2022
Automated Progressive Learning for Efficient Training of Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Changlin Li
Bohan Zhuang
Guangrun Wang
Xiaodan Liang
Xiaojun Chang
Yi Yang
250
54
0
28 Mar 2022
DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation
Machine Intelligence Research (MIR), 2022
Zhenyu Li
Zehui Chen
Xianming Liu
Junjun Jiang
ViT, MDE
178
225
1
27 Mar 2022
On the link between conscious function and general intelligence in humans and machines
Arthur Juliani
Kai Arulkumaran
Shuntaro Sasai
Ryota Kanai
275
27
0
24 Mar 2022
Deep Frequency Filtering for Domain Generalization
Computer Vision and Pattern Recognition (CVPR), 2022
Shiqi Lin
Zhizheng Zhang
Zhipeng Huang
Yan Lu
Cuiling Lan
...
Jiang Wang
Zicheng Liu
Amey Parulkar
V. Navkal
Zhibo Chen
254
68
0
23 Mar 2022
Symmetry-Based Representations for Artificial and Biological General Intelligence
Frontiers in Computational Neuroscience (Front. Comput. Neurosci.), 2022
I. Higgins
S. Racanière
Danilo Jimenez Rezende
AI4CE
250
52
0
17 Mar 2022
Stubborn: A Strong Baseline for Indoor Object Navigation
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Haokuan Luo
Albert Yue
Zhang-Wei Hong
Pulkit Agrawal
280
57
0
14 Mar 2022
TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation
Journal of Visual Communication and Image Representation (JVCIR), 2022
Ruiwen Li
Zheda Mai
C. Trabelsi
Zhibo Zhang
Jongseong Jang
Scott Sanner
ViT
189
75
0
14 Mar 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
176
32
0
13 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
International Conference on Machine Learning (ICML), 2022
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
728
1,281
1
10 Mar 2022
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer
European Conference on Computer Vision (ECCV), 2022
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
330
75
0
08 Mar 2022
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Qishuai Diao
Yi Jiang
Bin Wen
Jianxiang Sun
Zehuan Yuan
168
67
0
05 Mar 2022
ViT-P: Rethinking Data-efficient Vision Transformers from Locality
Bin Chen
Ran A. Wang
Di Ming
Xin Feng
ViT
80
7
0
04 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
IEEE International Conference on Consumer Electronics (ICCE), 2022
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
137
2
0
02 Mar 2022
A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark
Yunhe Gao
Mu Zhou
Ding Liu
Zhennan Yan
Shaoting Zhang
Dimitris N. Metaxas
ViT, MedIm
697
95
0
28 Feb 2022
Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for Visual Discrimination
Qingsong Zhao
Shuguang Dou
Zhipeng Zhou
Yangguang Li
Yin Wang
Yu Qiao
Cairong Zhao
236
0
0
21 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
International Journal of Computer Vision (IJCV), 2022
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
275
272
0
21 Feb 2022
Visual Attention Network
Computational Visual Media (CVM), 2022
Meng-Hao Guo
Chengrou Lu
Zheng-Ning Liu
Ming-Ming Cheng
Shiyong Hu
ViT, VLM
470
869
0
20 Feb 2022
Mixture-of-Experts with Expert Choice Routing
Neural Information Processing Systems (NeurIPS), 2022
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
602
554
0
18 Feb 2022
How Do Vision Transformers Work?
International Conference on Learning Representations (ICLR), 2022
Namuk Park
Songkuk Kim
ViT
465
599
0
14 Feb 2022
KENN: Enhancing Deep Neural Networks by Leveraging Knowledge for Time Series Forecasting
M. A. Chattha
L. V. Elst
M. I. Malik
Andreas Dengel
Sheraz Ahmed
AI4TS
266
0
0
08 Feb 2022
Towards an Analytical Definition of Sufficient Data
SN Computer Science (SN Comput. Sci.), 2022
Adam Byerly
T. Kalganova
180
5
0
07 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
International Conference on Machine Learning (ICML), 2022
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM, ObjD
517
1,006
0
07 Feb 2022
Learning strides in convolutional neural networks
International Conference on Learning Representations (ICLR), 2022
Rachid Riad
O. Teboul
David Grangier
Neil Zeghidour
159
49
0
03 Feb 2022
Architecture Matters in Continual Learning
Seyed Iman Mirzadeh
Arslan Chaudhry
Dong Yin
Timothy Nguyen
Razvan Pascanu
Dilan Görür
Mehrdad Farajtabar
OOD, KELM
369
63
0
01 Feb 2022
Patches Are All You Need?
Asher Trockman
J. Zico Kolter
ViT
437
482
0
24 Jan 2022