CoAtNet: Marrying Convolution and Attention for All Data Sizes

Neural Information Processing Systems (NeurIPS), 2021
9 June 2021
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
    ViT

Papers citing "CoAtNet: Marrying Convolution and Attention for All Data Sizes"

50 / 510 papers shown
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
International Conference on Machine Learning (ICML), 2024
Dongyoon Hwang
ByungKun Lee
Hojoon Lee
Hyunseung Kim
Jaegul Choo
232
0
0
10 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
366
9
0
06 Jun 2024
Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review
Sonia Bbouzidi
Ghazala Hcini
Imen Jdey
Fadoua Drira
251
8
0
05 Jun 2024
GrootVL: Tree Topology is All You Need in State Space Model
Yicheng Xiao
Lin Song
Shaoli Huang
Jiangshan Wang
Siyu Song
Yixiao Ge
Xiu Li
Mingyu Ding
Mamba
201
16
0
04 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
226
10
0
01 Jun 2024
Are queries and keys always relevant? A case study on Transformer wave functions
Riccardo Rende
Luciano Loris Viteritti
258
11
0
29 May 2024
Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain
Juntao Zhang
Kun Bian
Jun Zhou
Jianning Liu
Kun Shao
Mamba
315
5
0
29 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
279
8
0
28 May 2024
Building Vision Models upon Heat Conduction
Zhaozhi Wang
Yue Liu
Yunfan Liu
Hongtian Yu
Yaowei Wang
QiXiang Ye
ViT, VLM
239
2
0
26 May 2024
Smooth Pseudo-Labeling
Nikolaos Karaliolios
Hervé Le Borgne
Florian Chabot
193
0
0
23 May 2024
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
Neural Information Processing Systems (NeurIPS), 2024
Yuheng Shi
Minjing Dong
Chang Xu
Mamba
262
70
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
832
157
0
23 May 2024
Infinite-Dimensional Feature Interaction
Chenhui Xu
Fuxun Yu
Maoliang Li
Zihao Zheng
Zirui Xu
Jinjun Xiong
Xiang Chen
277
2
0
22 May 2024
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models
Zhaojian Yu
Yinghao Wu
Zhuotao Deng
Yansong Tang
Jinqiang Cui
196
6
0
21 May 2024
MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding
Visual Communications and Image Processing (VCIP), 2024
Jiajie Teng
Huiyu Duan
Yucheng Zhu
Sijing Wu
Guangtao Zhai
134
3
0
15 May 2024
Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy
IEEE Transactions on Machine Learning in Communications and Networking (IEEE TMLCN), 2024
Feng Wang
M. C. Gursoy
Senem Velipasalar
211
3
0
15 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Computer Vision and Pattern Recognition (CVPR), 2024
Weihao Yu
Xinchao Wang
Mamba
285
166
0
13 May 2024
Information-driven Affordance Discovery for Efficient Robotic Manipulation
Pietro Mazzaglia
Taco Cohen
Daniel Dijkman
318
4
0
06 May 2024
UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios
R. Mahjourian
Rongbing Mu
Valerii Likhosherstov
Paul Mougin
Xiukun Huang
Joao Messias
Shimon Whiteson
163
11
0
06 May 2024
Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Fareed Qararyah
M. Azhar
Mohammad Ali Maleki
Pedro Trancoso
168
4
0
30 Apr 2024
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Wei Niu
Md. Musfiqur Rahman Sanim
Zhihao Shu
Jiexiong Guan
Xipeng Shen
Miao Yin
Gagan Agrawal
Bin Ren
163
11
0
21 Apr 2024
Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures
Ching-Kai Lin
Di-Chun Wei
Yun-Chien Cheng
267
0
0
09 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
311
143
0
08 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
253
25
0
05 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Computer Vision and Pattern Recognition (CVPR), 2024
Jieneng Chen
Qihang Yu
Xiaohui Shen
Yaoyao Liu
Liang-Chieh Chen
3DV, VLM
385
48
0
02 Apr 2024
Semantic Augmentation in Images using Language
Sahiti Yerramilli
Jayant Sravan Tamarapalli
Tanmay Girish Kulkarni
Jonathan M Francis
Eric Nyberg
DiffM, VLM
169
8
0
02 Apr 2024
Structured Initialization for Attention in Vision Transformers
Jianqiao Zheng
Xueqian Li
Simon Lucey
ViT
226
2
0
01 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
258
14
0
28 Mar 2024
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim
Byeongho Heo
Dongyoon Han
248
34
0
28 Mar 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
380
4
0
26 Mar 2024
Neural Clustering based Visual Representation Learning
Guikun Chen
Xia Li
Yi Yang
Wenguan Wang
SSL
282
14
0
26 Mar 2024
3D-EffiViTCaps: 3D Efficient Vision Transformer with Capsule for Medical Image Segmentation
Dongwei Gan
Ming Chang
Juan Chen
ViT, MedIm
124
0
0
25 Mar 2024
ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding
Novendra Setyawan
Ghufron Wahyu Kurniawan
Chi-Chia Sun
Jun-Wei Hsieh
Hui-Kai Su
W. Kuo
ViT, MoE
219
0
0
22 Mar 2024
Loop Improvement: An Efficient Approach for Extracting Shared Features from Heterogeneous Data without Central Server
Fei Li
C. K. Loo
W. S. Liew
Xiaofeng Liu
FedML
149
0
0
21 Mar 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
171
35
0
18 Mar 2024
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär
N. Houlsby
Mostafa Dehghani
Manoj Kumar
VLM
197
14
0
15 Mar 2024
A Hierarchical Fused Quantum Fuzzy Neural Network for Image Classification
Shengyao Wu
Run-Ze Li
Yanqi Song
S. Qin
Qiaoyan Wen
Fei Gao
201
2
0
14 Mar 2024
Probabilistic Image-Driven Traffic Modeling via Remote Sensing
European Conference on Computer Vision (ECCV), 2024
Scott Workman
Armin Hadzic
138
0
0
08 Mar 2024
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
Nabil Ibtehaz
Ning Yan
Masood S. Mortazavi
Daisuke Kihara
ViT
298
5
0
07 Mar 2024
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
Zhuo Li
Wei He
Jiepan Li
Fangxiao Lu
Hongyan Zhang
413
23
0
05 Mar 2024
Accelerating Greedy Coordinate Gradient via Probe Sampling
Yiran Zhao
Wenyue Zheng
Tianle Cai
Xuan Long Do
Kenji Kawaguchi
Anirudh Goyal
Michael Shieh
245
27
0
02 Mar 2024
Interactive Multi-Head Self-Attention with Linear Complexity
Hankyul Kang
Ming-Hsuan Yang
Jongbin Ryu
167
3
0
27 Feb 2024
Constraint Latent Space Matters: An Anti-anomalous Waveform Transformation Solution from Photoplethysmography to Arterial Blood Pressure
Cheng Bian
Xiaoyu Li
Qi Bi
Guangpu Zhu
Jiegeng Lyu
Weile Zhang
Yelei Li
Zijing Zeng
149
1
0
23 Feb 2024
Towards Cross-Domain Continual Learning
Marcus Vinícius de Carvalho
Mahardhika Pratama
Jie Zhang
Chua Haoyan
E. Yapp
CLL
173
3
0
19 Feb 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
439
101
0
08 Feb 2024
Text Role Classification in Scientific Charts Using Multimodal Transformers
Hye Jin Kim
N. Lell
A. Scherp
92
1
0
08 Feb 2024
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation
IEEE Access, 2024
Arash Harirpoush
Amir Rasoulian
Marta Kersten-Oertel
Yiming Xiao
3DV
149
2
0
05 Feb 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Engineering Applications of Artificial Intelligence (EAAI), 2024
Haruna Yunusa
Shiyin Qin
Abdulrahman Hamman Adama Chukkol
Abdulganiyu Abdu Yusuf
Isah Bello
A. Lawan
ViT
245
35
0
05 Feb 2024
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Computer Vision and Pattern Recognition (CVPR), 2024
Seokju Yun
Youngmin Ro
ViT
358
83
0
29 Jan 2024
Convolutional Initialization for Data-Efficient Vision Transformers
Jianqiao Zheng
Xueqian Li
Simon Lucey
236
2
0
23 Jan 2024