2106.04803
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Neural Information Processing Systems (NeurIPS), 2021
9 June 2021
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
Papers citing
"CoAtNet: Marrying Convolution and Attention for All Data Sizes"
50 / 510 papers shown
Title
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
International Conference on Machine Learning (ICML), 2024
Dongyoon Hwang
ByungKun Lee
Hojoon Lee
Hyunseung Kim
Jaegul Choo
232
0
0
10 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
366
9
0
06 Jun 2024
Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review
Sonia Bbouzidi
Ghazala Hcini
Imen Jdey
Fadoua Drira
251
8
0
05 Jun 2024
GrootVL: Tree Topology is All You Need in State Space Model
Yicheng Xiao
Lin Song
Shaoli Huang
Jiangshan Wang
Siyu Song
Yixiao Ge
Xiu Li
Mingyu Ding
Mamba
201
16
0
04 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
226
10
0
01 Jun 2024
Are queries and keys always relevant? A case study on Transformer wave functions
Riccardo Rende
Luciano Loris Viteritti
258
11
0
29 May 2024
Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain
Juntao Zhang
Kun Bian
Jun Zhou
Jianning Liu
Kun Shao
Mamba
315
5
0
29 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
279
8
0
28 May 2024
Building Vision Models upon Heat Conduction
Zhaozhi Wang
Yue Liu
Yunfan Liu
Hongtian Yu
Yaowei Wang
QiXiang Ye
ViT
VLM
239
2
0
26 May 2024
Smooth Pseudo-Labeling
Nikolaos Karaliolios
Hervé Le Borgne
Florian Chabot
193
0
0
23 May 2024
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
Neural Information Processing Systems (NeurIPS), 2024
Yuheng Shi
Minjing Dong
Chang Xu
Mamba
262
70
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
832
157
0
23 May 2024
Infinite-Dimensional Feature Interaction
Chenhui Xu
Fuxun Yu
Maoliang Li
Zihao Zheng
Zirui Xu
Jinjun Xiong
Xiang Chen
277
2
0
22 May 2024
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models
Zhaojian Yu
Yinghao Wu
Zhuotao Deng
Yansong Tang
Jinqiang Cui
196
6
0
21 May 2024
MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding
Visual Communications and Image Processing (VCIP), 2024
Jiajie Teng
Huiyu Duan
Yucheng Zhu
Sijing Wu
Guangtao Zhai
134
3
0
15 May 2024
Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy
IEEE Transactions on Machine Learning in Communications and Networking (IEEE TMLCN), 2024
Feng Wang
M. C. Gursoy
Senem Velipasalar
211
3
0
15 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Computer Vision and Pattern Recognition (CVPR), 2024
Weihao Yu
Xinchao Wang
Mamba
285
166
0
13 May 2024
Information-driven Affordance Discovery for Efficient Robotic Manipulation
Pietro Mazzaglia
Taco Cohen
Daniel Dijkman
318
4
0
06 May 2024
UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios
R. Mahjourian
Rongbing Mu
Valerii Likhosherstov
Paul Mougin
Xiukun Huang
Joao Messias
Shimon Whiteson
163
11
0
06 May 2024
Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Fareed Qararyah
M. Azhar
Mohammad Ali Maleki
Pedro Trancoso
168
4
0
30 Apr 2024
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Wei Niu
Md. Musfiqur Rahman Sanim
Zhihao Shu
Jiexiong Guan
Xipeng Shen
Miao Yin
Gagan Agrawal
Bin Ren
163
11
0
21 Apr 2024
Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures
Ching-Kai Lin
Di-Chun Wei
Yun-Chien Cheng
267
0
0
09 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
311
143
0
08 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
253
25
0
05 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Computer Vision and Pattern Recognition (CVPR), 2024
Jieneng Chen
Qihang Yu
Xiaohui Shen
Yaoyao Liu
Liang-Chieh Chen
3DV
VLM
385
48
0
02 Apr 2024
Semantic Augmentation in Images using Language
Sahiti Yerramilli
Jayant Sravan Tamarapalli
Tanmay Girish Kulkarni
Jonathan M Francis
Eric Nyberg
DiffM
VLM
169
8
0
02 Apr 2024
Structured Initialization for Attention in Vision Transformers
Jianqiao Zheng
Xueqian Li
Simon Lucey
ViT
226
2
0
01 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
258
14
0
28 Mar 2024
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim
Byeongho Heo
Dongyoon Han
248
34
0
28 Mar 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
380
4
0
26 Mar 2024
Neural Clustering based Visual Representation Learning
Guikun Chen
Xia Li
Yi Yang
Wenguan Wang
SSL
282
14
0
26 Mar 2024
3D-EffiViTCaps: 3D Efficient Vision Transformer with Capsule for Medical Image Segmentation
Dongwei Gan
Ming Chang
Juan Chen
ViT
MedIm
124
0
0
25 Mar 2024
ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding
Novendra Setyawan
Ghufron Wahyu Kurniawan
Chi-Chia Sun
Jun-Wei Hsieh
Hui-Kai Su
W. Kuo
ViT
MoE
219
0
0
22 Mar 2024
Loop Improvement: An Efficient Approach for Extracting Shared Features from Heterogeneous Data without Central Server
Fei Li
C. K. Loo
W. S. Liew
Xiaofeng Liu
FedML
149
0
0
21 Mar 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
171
35
0
18 Mar 2024
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär
N. Houlsby
Mostafa Dehghani
Manoj Kumar
VLM
197
14
0
15 Mar 2024
A Hierarchical Fused Quantum Fuzzy Neural Network for Image Classification
Shengyao Wu
Run-Ze Li
Yanqi Song
S. Qin
Qiaoyan Wen
Fei Gao
201
2
0
14 Mar 2024
Probabilistic Image-Driven Traffic Modeling via Remote Sensing
European Conference on Computer Vision (ECCV), 2024
Scott Workman
Armin Hadzic
138
0
0
08 Mar 2024
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
Nabil Ibtehaz
Ning Yan
Masood S. Mortazavi
Daisuke Kihara
ViT
298
5
0
07 Mar 2024
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
Zhuo Li
Wei He
Jiepan Li
Fangxiao Lu
Hongyan Zhang
413
23
0
05 Mar 2024
Accelerating Greedy Coordinate Gradient via Probe Sampling
Yiran Zhao
Wenyue Zheng
Tianle Cai
Xuan Long Do
Kenji Kawaguchi
Anirudh Goyal
Michael Shieh
245
27
0
02 Mar 2024
Interactive Multi-Head Self-Attention with Linear Complexity
Hankyul Kang
Ming-Hsuan Yang
Jongbin Ryu
167
3
0
27 Feb 2024
Constraint Latent Space Matters: An Anti-anomalous Waveform Transformation Solution from Photoplethysmography to Arterial Blood Pressure
Cheng Bian
Xiaoyu Li
Qi Bi
Guangpu Zhu
Jiegeng Lyu
Weile Zhang
Yelei Li
Zijing Zeng
149
1
0
23 Feb 2024
Towards Cross-Domain Continual Learning
Marcus Vinícius de Carvalho
Mahardhika Pratama
Jie Zhang
Chua Haoyan
E. Yapp
CLL
173
3
0
19 Feb 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
439
101
0
08 Feb 2024
Text Role Classification in Scientific Charts Using Multimodal Transformers
Hye Jin Kim
N. Lell
A. Scherp
92
1
0
08 Feb 2024
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation
IEEE Access (IEEE Access), 2024
Arash Harirpoush
Amir Rasoulian
Marta Kersten-Oertel
Yiming Xiao
3DV
149
2
0
05 Feb 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Engineering applications of artificial intelligence (EAAI), 2024
Haruna Yunusa
Shiyin Qin
Abdulrahman Hamman Adama Chukkol
Abdulganiyu Abdu Yusuf
Isah Bello
A. Lawan
ViT
245
35
0
05 Feb 2024
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Computer Vision and Pattern Recognition (CVPR), 2024
Seokju Yun
Youngmin Ro
ViT
358
83
0
29 Jan 2024
Convolutional Initialization for Data-Efficient Vision Transformers
Jianqiao Zheng
Xueqian Li
Simon Lucey
236
2
0
23 Jan 2024