UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale

Main: 8 pages · 8 figures · 13 tables · Bibliography: 4 pages · Appendix: 4 pages
Abstract

Convolutional neural networks (ConvNets) with a large effective receptive field (ERF), still in their early stages, have demonstrated promising effectiveness, but remain constrained by high parameter and FLOPs costs and by a disrupted asymptotically Gaussian distribution (AGD) of the ERF. This paper proposes an alternative paradigm: rather than merely employing an extremely large ERF, it is more effective and efficient to expand the ERF while maintaining the AGD of the ERF through a proper combination of smaller kernels, such as 7×7, 9×9, and 11×11. This paper introduces a Three-layer Receptive Field Aggregator and designs a Layer Operator as the fundamental operator from the perspective of the receptive field. By stacking the proposed modules, the ERF can be expanded to the level of existing large-kernel ConvNets while maintaining the AGD of the ERF. Using these designs, we propose a universal model for ConvNets of any scale, termed UniConvNet. Extensive experiments on ImageNet-1K, COCO2017, and ADE20K demonstrate that UniConvNet outperforms state-of-the-art CNNs and ViTs across various vision recognition tasks, for both lightweight and large-scale models, with comparable throughput. Surprisingly, UniConvNet-T achieves 84.2% ImageNet top-1 accuracy with 30M parameters and 5.1G FLOPs. UniConvNet-XL also shows competitive scalability to big data and large models, acquiring 88.4% top-1 accuracy on ImageNet. Code and models are publicly available at this https URL.
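The abstract's core claim rests on a standard fact: stacking convolutions grows the receptive field additively, and by the central limit theorem the composed impulse response tends toward a Gaussian. The following sketch (illustrative only, not the authors' code; the 1-D box kernels and the sizes 7, 9, 11 simply mirror the kernel sizes named in the abstract) demonstrates both effects numerically:

```python
import numpy as np

def composed_response(kernel_sizes):
    """Impulse response of a stack of uniform (box) 1-D kernels.

    Each convolution with a size-k kernel adds k - 1 taps to the
    receptive field, so the final length is 1 + sum(k - 1).
    """
    response = np.array([1.0])  # unit impulse
    for k in kernel_sizes:
        box = np.ones(k) / k    # normalized box kernel
        response = np.convolve(response, box)
    return response

sizes = [7, 9, 11]
r = composed_response(sizes)

# Receptive field of the stack: 1 + (7-1) + (9-1) + (11-1) = 25 taps.
print(len(r))  # 25

# Compare against a Gaussian with the same mean and variance:
# by the central limit theorem, the stacked response is already close.
x = np.arange(len(r))
mu = (r * x).sum()
var = (r * (x - mu) ** 2).sum()
gauss = np.exp(-((x - mu) ** 2) / (2 * var))
gauss /= gauss.sum()
print(f"max deviation from matched Gaussian: {np.abs(r - gauss).max():.4f}")
```

Three box kernels already land within about half a percent of the matched Gaussian at every tap, which is the intuition behind expanding the ERF with a combination of moderate kernels rather than one very large one.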
