
FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off

Abstract

Spatial convolution is fundamental to constructing deep Convolutional Neural Networks (CNNs) for visual recognition. While dynamic convolution improves model accuracy by adaptively combining static kernels, it incurs significant computational overhead, which limits its deployment in resource-constrained environments such as federated edge computing. To address this, we propose Fast Multi-Attention Dynamic Convolution (FMDConv), which integrates input attention, temperature-degraded kernel attention, and output attention to optimize the speed-accuracy trade-off. FMDConv achieves a better balance between accuracy and efficiency by selectively enhancing feature extraction at lower complexity. Furthermore, we introduce two novel quantitative metrics, the Inverse Efficiency Score and the Rate-Correct Score, to systematically evaluate this trade-off. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate that FMDConv reduces computational cost by up to 49.8% on ResNet-18 and 42.2% on ResNet-50 compared to prior multi-attention dynamic convolution methods while maintaining competitive accuracy. These advantages make FMDConv well suited to real-world, resource-constrained applications.
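
As a rough illustration of what "adaptively combining static kernels" means, the sketch below mixes a small bank of kernels using attention weights obtained from a temperature-scaled softmax. It follows the generic dynamic-convolution formulation, not FMDConv's specific input, kernel, and output attention modules, which the abstract does not detail. The function names, the flattened toy kernels, and the logit values are all hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over kernel logits; a larger temperature gives flatter weights."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels, attention):
    """Weighted sum of static kernels (each flattened to a list of floats)."""
    size = len(kernels[0])
    return [
        sum(a * k[i] for a, k in zip(attention, kernels))
        for i in range(size)
    ]


# Toy example: two static 1x3 kernels and input-dependent logits (assumed values).
kernels = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]
logits = [2.0, 0.0]

sharp = softmax_with_temperature(logits, temperature=1.0)   # favors kernel 0
flat = softmax_with_temperature(logits, temperature=30.0)   # close to uniform
mixed = aggregate_kernels(kernels, flat)                    # one kernel, applied as usual
```

The aggregated kernel is then used in an ordinary convolution, so the extra cost relative to a static layer comes from computing the attention weights and mixing the kernels for each input.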

@article{zhang2025_2503.17530,
  title={FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off},
  author={Tianyu Zhang and Fan Wan and Haoran Duan and Kevin W. Tong and Jingjing Deng and Yang Long},
  journal={arXiv preprint arXiv:2503.17530},
  year={2025}
}