AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language

Abstract

Most existing work on image caption synthesis uses computation-heavy deep neural networks and generates image descriptions in English. This often keeps this important assistive tool from widespread use across language and accessibility barriers. This work presents AC-Lite, a computationally efficient model for image captioning in the low-resource Assamese language. AC-Lite reduces computational requirements by replacing computation-heavy deep network components with lightweight alternatives. The model is designed through extensive ablation experiments with different image feature extractor networks and language decoders. A combination of ShuffleNetv2x1.5 with a GRU-based language decoder and bilinear attention is found to provide the best performance with minimum compute. AC-Lite achieves a CIDEr score of 82.3 on the COCO-AC dataset with 2.45 GFLOPs and 22.87M parameters.

@article{choudhury2025_2503.01453,
  title={AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language},
  author={Pankaj Choudhury and Yogesh Aggarwal and Prabhanjan Jadhav and Prithwijit Guha and Sukumar Nandi},
  journal={arXiv preprint arXiv:2503.01453},
  year={2025}
}