Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation

27 February 2025
Shaharukh Khan
Ayush Tarun
Ali Faraz
Palash Kamble
Vivek Dahiya
Praveen Kumar Pokala
Ashish Kulkarni
Chandra Khatri
Abhinav Ravi
Shubham Agarwal
Abstract

In this work, we provide the system description of our submission to the English-to-Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that effectively integrates a multilingual LLM and a vision module for multimodal translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings, which are projected into the LLM space by an adapter layer, and the model generates translations in an autoregressive fashion. We participated in all three tracks (Image Captioning, Text-only, and Multimodal Translation) for Indic languages (i.e., English translation to Hindi, Bengali, and Malayalam) and achieved SOTA results for Hindi in all of them on the Challenge set, while remaining competitive for the other languages in the shared task.
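The abstract outlines an adapter-based design: ViT patch features are projected into the LLM's token-embedding space and prepended to the text tokens before autoregressive decoding. The sketch below illustrates that flow in PyTorch; the encoder, adapter design (a single linear layer), and all dimensions are assumptions for illustration, not the paper's exact architecture.

# Minimal sketch of the adapter-based multimodal setup described in the abstract.
# The linear adapter, model dimensions, and shapes here are hypothetical.
import torch
import torch.nn as nn

class VisualAdapter(nn.Module):
    """Projects ViT patch embeddings into the LLM's token-embedding space."""
    def __init__(self, vit_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_patches, vit_dim) -> (batch, num_patches, llm_dim)
        return self.proj(visual_feats)

def build_multimodal_inputs(image_feats, text_token_embeds, adapter):
    """Prepend projected visual tokens to the embedded source text so the LLM
    can attend to the image while generating the translation autoregressively."""
    visual_tokens = adapter(image_feats)                      # (B, P, llm_dim)
    return torch.cat([visual_tokens, text_token_embeds], dim=1)

# Example with made-up dimensions:
vit_dim, llm_dim = 1024, 4096
adapter = VisualAdapter(vit_dim, llm_dim)
image_feats = torch.randn(1, 256, vit_dim)    # ViT patch features for one image
text_embeds = torch.randn(1, 32, llm_dim)     # embedded English source tokens
inputs_embeds = build_multimodal_inputs(image_feats, text_embeds, adapter)
# inputs_embeds would then be passed to the LLM (e.g. via an `inputs_embeds`
# argument) for autoregressive generation of the target-language translation.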

@article{khan2025_2502.20420,
  title={Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation},
  author={Shaharukh Khan and Ayush Tarun and Ali Faraz and Palash Kamble and Vivek Dahiya and Praveen Pokala and Ashish Kulkarni and Chandra Khatri and Abhinav Ravi and Shubham Agarwal},
  journal={arXiv preprint arXiv:2502.20420},
  year={2025}
}