LANGALIGN: Enhancing Non-English Language Models via Cross-Lingual Embedding Alignment

24 March 2025
Jong Myoung Kim
Young-Jun Lee
Ho-Jin Choi
Sangkeun Jung
Abstract

While Large Language Models have gained attention, many service developers still rely on embedding-based models due to practical constraints. In such cases, the quality of fine-tuning data directly impacts performance, and English datasets are often used as seed data for training non-English models. In this study, we propose LANGALIGN, which enhances target language processing by aligning English embedding vectors with those of the target language at the interface between the language model and the task header. Experiments on Korean, Japanese, and Chinese demonstrate that LANGALIGN significantly improves performance across all three languages. Additionally, we show that LANGALIGN can be applied in reverse to convert target language data into a format that an English-based model can process.
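The abstract does not give implementation details, but the core idea of aligning English embedding vectors with target-language ones at the model/task-head interface can be sketched roughly. A minimal illustration, assuming a frozen encoder producing sentence embeddings for parallel English/target pairs and a simple linear alignment map fitted by least squares (the paper's actual alignment method may differ), with all data here synthetic placeholders:

```python
import numpy as np

# Toy sketch of cross-lingual embedding alignment (illustrative only).
# Assume we have sentence embeddings for parallel English / target-language
# sentence pairs, produced by a frozen encoder.
rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (toy)
n = 100                                 # number of parallel sentence pairs

E_en = rng.normal(size=(n, d))          # English embeddings (placeholder data)
true_map = rng.normal(size=(d, d))      # hidden relation, for the toy setup
E_tgt = E_en @ true_map + 0.01 * rng.normal(size=(n, d))  # target-language embeddings

# Fit a linear alignment W minimizing ||E_en @ W - E_tgt||_F by least squares.
W, *_ = np.linalg.lstsq(E_en, E_tgt, rcond=None)

# English seed-data embeddings can then be projected into the target-language
# embedding space before being fed to the task header.
aligned = E_en @ W
rel_err = np.linalg.norm(aligned - E_tgt) / np.linalg.norm(E_tgt)
```

The reverse direction mentioned in the abstract would correspond to fitting the map from target-language embeddings back to the English space, so target-language data can be consumed by an English-based model.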

@article{kim2025_2503.18603,
  title={LANGALIGN: Enhancing Non-English Language Models via Cross-Lingual Embedding Alignment},
  author={Jong Myoung Kim and Young-Jun Lee and Ho-Jin Choi and Sangkeun Jung},
  journal={arXiv preprint arXiv:2503.18603},
  year={2025}
}