
Multi-label Scandinavian Language Identification (SLIDE)

10 February 2025
Mariia Fedorova
Jonas Sebulon Frydenberg
Victoria Handford
Victoria Ovedie Chruickshank Langø
Solveig Helene Willoch
Marthe Løken Midtgaard
Yves Scherrer
Petter Mæhlum
David Samuel
Abstract

Identifying closely related languages at the sentence level is difficult, in particular because it is often impossible to assign a sentence to a single language. In this paper, we focus on multi-label sentence-level Scandinavian language identification (LID) for Danish, Norwegian Bokmål, Norwegian Nynorsk, and Swedish. We present Scandinavian Language Identification and Evaluation (SLIDE), which comprises a manually curated multi-label evaluation dataset and a suite of LID models with varying speed-accuracy trade-offs. We demonstrate that the ability to identify multiple languages simultaneously is necessary for any accurate LID method, and present a novel approach to training such multi-label LID models.
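To make the multi-label setting concrete, the following is a minimal sketch of how predictions could be scored when a sentence may be valid in several languages at once. It is not the paper's evaluation code: the label codes (`da`, `nb`, `nn`, `sv`), the function names, and the toy data are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set

# Hypothetical label codes for Danish, Bokmål, Nynorsk, and Swedish.
LANGS = ("da", "nb", "nn", "sv")


def exact_match(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of sentences whose predicted label set equals the gold set."""
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_language_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Binary F1 per language, treating each label as its own yes/no task."""
    scores = {}
    for lang in LANGS:
        tp = sum(lang in g and lang in p for g, p in zip(gold, pred))
        fp = sum(lang not in g and lang in p for g, p in zip(gold, pred))
        fn = sum(lang in g and lang not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[lang] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data (invented): the first sentence is valid in both Bokmål and Danish,
# but the prediction commits to Bokmål only.
gold = [{"nb", "da"}, {"sv"}, {"nn"}, {"nb", "nn"}]
pred = [{"nb"}, {"sv"}, {"nn"}, {"nb", "nn"}]

print(exact_match(gold, pred))
print(per_language_f1(gold, pred))
```

In this toy example, a prediction that names only one language for the first sentence misses the Danish label, which lowers both the exact-match score and the Danish F1. A strictly single-label system would incur this kind of error on every sentence that is valid in more than one language.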

@article{fedorova2025_2502.06692,
  title={Multi-label Scandinavian Language Identification (SLIDE)},
  author={Mariia Fedorova and Jonas Sebulon Frydenberg and Victoria Handford and Victoria Ovedie Chruickshank Langø and Solveig Helene Willoch and Marthe Løken Midtgaard and Yves Scherrer and Petter Mæhlum and David Samuel},
  journal={arXiv preprint arXiv:2502.06692},
  year={2025}
}