
All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media

25 July 2017
Jasabanta Patro
Bidisha Samanta
Saurabh Singh
Aparna Basu
Prithwish Mukherjee
Monojit Choudhury
Animesh Mukherjee
Abstract

In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on signals from social media. Measured by Spearman's rank correlation coefficient, our methods reach nearly 0.62 in predicting borrowing likeliness, more than twice the value of the best-performing baseline reported in the literature (nearly 0.26). Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88 percent of cases, the annotators felt that the foreign language tag should be replaced by the native language tag, indicating considerable scope for improving automatic language identification systems.
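
To make the evaluation metric concrete, the following is a minimal sketch of how Spearman's rank correlation between a predicted borrowing-likeliness ranking and a reference ranking could be computed. The function names (`average_ranks`, `spearman`) and the toy scores are hypothetical illustrations, not the authors' code or data; the abstract states only the resulting coefficients.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the full run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy scores for five candidate words (not from the paper).
predicted = [0.9, 0.7, 0.4, 0.2, 0.1]   # scores from some ranking method
reference = [0.8, 0.9, 0.3, 0.1, 0.2]   # scores from a ground-truth ranking
rho = spearman(predicted, reference)     # approximately 0.8 for this toy data
```

A coefficient of 1 means the two rankings agree exactly and -1 means they are reversed, so the reported values of nearly 0.62 and nearly 0.26 indicate how closely each method's ordering of words tracks the reference ordering.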
