ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition

16 February 2025
Bidyarthi Paul
Faika Fairuj Preotee
Shuvashis Sarker
Shamim Rahim Refat
Shifat Islam
Tashreef Muhammad
Mohammad Ashraful Hoque
Shahriar Manzoor
Abstract

ANCHOLIK-NER is a linguistically diverse dataset for Named Entity Recognition (NER) in Bangla regional dialects, capturing variations across Sylhet, Chittagong, Barishal, Noakhali, and Mymensingh. The dataset contains 17,405 sentences, 3,481 per region. The data was collected from two publicly available datasets and through web scraping of online newspapers and articles. To ensure high-quality annotations, the BIO tagging scheme was employed, and professional annotators with expertise in regional dialects carried out the labeling. The dataset is split into a separate subset for each region and is available in CSV format. Each entry contains the text along with the identified named entities and their annotations. Named entities fall into ten classes: Person, Location, Organization, Food, Animal, Colour, Role, Relation, Object, and Miscellaneous. The dataset is a resource for developing and evaluating NER models on Bangla dialectal variation, contributing to regional language processing and low-resource NLP. It can be used to improve NER systems for Bangla dialects, strengthen regional language understanding, and support applications in machine translation, information retrieval, and conversational AI.
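As a rough illustration of the BIO tagging scheme mentioned in the abstract, the sketch below builds a label set from the ten entity classes and decodes a tagged sentence into entity spans. The tag spelling (`B-Person`, `I-Person`, and so on), the function names, and the example tokens are assumptions made for illustration only; the abstract does not specify the dataset's actual tag strings or CSV column names, so check the released files before relying on any of this.

```python
from typing import List, Tuple

# The ten entity classes named in the abstract.
ENTITY_CLASSES = [
    "Person", "Location", "Organization", "Food", "Animal",
    "Colour", "Role", "Relation", "Object", "Miscellaneous",
]

# The five regional subsets and the per-region sentence count from the abstract.
REGIONS = ["Sylhet", "Chittagong", "Barishal", "Noakhali", "Mymensingh"]
SENTENCES_PER_REGION = 3481


def bio_label_set(classes: List[str]) -> List[str]:
    """Build a BIO label set: 'O' plus a B- and an I- tag per class.

    The 'B-<Class>' / 'I-<Class>' spelling is an assumption, not the
    dataset's confirmed tag format.
    """
    labels = ["O"]
    for name in classes:
        labels.append(f"B-{name}")
        labels.append(f"I-{name}")
    return labels


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn parallel token/tag lists into (entity text, class) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_class = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if current_class is not None:
                spans.append((" ".join(current_tokens), current_class))
            current_tokens = [token]
            current_class = tag[2:]
        elif tag.startswith("I-") and current_class == tag[2:]:
            # An I- tag of the same class continues the open span.
            current_tokens.append(token)
        else:
            # 'O', or an I- tag that does not match the open span,
            # closes the open span; the token itself is not kept.
            if current_class is not None:
                spans.append((" ".join(current_tokens), current_class))
            current_tokens = []
            current_class = None

    # Flush a span that runs to the end of the sentence.
    if current_class is not None:
        spans.append((" ".join(current_tokens), current_class))
    return spans


# Invented, romanized placeholder sentence; not taken from the dataset.
example_tokens = ["Rahim", "Uddin", "Sylhet", "e", "thake"]
example_tags = ["B-Person", "I-Person", "B-Location", "O", "O"]

labels = bio_label_set(ENTITY_CLASSES)
entities = decode_bio(example_tokens, example_tags)
total_sentences = len(REGIONS) * SENTENCES_PER_REGION
```

Under these assumptions the label set has 21 entries (one `O` tag plus a B- and an I- tag for each of the ten classes), and the five regional subsets of 3,481 sentences add up to the 17,405 sentences reported in the abstract.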

@article{paul2025_2502.11198,
  title={ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition},
  author={Bidyarthi Paul and Faika Fairuj Preotee and Shuvashis Sarker and Shamim Rahim Refat and Shifat Islam and Tashreef Muhammad and Mohammad Ashraful Hoque and Shahriar Manzoor},
  journal={arXiv preprint arXiv:2502.11198},
  year={2025}
}