On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts

8 February 2025
Gautam Kishore Shahi
Oliver Hummel
Abstract

The rapid advancement of Large Language Models (LLMs) has led to a multitude of application opportunities. One traditional task for Information Retrieval systems is the summarization and classification of texts, both of which help humans navigate large bodies of literature such as scientific publications. Because this body of scientific knowledge is growing rapidly, recent research has aimed at building research information systems that offer not only traditional keyword search but also novel features, such as the automatic detection of the research areas present in knowledge-intensive organizations in academia and industry. To support this idea, we present the results of evaluating a variety of LLMs on their ability to sort scientific publications into hierarchical classification systems. Using the FORC dataset as ground truth, we found that recent LLMs (such as Meta Llama 3.1) are able to reach an accuracy of up to 0.82, which is up to 0.08 higher than traditional BERT models.

@article{shahi2025_2502.15745,
  title={On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts},
  author={Gautam Kishore Shahi and Oliver Hummel},
  journal={arXiv preprint arXiv:2502.15745},
  year={2025}
}