In a Few Words: Comparing Weak Supervision and LLMs for Short Query Intent Classification

User intent classification is an important task in information retrieval. Earlier work classified user intents either manually or automatically; automatic approaches avoid hand-labelling large datasets. Recent studies have explored whether LLMs can reliably determine user intent, but researchers have also recognized the limitations of using generative LLMs for classification tasks. In this study, we empirically compare weak supervision and LLMs for classifying user intent into informational, navigational, and transactional categories. Specifically, we evaluate LLaMA-3.1-8B-Instruct and LLaMA-3.1-70B-Instruct for in-context learning and LLaMA-3.1-8B-Instruct for fine-tuning, comparing their performance to an established baseline classifier trained using weak supervision (ORCAS-I). Our results indicate that while LLMs outperform weak supervision in recall, they continue to struggle with precision, highlighting the need for improved methods that balance both metrics.
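To make the precision/recall contrast concrete, the following minimal sketch computes per-class precision and recall for the three intent categories. The function name `per_class_precision_recall` and the toy labels are illustrative assumptions, not the paper's evaluation code or data; the example only shows how a classifier that over-predicts one category can reach high recall for it while losing precision.

```python
from collections import Counter

# The three intent categories named in the abstract.
LABELS = ("informational", "navigational", "transactional")


def per_class_precision_recall(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall)} for parallel lists of labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in labels:
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        scores[label] = (precision, recall)
    return scores


# Toy data (invented for illustration; NOT results from the paper).
gold = ["informational", "informational", "navigational",
        "transactional", "informational", "navigational"]
pred = ["informational", "transactional", "navigational",
        "transactional", "transactional", "navigational"]

scores = per_class_precision_recall(gold, pred)
for label, (p, r) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

In this toy run the hypothetical classifier labels two informational queries as transactional, so every true transactional query is found (recall 1.0) but only one of its three transactional predictions is correct (precision about 0.33).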
@article{alexander2025_2504.21398,
  title={In a Few Words: Comparing Weak Supervision and LLMs for Short Query Intent Classification},
  author={Daria Alexander and Arjen P. de Vries},
  journal={arXiv preprint arXiv:2504.21398},
  year={2025}
}