Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models

Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master's, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.
@article{ng2025_2503.21000,
  title={Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models},
  author={Lynnette Hui Xian Ng and Kokil Jaidka and Kaiyuan Tay and Hansin Ahuja and Niyati Chhaya},
  journal={arXiv preprint arXiv:2503.21000},
  year={2025}
}