Hard Negative Sampling via Large Language Models for Recommendation
Hard negative sampling improves recommendation performance by accelerating convergence and sharpening the decision boundary. However, most existing methods rely on heuristic strategies, selecting negatives from a fixed candidate pool. Lacking semantic awareness, these methods often misclassify items that align with users' semantic interests as negatives, producing False Hard Negative Samples (FHNS). Such FHNS inject noisy supervision and prevent the model from reaching its optimal performance. To address this challenge, we propose HNLMRec, a generative semantic negative sampling framework. Leveraging the semantic reasoning capabilities of Large Language Models (LLMs), HNLMRec directly generates negative samples that are behaviorally distinct yet semantically aligned with user preferences. Furthermore, we integrate collaborative filtering signals into the LLM via supervised fine-tuning, guiding the model to synthesize more reliable and informative hard negatives. Extensive experiments on multiple real-world datasets demonstrate that HNLMRec significantly outperforms both traditional methods and LLM-enhanced baselines, while effectively mitigating popularity bias and data sparsity, thereby improving generalization.
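The abstract does not give implementation details, but the core idea can be sketched: prompt an LLM to produce items that match a user's semantic interests yet that the user is unlikely to interact with, embed them, and score them as hard negatives in a standard pairwise objective. The sketch below is illustrative only; the prompt wording, the `build_negative_prompt` helper, and the use of a BPR-style loss are assumptions, not HNLMRec's actual pipeline.

```python
import torch
import torch.nn.functional as F

def build_negative_prompt(history: list[str], profile_hint: str) -> str:
    """Hypothetical prompt construction: ask the LLM for items that are
    semantically close to the user's interests (so they are hard) but
    that the user would plausibly never choose (so they are negatives)."""
    items = "; ".join(history)
    return (
        f"A user has interacted with: {items}. "
        f"User profile hint: {profile_hint}. "
        "Generate 3 items semantically related to these interests "
        "that this user is nonetheless unlikely to choose. "
        "Return item titles only."
    )

def bpr_loss_with_hard_negatives(user_emb, pos_emb, hard_neg_embs):
    """Standard BPR pairwise loss, here paired with embeddings of
    LLM-generated hard negatives (encoded into the same space as items).
    Shapes: user_emb, pos_emb: (d,); hard_neg_embs: (k, d)."""
    pos_score = user_emb @ pos_emb          # scalar affinity to positive
    neg_scores = hard_neg_embs @ user_emb   # (k,) affinities to negatives
    return -F.logsigmoid(pos_score - neg_scores).mean()

# Usage with random embeddings standing in for an encoder's output:
d, k = 64, 3
user_emb, pos_emb = torch.randn(d), torch.randn(d)
hard_neg_embs = torch.randn(k, d)  # would come from encoding LLM outputs
loss = bpr_loss_with_hard_negatives(user_emb, pos_emb, hard_neg_embs)
```

In this framing, the LLM replaces the fixed candidate pool of heuristic samplers: negatives are synthesized per user rather than mined, which is what lets them be semantically aware by construction.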