
User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems

Abstract

Exploration, the act of broadening user experiences beyond their established preferences, is challenging in large-scale recommendation systems due to feedback loops and limited signals on user exploration patterns. Large Language Models (LLMs) offer potential by leveraging their world knowledge to recommend novel content outside these loops. A key challenge is aligning LLMs with user preferences while preserving their knowledge and reasoning. To use LLMs to plan for the next novel user interest, this paper introduces a novel approach that combines hierarchical planning with LLM inference-time scaling, improving recommendation relevance without compromising novelty. We decouple novelty and user alignment, training separate LLMs for each objective. We then scale up the novelty-focused LLM's inference and select the best of its n predictions using the user-aligned LLM. Live experiments demonstrate efficacy, showing significant gains in both user satisfaction (measured by watch activity and active user counts) and exploration diversity.
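The decoupled best-of-n scheme the abstract describes can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: `novelty_generate`, `alignment_score`, and `user_context` are hypothetical placeholders for the novelty-focused LLM's sampler and the user-aligned LLM's scorer.

```python
def best_of_n(novelty_generate, alignment_score, user_context, n=8):
    """Best-of-n selection: sample n candidate novel interests from the
    novelty-focused model, then keep the one the user-aligned model
    scores highest. (Hypothetical interfaces, not the paper's API.)"""
    candidates = [novelty_generate(user_context) for _ in range(n)]
    return max(candidates, key=alignment_score)
```

Scaling up inference here just means increasing `n`: more novelty-focused samples widen the candidate pool, while the alignment scorer filters for relevance.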

@article{wang2025_2504.05522,
  title={User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems},
  author={Jianling Wang and Yifan Liu and Yinghao Sun and Xuejian Ma and Yueqi Wang and He Ma and Zhengyang Su and Minmin Chen and Mingyan Gao and Onkar Dalal and Ed H. Chi and Lichan Hong and Ningren Han and Haokai Lu},
  journal={arXiv preprint arXiv:2504.05522},
  year={2025}
}