RecGPT-V2 Technical Report

16 December 2025

Chao Yi

Dian Chen

Gaoyang Guo

Jiakai Tang

Jian Wu

Jing Yu

Mao Zhang

Wen Chen

Wenjun Yang

Yujie Luo

Yuning Jiang

Zhujin Gao

Bo Zheng

Binbin Cao

Changfa Wu

Dixuan Wang

Han Wu

Haoyi Hu

Kewei Zhu

Lang Tian

Lin Yang

Qiqi Huang

Siqi Yang

Wenbo Su

Xiaoxiao He

Xin Tong

Xu Chen

Xunke Xi

Xiaowei Huang

Yaxuan Wu

Yeqiu Yang

Yi Hu

Yujin Yuan

Yuliang Yan

Zile Zhou


Main: 1 Page

8 Figures

6 Tables

Appendix: 30 Pages

Abstract

Large language models (LLMs) have demonstrated remarkable potential in transforming recommender systems from implicit behavioral pattern matching to explicit intent reasoning. While RecGPT-V1 pioneered this paradigm by integrating LLM-based reasoning into user interest mining and item tag prediction, it suffers from four fundamental limitations: (1) computational inefficiency and cognitive redundancy across multiple reasoning routes; (2) insufficient explanation diversity in fixed-template generation; (3) limited generalization under supervised learning paradigms; and (4) simplistic outcome-focused evaluation that fails to match human standards.

To address these challenges, we present RecGPT-V2 with four key innovations. First, a Hierarchical Multi-Agent System restructures intent reasoning through coordinated collaboration, eliminating cognitive duplication while enabling diverse intent coverage. Combined with Hybrid Representation Inference, which compresses user-behavior contexts, our framework reduces GPU consumption by 60% and improves exclusive recall from 9.39% to 10.99%. Second, a Meta-Prompting framework dynamically generates contextually adaptive prompts, improving explanation diversity by 7.3%. Third, constrained reinforcement learning mitigates multi-reward conflicts, achieving a 24.1% improvement in tag prediction and a 13.0% improvement in explanation acceptance. Fourth, an Agent-as-a-Judge framework decomposes assessment into multi-step reasoning, improving human preference alignment. Online A/B tests on Taobao demonstrate significant improvements: +2.98% CTR, +3.71% IPV, +2.19% TV, and +11.46% NER. RecGPT-V2 establishes both the technical feasibility and commercial viability of deploying LLM-powered intent reasoning at scale, bridging the gap between cognitive exploration and industrial utility.
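The exclusive recall figures in the abstract are given as two absolute rates, so the size of the gain depends on whether it is read in percentage points or as a relative change. The sketch below works out both readings from the two reported numbers; the function name `absolute_and_relative_gain` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def absolute_and_relative_gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration; inputs are rates expressed in percent.
    """
    absolute = after_pct - before_pct          # difference in percentage points
    relative = absolute / before_pct * 100.0   # change relative to the baseline
    return absolute, relative


# Exclusive recall values as reported in the abstract: 9.39% -> 10.99%.
abs_gain, rel_gain = absolute_and_relative_gain(9.39, 10.99)
print(f"{abs_gain:.2f} percentage points, {rel_gain:.2f}% relative")
```

Under this reading, the reported change corresponds to about 1.60 percentage points, or roughly a 17% relative improvement over the 9.39% baseline.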
