
Bending the Scaling Law Curve in Large-Scale Recommendation Systems

Qin Ding
Kevin Course
Linjian Ma
Jianhui Sun
Ruochen Liu
Zhao Zhu
Chunxing Yin
Wei Li
Dai Li
Yu Shi
Xuan Cao
Ze Yang
Han Li
Xing Liu
Bi Xue
Hongwei Li
Rui Jian
Daisy Shi He
Jing Qian
Matt Ma
Qunshu Zhang
Rui Li
Main: 10 pages · Bibliography: 3 pages · Appendix: 6 pages · 10 figures · 7 tables
Abstract

Learning from user interaction history through sequential models has become a cornerstone of large-scale recommender systems. Recent advances in large language models have revealed promising scaling laws, sparking a surge of research into long-sequence modeling and deeper architectures for recommendation tasks. However, many recent approaches rely heavily on cross-attention mechanisms to sidestep the quadratic computational bottleneck of sequential modeling, which can limit the representational power gained from self-attention. We present ULTRA-HSTU, a novel sequential recommendation model developed through end-to-end model and system co-design. By innovating in the design of input sequences, sparse attention mechanisms, and model topology, ULTRA-HSTU achieves substantial improvements in both model quality and efficiency. Comprehensive benchmarking demonstrates that ULTRA-HSTU delivers remarkable scaling efficiency gains -- over 5x faster training scaling and 21x faster inference scaling compared to conventional models -- while also achieving superior recommendation quality. Our solution is fully deployed at scale, serving billions of users daily and driving 4% to 8% improvements in consumption and engagement in real-world production environments.
