
Towards An Efficient LLM Training Paradigm for CTR Prediction

Main: 8 pages · 6 figures · 3 tables · Bibliography: 2 pages · Appendix: 1 page
Abstract

Large Language Models (LLMs) have demonstrated tremendous potential as the next-generation ranking-based recommendation system. Many recent works have shown that LLMs can significantly outperform conventional click-through-rate (CTR) prediction approaches. Despite such promising results, the computational inefficiency inherent in the current training paradigm makes it particularly challenging to train LLMs for ranking-based recommendation tasks on large datasets. To train LLMs for CTR prediction, most existing studies adopt the prevalent "sliding-window" paradigm: given a sequence of m user interactions, a unique training prompt is constructed for each interaction by designating it as the prediction target, with its preceding n interactions serving as context. The sliding-window paradigm thus incurs an overall complexity of O(mn^2), which scales linearly with the length of the user's interaction sequence. Consequently, directly adopting this strategy to train LLMs can result in prohibitively high training costs as the interaction sequence grows. To alleviate this computational inefficiency, we propose a novel training paradigm, Dynamic Target Isolation (DTI), that structurally parallelizes the training of k (where k >> 1) target interactions. Furthermore, we identify two major bottlenecks, hidden-state leakage and positional bias overfitting, that limit DTI to scaling up to only a small value of k (e.g., 5), and we propose a computationally lightweight solution to effectively tackle each. Through extensive experiments on three widely adopted public CTR datasets, we empirically show that DTI reduces training time by an average of 92% (e.g., from 70.5 hrs to 5.31 hrs) without compromising CTR prediction performance.
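To make the cost argument concrete, below is a minimal sketch of how the sliding-window paradigm described above constructs one training prompt per target interaction. The function and field names (sliding_window_prompts, "item", "label") are illustrative assumptions, not the authors' implementation; the point is that m prompts are built, each carrying up to n context interactions, which yields the O(mn^2) attention cost mentioned in the abstract.

```python
def sliding_window_prompts(interactions, n):
    """Build one training prompt per target interaction, using the
    preceding n interactions as context (m prompts, each of length O(n))."""
    prompts = []
    for i, target in enumerate(interactions):
        # Up to n interactions immediately preceding the target serve as context.
        context = interactions[max(0, i - n):i]
        history = "; ".join(
            f"{c['item']} ({'clicked' if c['label'] else 'skipped'})" for c in context
        )
        prompts.append({
            "prompt": f"User history: {history}. Will the user click {target['item']}?",
            "label": target["label"],  # binary CTR target
        })
    return prompts

# Toy example: m = 4 interactions, window of n = 2 preceding interactions per target.
toy = [{"item": f"item_{j}", "label": j % 2} for j in range(4)]
for p in sliding_window_prompts(toy, n=2):
    print(p)
```

Since each of the m prompts repeats (most of) the previous prompt's context, the same interactions are re-encoded many times; DTI's idea, as stated above, is to instead predict k targets within one structurally parallelized prompt.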
