PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform

ACM Conference on Recommender Systems (RecSys), 2025
Main: 8 pages; Bibliography: 2 pages; 4 figures; 7 tables

Abstract

User activity sequences have emerged as one of the most important signals in recommender systems. We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform. We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications, efficiently coupling it with existing models. While this pretraining-and-fine-tuning approach has been popular in other domains, such as Vision and NLP, its application in industrial recommender systems presents numerous challenges. The foundational model must be scalable enough to score millions of items every second while meeting tight cost and latency constraints imposed by these systems. Additionally, it should capture the interactions between user activities and other features and handle new items that were not present during the pretraining stage.
