Scalable Sequential Recommendation under Latency and Memory Constraints

13 January 2026

Adithya Parthasarathy

Aswathnarayan Muthukrishnan Kirubakaran

Vinoth Punniyamoorthy

Nachiappan Chockalingam

Lokesh Butra

Kabilan Kannan

Abhirup Mazumder

Sumit Saha

Mamba

Main: 12 Pages

3 Figures

Bibliography: 2 Pages

3 Tables

Abstract

Sequential recommender systems must model long-range user behavior while operating under strict memory and latency constraints. Transformer-based approaches achieve strong accuracy but suffer from quadratic attention complexity, forcing aggressive truncation of user histories and limiting their practicality for long-horizon modeling. This paper presents HoloMambaRec, a lightweight sequential recommendation architecture that combines holographic reduced representations for attribute-aware embedding with a selective state space encoder for linear-time sequence processing. Item and attribute information are bound using circular convolution, preserving embedding dimensionality while encoding structured metadata. A shallow selective state space backbone, inspired by recent Mamba-style models, enables efficient training and constant-time recurrent inference. Experiments on Amazon Beauty and MovieLens-1M under a 10-epoch budget show that HoloMambaRec surpasses SASRec on both datasets, attains state-of-the-art ranking on MovieLens-1M, and trails only GRU4Rec on Amazon Beauty, all while maintaining substantially lower memory complexity. The design further incorporates forward-compatible mechanisms for temporal bundling and inference-time compression, positioning HoloMambaRec as a practical and extensible alternative for scalable, metadata-aware sequential recommendation.
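
The binding step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `bind` and `unbind`, the dimension `d = 64`, and the Gaussian initialization are assumptions chosen for illustration. It shows only the standard holographic reduced representation operations: circular convolution (computed via FFT) fuses an item vector and an attribute vector into a single vector of the same dimensionality, and circular correlation gives a noisy estimate of the item vector back.

```python
import numpy as np


def bind(item: np.ndarray, attr: np.ndarray) -> np.ndarray:
    """Circular convolution of two real vectors, computed in the Fourier domain."""
    n = item.shape[-1]
    return np.fft.irfft(np.fft.rfft(item) * np.fft.rfft(attr), n=n)


def unbind(bound: np.ndarray, attr: np.ndarray) -> np.ndarray:
    """Circular correlation with attr: an approximate (noisy) inverse of bind."""
    n = bound.shape[-1]
    return np.fft.irfft(np.fft.rfft(bound) * np.conj(np.fft.rfft(attr)), n=n)


# Hypothetical embeddings; variance 1/d is the usual HRR convention.
rng = np.random.default_rng(0)
d = 64
item = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)
attr = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)

bound = bind(item, attr)          # same shape as item: (d,)
recovered = unbind(bound, attr)   # noisy estimate of item

cosine = recovered @ item / (np.linalg.norm(recovered) * np.linalg.norm(item))
print("bound shape:", bound.shape)
print("cosine(recovered, item):", float(cosine))
```

Because the bound vector has the same length as its inputs, attribute information can be added without widening the embedding table, which is the property the abstract describes as preserving embedding dimensionality. The FFT route costs O(d log d) per binding rather than the O(d^2) of a direct circular convolution.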
