
Gated Rotary-Enhanced Linear Attention with Rank Modulation for Long-term Sequential Recommendation

Main: 1 page · 10 figures · Appendix: 14 pages
Abstract

In Sequential Recommendation Systems (SRSs), Transformer models have demonstrated remarkable performance but face computational and memory challenges, especially when modeling long-term user behavior sequences. Due to its quadratic complexity, the dot-product attention mechanism in Transformers becomes expensive for long sequences. By approximating dot-product attention with carefully designed mapping functions, linear attention offers a more efficient alternative with linear complexity. However, existing linear attention methods suffer from three limitations: 1) they often rely on learnable position encodings, which incur extra computational cost in long-sequence scenarios; 2) constrained by their low-rank deficiency, they may fail to capture a user's fine-grained local preferences (short-lived bursts of interest); and 3) they attempt to capture transient activities but often conflate them with stable long-term interests, which can result in unclear or less effective recommendations. To remedy these drawbacks, we propose a long-term sequential Recommendation model with Gated Rotary-Enhanced Linear Attention (RecGRELA). Specifically, we first propose a Rotary-Enhanced Linear Attention (RELA) module that efficiently models long-range dependencies within a user's historical interactions using rotary position encodings. Then, to address the low-rank deficiency of linear attention, we introduce an Adaptive Rank Modulator: it incorporates a rank augmentation branch to explicitly inject local token mixing and a Gated Rank Selector to dynamically balance stable long-term preferences against transient short-term interests. Experimental results on four public benchmark datasets show that RecGRELA achieves state-of-the-art performance compared with existing SRSs based on Recurrent Neural Networks, Transformers, and Mamba, while maintaining low memory overhead.
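The abstract describes the mechanism only at a high level, so the following PyTorch sketch is a minimal, assumed reconstruction of its core idea: linear attention over rotary-encoded features with a gated output. It is not the paper's implementation; the names (`GatedRotaryLinearAttention`, `rotary_embed`), the ELU-based feature map, and the sigmoid gate are illustrative choices, and the rank augmentation branch and Gated Rank Selector are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotary_embed(x):
    """Rotary position encoding over the last dim of x (batch, seq, dim);
    dim must be even. Rotating channel pairs by a position-dependent angle
    injects relative positions with no learnable parameters, unlike the
    learned position embeddings the abstract argues against."""
    b, n, d = x.shape
    half = d // 2
    pos = torch.arange(n, device=x.device, dtype=x.dtype).unsqueeze(-1)   # (n, 1)
    freqs = 10000.0 ** (-torch.arange(half, device=x.device, dtype=x.dtype) / half)
    angles = pos * freqs                                                  # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class GatedRotaryLinearAttention(nn.Module):
    """Hypothetical single-head sketch of a RELA-style block. Causal
    masking and the paper's Adaptive Rank Modulator are omitted."""

    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)   # assumed gating branch
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                 # x: (batch, seq_len, dim)
        # Positive feature map keeps the normalizer well defined.
        q = F.elu(self.q_proj(x)) + 1.0
        k = F.elu(self.k_proj(x)) + 1.0
        v = self.v_proj(x)
        # Rotary encoding enters the numerator only, so relative positions
        # are captured while the denominator stays positive.
        qr, kr = rotary_embed(q), rotary_embed(k)
        # Linear attention: accumulate K^T V once, costing O(n d^2)
        # instead of the O(n^2 d) of dot-product attention.
        kv = torch.einsum('bnd,bne->bde', kr, v)                  # (b, d, d)
        num = torch.einsum('bnd,bde->bne', qr, kv)                # (b, n, d)
        den = torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + 1e-6  # (b, n)
        attn = num / den.unsqueeze(-1)
        # Sigmoid gate modulates the mixed tokens channel-wise.
        return self.out(torch.sigmoid(self.gate(x)) * attn)


if __name__ == "__main__":
    x = torch.randn(2, 512, 64)                     # toy batch of user sequences
    print(GatedRotaryLinearAttention(64)(x).shape)  # torch.Size([2, 512, 64])
```

Because K^T V is a d×d matrix accumulated once over the sequence, cost grows linearly with sequence length, which is the efficiency property that makes this family of models attractive for long user histories.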
