Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control

Anton Klenitskiy
Konstantin Polev
Daria Denisova
Alexey Vasilev
Dmitry Simakov
Gleb Gusev
Main: 2 Pages
16 Figures
7 Tables
Appendix: 12 Pages
Abstract

Many current state-of-the-art sequential recommendation models are based on transformer architectures. Interpreting and explaining such black-box models is an important research question, since a better understanding of their internals makes it possible to influence and control their behavior, which matters in a variety of real-world applications. Recently, sparse autoencoders (SAEs) have been shown to be a promising unsupervised approach for extracting interpretable features from language models. These autoencoders learn to reconstruct the hidden states of the transformer's internal layers as sparse linear combinations of directions in their activation space.
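The reconstruction described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a TopK activation as the sparsity mechanism (one common choice for SAEs), and all dimensions and weights are arbitrary placeholders.

```python
import numpy as np

# Illustrative sketch of a sparse autoencoder (SAE) forward pass.
# Assumptions (not from the paper): TopK sparsity, toy dimensions,
# random untrained weights.

rng = np.random.default_rng(0)

d_model = 8   # width of the transformer hidden state
d_dict = 32   # number of learned dictionary directions (overcomplete)
k = 4         # number of features allowed to be active per input

W_enc = rng.standard_normal((d_model, d_dict)) * 0.1
b_enc = np.zeros(d_dict)
W_dec = rng.standard_normal((d_dict, d_model)) * 0.1
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode x into a sparse feature vector, then reconstruct x from it."""
    acts = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU pre-activations
    # Keep only the k largest activations; zero out the rest (sparsity).
    acts[np.argsort(acts)[:-k]] = 0.0
    # Reconstruction is a sparse linear combination of decoder directions.
    x_hat = acts @ W_dec + b_dec
    return acts, x_hat

x = rng.standard_normal(d_model)
acts, x_hat = sae_forward(x)
print((acts != 0).sum(), x_hat.shape)
```

Training would then minimize the reconstruction error ||x - x_hat||^2 over many hidden states, so that each nonzero feature ideally corresponds to an interpretable direction in activation space.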
