N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs Ilya Zisman Alexander Nikulin Andrei Polubarov Nikita Lyubaykin Vladislav Kurenkov Andrei Polubarov Igor Kiselev Vladislav Kurenkov |
Transformer-VQ: Linear-Time Transformers via Vector QuantizationInternational Conference on Learning Representations (ICLR), 2023 |
Cramming: Training a Language Model on a Single GPU in One DayInternational Conference on Machine Learning (ICML), 2022 |