
| Title |
|---|
| Going Beyond Linear Transformers with Recurrent Fast Weight Programmers. Neural Information Processing Systems (NeurIPS), 2021 |
| Frustratingly Short Attention Spans in Neural Language Modeling. International Conference on Learning Representations (ICLR), 2017 |