TransformerFAM: Feedback attention is working memory
arXiv:2404.09173, 14 April 2024
Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, K. Sim, P. M. Mengibar
Papers citing "TransformerFAM: Feedback attention is working memory" (10 of 10 papers shown)
1. Cognitive Memory in Large Language Models (03 Apr 2025) [LLMAG, KELM]
   Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu

2. An Evolved Universal Transformer Memory (17 Oct 2024)
   Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang

3. Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (10 Apr 2024) [LRM, LLMAG, CLL]
   Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal

4. Investigating Recurrent Transformers with Dynamic Halt (01 Feb 2024)
   Jishnu Ray Chowdhury, Cornelia Caragea

5. MLP-Mixer: An all-MLP Architecture for Vision (04 May 2021)
   Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

6. The Power of Scale for Parameter-Efficient Prompt Tuning (18 Apr 2021) [VPVLM]
   Brian Lester, Rami Al-Rfou, Noah Constant

7. ERNIE-Doc: A Retrospective Long-Document Modeling Transformer (31 Dec 2020)
   Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua-Hong Wu, Haifeng Wang

8. Big Bird: Transformers for Longer Sequences (28 Jul 2020) [VLM]
   Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

9. Efficient Content-Based Sparse Attention with Routing Transformers (12 Mar 2020) [MoE]
   Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier

10. Scaling Laws for Neural Language Models (23 Jan 2020)
    Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei