Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
arXiv: 2405.08707 · 14 May 2024
Xueyan Niu, Bo Bai, Lei Deng, Wei Han
Papers citing "Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory" (8 of 8 shown)
Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
WeiZhi Fei, Xueyan Niu, Guoqing Xie, Yingqing Liu, Bo Bai, Wei Han
28 · 1 · 0 — 22 Jan 2025
A Theoretical Survey on Foundation Models
Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao
18 · 0 · 0 — 15 Oct 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
39 · 14 · 0 — 08 Jul 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
Tags: LRM, LLMAG, CLL
79 · 101 · 0 — 10 Apr 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, ..., Y. Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
Tags: ALM, ELM, LRM
91 · 40 · 0 — 13 Mar 2024
Word Acquisition in Neural Language Models
Tyler A. Chang, Benjamin Bergen
27 · 29 · 0 — 05 Oct 2021
Hierarchical Associative Memory
Dmitry Krotov
Tags: BDL
89 · 22 · 0 — 14 Jul 2021
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
220 · 3,054 · 0 — 23 Jan 2020