Exposing Attention Glitches with Flip-Flop Language Modeling

Exposing Attention Glitches with Flip-Flop Language Modeling

1 June 2023

A. Krishnamurthy

Papers citing "Exposing Attention Glitches with Flip-Flop Language Modeling"

18 / 18 papers shown

Title
TRA: Better Length Generalisation with Threshold Relative Attention Mattia Opper Roland Fernandez P. Smolensky Jianfeng Gao 39 0 0 29 Mar 2025
Investigating Recurrent Transformers with Dynamic Halt Jishnu Ray Chowdhury Cornelia Caragea 34 1 0 01 Feb 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models Ziwei Xu Sanjay Jain Mohan S. Kankanhalli HILM LRM 60 192 0 22 Jan 2024
How Language Model Hallucinations Can Snowball Muru Zhang Ofir Press William Merrill Alisa Liu Noah A. Smith HILM LRM 78 246 0 22 May 2023
Unlimiformer: Long-Range Transformers with Unlimited Length Input Amanda Bertsch Uri Alon Graham Neubig Matthew R. Gormley RALM 94 122 0 02 May 2023
Learning to Reason and Memorize with Self-Notes Jack Lanchantin Shubham Toshniwal Jason Weston Arthur Szlam Sainbayar Sukhbaatar ReLM LRM LLMAG 88 27 0 01 May 2023
Do Transformers Parse while Predicting the Masked Word? Haoyu Zhao A. Panigrahi Rong Ge Sanjeev Arora 74 29 0 14 Mar 2023
Resurrecting Recurrent Neural Networks for Long Sequences Antonio Orvieto Samuel L. Smith Albert Gu Anushan Fernando Çağlar Gülçehre Razvan Pascanu Soham De 88 258 0 11 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding Yuchen Li Yuan-Fang Li Andrej Risteski 101 61 0 07 Mar 2023
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought Abulhair Saparov He He ELM LRM ReLM 116 270 0 03 Oct 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models Xuezhi Wang Jason W. Wei Dale Schuurmans Quoc Le Ed H. Chi Sharan Narang Aakanksha Chowdhery Denny Zhou ReLM BDL LRM AI4CE 297 3,163 0 21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 315 8,261 0 28 Jan 2022
Entity-Based Knowledge Conflicts in Question Answering Shayne Longpre Kartik Perisetla Anthony Chen Nikhil Ramesh Chris DuBois Sameer Singh HILM 235 236 0 10 Sep 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 234 690 0 27 Aug 2021
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning Yuhuai Wu M. Rabe Wenda Li Jimmy Ba Roger C. Grosse Christian Szegedy AIMat LRM 61 51 0 15 Jan 2021
Language Models as Knowledge Bases? Fabio Petroni Tim Rocktaschel Patrick Lewis A. Bakhtin Yuxiang Wu Alexander H. Miller Sebastian Riedel KELM AI4MH 396 2,576 0 03 Sep 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,435 0 26 Sep 2016
Effective Approaches to Attention-based Neural Machine Translation Thang Luong Hieu H. Pham Christopher D. Manning 214 7,687 0 17 Aug 2015