Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
arXiv:2402.04161

6 February 2024
Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael C. Gastpar
Topics: OffRL

Papers citing "Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains" (7 of 7 papers shown)
1. Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
   Yutong Yin, Zhaoran Wang. Topics: LRM, ReLM. 27 Jan 2025.

2. From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
   Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu. Topics: MLLM. 03 Oct 2024.

3. Implicit Bias and Fast Convergence Rates for Self-attention
   Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis. 08 Feb 2024.

4. Dissecting Recall of Factual Associations in Auto-Regressive Language Models
   Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson. Topics: KELM. 28 Apr 2023.

5. How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
   Yuchen Li, Yuan-Fang Li, Andrej Risteski. 07 Mar 2023.

6. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
   Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. 01 Nov 2022.

7. Rethinking embedding coupling in pre-trained language models
   Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder. 24 Oct 2020.