arXiv: 2302.00456
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
1 February 2023
Papers citing "Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps" (7 papers)
"Merging Feed-Forward Sublayers for Compressed Transformers"
Neha Verma, Kenton W. Murray, Kevin Duh (AI4CE), 10 Jan 2025
"Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models"
Xiyu Liu, Zhengxiao Liu, Naibin Gu, Zheng-Shen Lin, Wanli Ma, Ji Xiang, Weiping Wang (KELM), 27 Aug 2024
"Natural Language Processing RELIES on Linguistics"
Juri Opitz, Shira Wein, Nathan Schneider (AI4CE), 09 May 2024
"Dissecting Recall of Factual Associations in Auto-Regressive Language Models"
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson (KELM), 28 Apr 2023
"Outliers Dimensions that Disrupt Transformers Are Driven by Frequency"
Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell’Orletta, 23 May 2022
"Incorporating Residual and Normalization Layers into Analysis of Masked Language Models"
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui, 15 Sep 2021
"MLP-Mixer: An all-MLP Architecture for Vision"
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy, 04 May 2021