How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

7 November 2022

Hao Peng

Papers citing "How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers"

Title
MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition | Nicolas Menet, Michael Hersche, G. Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi | 28 13 0 | 05 Dec 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers | Hosein Mohebbi, Grzegorz Chrupała, Willem H. Zuidema, A. Alishahi | 28 12 0 | 15 Oct 2023
PMET: Precise Model Editing in a Transformer | Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, Jie Yu | KELM 26 115 0 | 17 Aug 2023
Computational modeling of semantic change | Nina Tahmasebi, Haim Dubossarsky | 28 6 0 | 13 Apr 2023
Efficient Methods for Natural Language Processing: A Survey | Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz | 28 109 0 | 31 Aug 2022
Transformer Quality in Linear Time | Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le | 73 222 0 | 21 Feb 2022
ABC: Attention with Bounded-memory Control | Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith | 61 22 0 | 06 Oct 2021
Probing Classifiers: Promises, Shortcomings, and Advances | Yonatan Belinkov | 226 404 0 | 24 Feb 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM 297 6,956 0 | 20 Apr 2018