
Informer: Transformer Likes Informed Attention

Findings, 2020
Abstract

The Transformer is the backbone of modern NLP models. In this paper, we propose Informer, a simple architecture that significantly outperforms canonical Transformers on a spectrum of tasks including Masked Language Modeling, GLUE, and SQuAD. Qualitatively, Informer is easy to implement and requires minimal hyper-parameter tuning. It also stabilizes training and leads to models with sparser attention distributions. Code will be open-sourced upon paper acceptance.
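The abstract does not spell out what "informed attention" computes. As a hedged illustration only, the sketch below assumes it means carrying a residual attention-score prior from the preceding layer into the current layer's pre-softmax scores; the class and parameter names (InformedAttention, prev_scores) are hypothetical placeholders, not the authors' API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InformedAttention(nn.Module):
    """Single-head self-attention with an additive score prior.

    ASSUMPTION: the paper does not describe its mechanism; this sketch
    guesses that "informed attention" adds the previous layer's raw
    attention scores to this layer's QK^T scores before the softmax.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, prev_scores=None):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Raw scaled dot-product scores: (batch, seq_len, seq_len).
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        if prev_scores is not None:
            # "Inform" this layer with the previous layer's scores.
            scores = scores + prev_scores
        attn = F.softmax(scores, dim=-1)
        # Return the scores so the next layer can reuse them as a prior.
        return torch.matmul(attn, v), scores


# Minimal usage: chain two layers, passing scores forward as the prior.
x = torch.randn(2, 16, 64)
layer1, layer2 = InformedAttention(64), InformedAttention(64)
out, s = layer1(x)
out, s = layer2(out, prev_scores=s)
```

If this reading is right, the design adds no new parameters and only one extra addition per layer, which would be consistent with the abstract's claims of easy implementation and minimal tuning.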
