On the Sub-Layer Functionalities of Transformer Decoder

6 October 2020

Papers citing "On the Sub-Layer Functionalities of Transformer Decoder"

4 / 4 papers shown

Title
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space Mor Geva Avi Caciularu Ke Wang Yoav Goldberg KELM 34 333 0 28 Mar 2022
A Survey of Transformers Tianyang Lin Yuxin Wang Xiangyang Liu Xipeng Qiu ViT 27 1,084 0 08 Jun 2021
How Does Selective Mechanism Improve Self-Attention Networks? Xinwei Geng Longyue Wang Xing Wang Bing Qin Ting Liu Zhaopeng Tu AAML 34 35 0 03 May 2020
What you can cram into a single vector: Probing sentence embeddings for linguistic properties Alexis Conneau Germán Kruszewski Guillaume Lample Loïc Barrault Marco Baroni 199 882 0 03 May 2018