arXiv: 2012.11881
Undivided Attention: Are Intermediate Layers Necessary for BERT?
22 December 2020
S. N. Sridhar, Anthony Sarah
Papers citing "Undivided Attention: Are Intermediate Layers Necessary for BERT?" (6 of 6 papers shown)
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models
Isaac Gerber
10 May 2025

Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde, Tobias Schumacher, M. Strohmaier, Florian Lemmerich
10 May 2023

Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers
Jason Phang, Haokun Liu, Samuel R. Bowman
17 Sep 2021

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018