arXiv: 2012.11881
Undivided Attention: Are Intermediate Layers Necessary for BERT?
22 December 2020
S. N. Sridhar, Anthony Sarah
Papers citing "Undivided Attention: Are Intermediate Layers Necessary for BERT?" (6 of 6 papers shown)
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models
Isaac Gerber
10 May 2025

Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde, Tobias Schumacher, M. Strohmaier, Florian Lemmerich
10 May 2023

Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers
Jason Phang, Haokun Liu, Samuel R. Bowman
17 Sep 2021

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018