Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.01906
Cited By
Simplifying Transformer Blocks
3 November 2023
Bobby He
Thomas Hofmann
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Simplifying Transformer Blocks"
15 / 15 papers shown
Title
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
74
0
0
26 Apr 2025
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
28
0
0
12 Oct 2024
Attention layers provably solve single-location regression
P. Marion
Raphael Berthier
Gérard Biau
Claire Boyer
81
2
0
02 Oct 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
34
3
0
29 May 2024
Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon
Hamza Tahir Chaudhry
C. Pehlevan
AI4CE
42
9
0
24 May 2024
Transformer tricks: Removing weights for skipless transformers
Nils Graef
30
2
0
18 Apr 2024
Spiking-PhysFormer: Camera-Based Remote Photoplethysmography with Parallel Spike-driven Transformer
Mingxuan Liu
Jiankai Tang
Haoxiang Li
Jiahao Qi
Siwei Li
Kegang Wang
Yuntao wang
Hong Chen
Yuntao Wang
Hong Chen
89
14
0
07 Feb 2024
Knowledge Translation: A New Pathway for Model Compression
Wujie Sun
Defang Chen
Jiawei Chen
Yan Feng
Chun-Yen Chen
Can Wang
23
0
0
11 Jan 2024
Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping
James Martens
Andy Ballard
Guillaume Desjardins
G. Swirszcz
Valentin Dalibard
Jascha Narain Sohl-Dickstein
S. Schoenholz
83
43
0
05 Oct 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
220
510
0
11 Feb 2021
RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding
X. Zhang
Ningning Ma
Jungong Han
Guiguang Ding
Jian-jun Sun
117
1,531
0
11 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
248
1,986
0
31 Dec 2020
Stable ResNet
Soufiane Hayou
Eugenio Clerico
Bo He
George Deligiannidis
Arnaud Doucet
Judith Rousseau
ODL
SSeg
46
51
0
24 Oct 2020
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao
Yasaman Bahri
Jascha Narain Sohl-Dickstein
S. Schoenholz
Jeffrey Pennington
220
347
0
14 Jun 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
1