Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
arXiv: 2110.03252
7 October 2021
Kyuhong Shim
Iksoo Choi
Wonyong Sung
Jungwook Choi
Papers citing "Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling"
1 / 1 papers shown
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Zayd Muhammad Kawakibi Zuhri
Erland Hilman Fuadi
Alham Fikri Aji
29 Apr 2025