Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture
27 March 2023 · arXiv:2303.16753
Peiyu Liu, Ze-Feng Gao, Yushuo Chen, Wayne Xin Zhao, Ji-Rong Wen
Community: MoE
Papers citing "Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture" (3 of 3 papers shown)
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin
148 · 345 · 0 · 23 Jul 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
226 · 4,453 · 0 · 23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 294 · 6,950 · 0 · 20 Apr 2018