Composable Function-preserving Expansions for Transformer Architectures
Andrea Gesmundo, Kaitlin Maile
arXiv: 2308.06103 · 11 August 2023 · Community: AI4CE
Papers citing "Composable Function-preserving Expansions for Transformer Architectures" (9 papers):

- Upcycling Large Language Models into Mixture of Experts (10 Oct 2024). Ethan He, Abhinav Khattar, R. Prenger, V. Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, M. Shoeybi, Bryan Catanzaro. [MoE]
- MCSD: An Efficient Language Model with Diverse Fusion (18 Jun 2024). Hua Yang, Duohai Li, Shiman Li.
- Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective (24 May 2024). Pranshu Malviya, Jerry Huang, Quentin Fournier, Sarath Chandar.
- Efficient Stagewise Pretraining via Progressive Subnetworks (08 Feb 2024). Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar.
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (23 Dec 2023). Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, ..., Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim. [ALM, ELM]
- Navigating Scaling Laws: Compute Optimality in Adaptive Model Training (06 Nov 2023). Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann.
- Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective (17 Oct 2023). Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He.
- Masked Structural Growth for 2x Faster Language Model Pre-training (04 May 2023). Yiqun Yao, Zheng-Wei Zhang, Jing Li, Yequan Wang. [OffRL, AI4CE, LRM]
- Scaling Laws for Neural Language Models (23 Jan 2020). Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei.