ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.06103
  4. Cited By
Composable Function-preserving Expansions for Transformer Architectures

Composable Function-preserving Expansions for Transformer Architectures

11 August 2023
Andrea Gesmundo
Kaitlin Maile
    AI4CE
ArXivPDFHTML

Papers citing "Composable Function-preserving Expansions for Transformer Architectures"

9 / 9 papers shown
Title
Upcycling Large Language Models into Mixture of Experts
Upcycling Large Language Models into Mixture of Experts
Ethan He
Abhinav Khattar
R. Prenger
V. Korthikanti
Zijie Yan
Tong Liu
Shiqing Fan
Ashwath Aithal
M. Shoeybi
Bryan Catanzaro
MoE
17
9
0
10 Oct 2024
MCSD: An Efficient Language Model with Diverse Fusion
MCSD: An Efficient Language Model with Diverse Fusion
Hua Yang
Duohai Li
Shiman Li
27
2
0
18 Jun 2024
Predicting the Impact of Model Expansion through the Minima Manifold: A
  Loss Landscape Perspective
Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective
Pranshu Malviya
Jerry Huang
Quentin Fournier
Sarath Chandar
54
0
0
24 May 2024
Efficient Stagewise Pretraining via Progressive Subnetworks
Efficient Stagewise Pretraining via Progressive Subnetworks
Abhishek Panigrahi
Nikunj Saunshi
Kaifeng Lyu
Sobhan Miryoosefi
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
23
5
0
08 Feb 2024
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective
  Depth Up-Scaling
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Dahyun Kim
Chanjun Park
Sanghoon Kim
Wonsung Lee
Wonho Song
...
Hyunbyung Park
Gyoungjin Gim
Mikyoung Cha
Hwalsuk Lee
Sunghun Kim
ALM
ELM
11
130
0
23 Dec 2023
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Sotiris Anagnostidis
Gregor Bachmann
Imanol Schlag
Thomas Hofmann
23
2
0
06 Nov 2023
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from
  a Parametric Perspective
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
Ming Zhong
Chenxin An
Weizhu Chen
Jiawei Han
Pengcheng He
21
8
0
17 Oct 2023
Masked Structural Growth for 2x Faster Language Model Pre-training
Masked Structural Growth for 2x Faster Language Model Pre-training
Yiqun Yao
Zheng-Wei Zhang
Jing Li
Yequan Wang
OffRL
AI4CE
LRM
40
15
0
04 May 2023
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
1