arXiv:2408.08274
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
15 August 2024
Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob N. Foerster, Phil Blunsom, Sebastian Ruder, A. Ustun, Acyr F. Locatelli
MoMe
MoE
Papers citing "BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts"
5 / 5 papers shown

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, ..., Rui Guo, Xin Wang, Qiong Luo, S. Shi, Xiaowen Chu
30 6 0
07 Nov 2023

Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
MoE
90 42 0
11 Oct 2022

The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications
Mirac Suzgun, Luke Melas-Kyriazi, Suproteem K. Sarkar, S. Kominers, Stuart M. Shieber
30 24 0
08 Jul 2022

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson, M. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho
AILaw, ELM
127 94 0
01 Jul 2022

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
220 3,054 0
23 Jan 2020