2409.17312
Cited By
BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data
25 September 2024
J. Tastet
I. Timiryasov
Papers citing "BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data"
3 / 3 papers shown
Title
Pretraining Language Models for Diachronic Linguistic Change Discovery
Elisabeth Fittschen, Sabrina Li, Tom Lippincott, Leshem Choshen, Craig Messner
26
0
0
07 Apr 2025
CoSMoEs: Compact Sparse Mixture of Experts
Patrick Huber, Akshat Shrivastava, Ernie Chang, Chinnadhurai Sankar, Ahmed Aly, Adithya Sagar
MoE
29
0
0
28 Feb 2025
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox
91
7
0
06 Dec 2024