BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data

25 September 2024

Papers citing "BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data"


Pretraining Language Models for Diachronic Linguistic Change Discovery. Elisabeth Fittschen, Sabrina Li, Tom Lippincott, Leshem Choshen, Craig Messner. 07 Apr 2025.
CoSMoEs: Compact Sparse Mixture of Experts. Patrick Huber, Akshat Shrivastava, Ernie Chang, Chinnadhurai Sankar, Ahmed Aly, Adithya Sagar. 28 Feb 2025.
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora. Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox. 06 Dec 2024.