Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.11282
Cited By
ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation
17 October 2023
Jaap Jumelet
Michael Hanna
Marianne de Heer Kloots
Anna Langedijk
Charlotte Pouw
Oskar van der Wal
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation"
8 / 8 papers shown
Title
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
38
106
0
10 Apr 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Nikitas Theodoropoulos
Giorgos Filandrianos
Vassilis Lyberatos
Maria Lymperaiou
Giorgos Stamou
SyDa
46
1
0
24 Feb 2025
Team Ryu's Submission to SIGMORPHON 2024 Shared Task on Subword Tokenization
Zilong Li
18
0
0
19 Oct 2024
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
54
76
0
03 Oct 2022
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
237
588
0
14 Jul 2021
How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
216
188
0
03 May 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
1