arXiv: 2311.02265 (v2, latest)
Not all layers are equally as important: Every Layer Counts BERT
3 November 2023
Lucas Georges Gabriel Charpentier, David Samuel
Links: ArXiv (abs), PDF, HTML, HuggingFace (1 upvote)
Papers citing "Not all layers are equally as important: Every Layer Counts BERT" (14 papers)
- Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning). Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß. 23 Oct 2025.
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data. Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, ..., Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, Leshem Choshen. 11 Oct 2025.
- Masked Diffusion Language Models with Frequency-Informed Training. Despoina Kosmopoulou, Efthymios Georgiou, Vaggelis Dorovatas, Georgios Paraskevopoulos, Alexandros Potamianos. 05 Sep 2025.
- Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning. Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider. 04 Jun 2025.
- Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models. Lennart Stöpler, Rufat Asadli, Mitja Nikolaus, Robert Bamler, Alex Warstadt. 09 May 2025.
- Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora. Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Robert Bamler. 10 Apr 2025.
- BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context. Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed, Francois Meyer. 08 Jan 2025.
- GPT or BERT: why not both? Lucas Georges Gabriel Charpentier, David Samuel. 31 Dec 2024.
- Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora. Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Robert Bamler, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox. 06 Dec 2024.
- From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes. Zébulon Goriely, Richard Diehl Martinez, Andrew Caines, Lisa Beinborn, P. Buttery. 30 Oct 2024.
- Team Ryu's Submission to SIGMORPHON 2024 Shared Task on Subword Tokenization. Zilong Li. 19 Oct 2024.
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning. Boyao Wang, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang. 26 Mar 2024.
- DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging. Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi. 04 Feb 2024.
- DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers. Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet. 05 Oct 2023.