arXiv: 2311.02265 (v2, latest)
Not all layers are equally as important: Every Layer Counts BERT
3 November 2023
Lucas Georges Gabriel Charpentier, David Samuel
Links: ArXiv (abs), PDF, HTML, HuggingFace (1 upvote)
Papers citing "Not all layers are equally as important: Every Layer Counts BERT" (14 papers)
- Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning). Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß. 23 Oct 2025.
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data. Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, ..., Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, Leshem Choshen. 11 Oct 2025.
- Masked Diffusion Language Models with Frequency-Informed Training. Despoina Kosmopoulou, Efthymios Georgiou, Vaggelis Dorovatas, Georgios Paraskevopoulos, Alexandros Potamianos. 05 Sep 2025.
- Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning. Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider. 04 Jun 2025.
- Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models. Lennart Stöpler, Rufat Asadli, Mitja Nikolaus, Robert Bamler, Alex Warstadt. 09 May 2025.
- Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora. Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Robert Bamler. 10 Apr 2025.
- BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context. Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed, Francois Meyer. 08 Jan 2025.
- GPT or BERT: why not both? Lucas Georges Gabriel Charpentier, David Samuel. 31 Dec 2024.
- Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora. Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Robert Bamler, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox. 06 Dec 2024.
- From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes. Zébulon Goriely, Richard Diehl Martinez, Andrew Caines, Lisa Beinborn, P. Buttery. 30 Oct 2024.
- Team Ryu's Submission to SIGMORPHON 2024 Shared Task on Subword Tokenization. Zilong Li. 19 Oct 2024.
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning. Boyao Wang, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang. 26 Mar 2024.
- DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging. Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi. 04 Feb 2024.
- DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers. Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet. 05 Oct 2023.