ResearchTrend.AI
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
17 April 2023 (arXiv:2304.08442) [MoE, ALM]
Papers citing "The MiniPile Challenge for Data-Efficient Language Models" (41 papers shown)
RWKV-X: A Linear Complexity Hybrid Language Model
  Haowen Hou, Zhiyi Huang, Kaifeng Tan, Rongchang Lu, Fei Richard Yu
  30 Apr 2025 [VLM]

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
  Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
  10 Apr 2025

Repetitions are not all alike: distinct mechanisms sustain repetition in language models
  Matéo Mahaut, Francesca Franzon
  01 Apr 2025

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
  Jinbo Wang, Mingze Wang, Zhanpeng Zhou, Junchi Yan, Weinan E, Lei Wu
  26 Feb 2025

Tokenization is Sensitive to Language Variation
  Anna Wegmann, Dong Nguyen, David Jurgens
  24 Feb 2025

Practical Principles for AI Cost and Compute Accounting
  Stephen Casper, Luke Bailey, Tim Schreier
  21 Feb 2025
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
  Seokjin Go, Divya Mahajan
  10 Feb 2025 [MoE]

A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
  You Wu, Haoyi Wu, Kewei Tu
  18 Oct 2024

Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
  David Grangier, Simin Fan, Skyler Seto, Pierre Ablin
  30 Sep 2024

Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review
  Neha Prakriya, Jui-Nan Yen, Cho-Jui Hsieh, Jason Cong
  10 Sep 2024 [KELM, AI4CE, LRM]

Knowledge Mechanisms in Large Language Models: A Survey and Perspective
  Meng Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, ..., Yong-jia Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang
  22 Jul 2024
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
  Daniel Goldstein, Fares Obeid, Eric Alcaide, Guangyu Song, Eugene Cheah
  16 Jul 2024 [VLM, AI4TS]

A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training
  Michał Perełkiewicz, Rafał Poświata
  10 Jul 2024

Efficient Training of Language Models with Compact and Consistent Next Token Distributions
  Ashutosh Sathe, Sunita Sarawagi
  03 Jul 2024

Memory^3: Language Modeling with Explicit Memory
  Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, ..., Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E
  01 Jul 2024

UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
  Wenhao Li, Mingbao Lin, Yunshan Zhong, Shuicheng Yan, Rongrong Ji
  26 Jun 2024
Are We Done with MMLU?
  Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, ..., R. McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini
  06 Jun 2024 [ELM]

Improving Generalization and Convergence by Enhancing Implicit Regularization
  Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu
  31 May 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
  Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, ..., Zi-Kai Zhao, Jiajun Zhang, Wanli Ouyang, Wenhao Huang, Wenhu Chen
  29 May 2024 [ELM]
A Survey of Multimodal Large Language Model from A Data-centric Perspective
  Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, ..., Ping-Chia Huang, Jiulong Shan, Conghui He, Binhang Yuan, Wentao Zhang
  26 May 2024

The Mosaic Memory of Large Language Models
  Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye
  24 May 2024

Layer-Condensed KV Cache for Efficient Inference of Large Language Models
  Haoyi Wu, Kewei Tu
  17 May 2024 [MQ]

Greed is All You Need: An Evaluation of Tokenizer Inference Methods
  Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter
  02 Mar 2024

Tokenization Is More Than Compression
  Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner
  28 Feb 2024
Retrieval is Accurate Generation
  Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi
  27 Feb 2024

LLMGuard: Guarding Against Unsafe LLM Behavior
  Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan
  27 Feb 2024 [AILaw]

Balanced Data Sampling for Language Model Training with Clustering
  Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu
  22 Feb 2024

Analysing The Impact of Sequence Composition on Language Model Pre-Training
  Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini
  21 Feb 2024

Setting the Record Straight on Transformer Oversmoothing
  G. Dovonon, M. Bronstein, Matt J. Kusner
  09 Jan 2024
DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers
  S. Chandy, Varun Gangal, Yi Yang, Gabriel Maggiotti
  11 Dec 2023

ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation
  Jaap Jumelet, Michael Hanna, Marianne de Heer Kloots, Anna Langedijk, Charlotte Pouw, Oskar van der Wal
  17 Oct 2023

D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
  A. Maharana, Prateek Yadav, Mohit Bansal
  11 Oct 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
  Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
  12 Jul 2023

A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
  Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, ..., Denny Zhou, Jason W. Wei, Kevin Robinson, David M. Mimno, Daphne Ippolito
  22 May 2023
Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases
  Aengus Lynch, G. Dovonon, Jean Kaddour, Ricardo M. A. Silva
  09 Mar 2023

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
  Rui-Jie Zhu, Qihang Zhao, Guoqi Li, Jason Eshraghian
  27 Feb 2023 [BDL, VLM]

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
  Peter Henderson, M. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho
  01 Jul 2022 [AILaw, ELM]

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
  Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
  07 Nov 2021

Deduplicating Training Data Makes Language Models Better
  Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
  14 Jul 2021 [SyDa]
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
  Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
  31 Dec 2020 [AIMat]

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
  20 Apr 2018 [ELM]