2205.10487
Cited By
Scaling Laws and Interpretability of Learning from Repeated Data
21 May 2022
Danny Hernandez
Tom B. Brown
Tom Conerly
Nova DasSarma
Dawn Drain
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
T. Henighan
Tristan Hume
Scott R. Johnston
Benjamin Mann
C. Olah
Catherine Olsson
Dario Amodei
Nicholas Joseph
Jared Kaplan
Sam McCandlish
Papers citing "Scaling Laws and Interpretability of Learning from Repeated Data"
46 / 96 papers shown
Poro 34B and the Blessing of Multilinguality
Risto Luukkonen
Jonathan Burdge
Elaine Zosa
Aarne Talman
Ville Komulainen
Vaino Hatanpaa
Peter Sarlin
S. Pyysalo
AI4CE
316
19
0
02 Apr 2024
Bailong: Bilingual Transfer Learning based on QLoRA and Zip-tie Embedding
Lung-Chuan Chen
Zong-Ru Li
ALM
273
1
0
01 Apr 2024
ROME: Memorization Insights from Text, Logits and Representation
Bo Li
Qing Xia Zhao
Lijie Wen
242
7
0
01 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
358
63
0
29 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
855
789
0
09 Feb 2024
Scaling Laws for Downstream Task Performance of Large Language Models
International Conference on Learning Representations (ICLR), 2024
Berivan Isik
Natalia Ponomareva
Hussein Hazimeh
Dimitris Paparas
Sergei Vassilvitskii
Sanmi Koyejo
319
49
0
06 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Yongfeng Zhang
AI4CE
296
14
0
02 Feb 2024
Rethinking Interpretability in the Era of Large Language Models
Chandan Singh
J. Inala
Michel Galley
Rich Caruana
Jianfeng Gao
LRM
AI4CE
300
112
0
30 Jan 2024
Generative Deduplication For Socia Media Data Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xianming Li
Yuqun Zhang
254
3
0
11 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
ZuJie Wen
Ke Xu
Qi Li
337
101
0
11 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xiaoyan Cai
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
465
125
0
04 Jan 2024
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
High-Confidence Computing (HC), 2023
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
624
950
0
04 Dec 2023
Data Management For Large Language Models: A Survey
Zige Wang
Wanjun Zhong
Yufei Wang
Qi Zhu
Fei Mi
Baojun Wang
Lifeng Shang
Xin Jiang
Qun Liu
LM&MA
241
13
0
04 Dec 2023
The Disagreement Problem in Faithfulness Metrics
Brian Barr
Noah Fatsi
Leif Hancox-Li
Peter Richter
Daniel Proano
Caleb Mok
191
8
0
13 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
458
1,998
0
09 Nov 2023
Data Factors for Better Compositional Generalization
Xiang Zhou
Yichen Jiang
Mohit Bansal
CoGe
OOD
200
7
0
08 Nov 2023
The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence
Noam Levi
Yaron Oz
AI4CE
282
1
0
02 Nov 2023
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
275
121
0
30 Oct 2023
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Darshil Doshi
Aritra Das
Tianyu He
Andrey Gromov
OOD
295
19
0
19 Oct 2023
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
International Conference on Language Resources and Evaluation (LREC), 2023
Thuat Nguyen
Chien Van Nguyen
Viet Dac Lai
Hieu Man
Nghia Trung Ngo
Franck Dernoncourt
Ryan Rossi
Thien Huu Nguyen
247
160
0
17 Sep 2023
Explainability for Large Language Models: A Survey
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
500
710
0
02 Sep 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Ran Bi
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MA
SyDa
199
14
0
27 Aug 2023
Considerations for health care institutions training large language models on electronic health records
Weipeng Zhou
Danielle Bitterman
Majid Afshar
Timothy A. Miller
LM&MA
97
0
0
24 Aug 2023
D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Neural Information Processing Systems (NeurIPS), 2023
Kushal Tirumala
Daniel Simig
Armen Aghajanyan
Ari S. Morcos
SyDa
194
151
0
23 Aug 2023
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
Michael Gunther
Louis Milliken
Jonathan Geuter
Georgios Mastrapas
Bo Wang
Han Xiao
RALM
329
46
0
20 Jul 2023
The semantic landscape paradigm for neural networks
Shreyas Gokhale
304
3
0
18 Jul 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
International Conference on Machine Learning (ICML), 2023
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
271
7
0
14 Jun 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
425
890
0
01 Jun 2023
Scaling Data-Constrained Language Models
Neural Information Processing Systems (NeurIPS), 2023
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
703
329
0
25 May 2023
Selective Pre-training for Private Fine-tuning
Da Yu
Sivakanth Gopi
Janardhan Kulkarni
Zinan Lin
Saurabh Naik
Tomasz Religa
Jian Yin
Huishuai Zhang
422
25
0
23 May 2023
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
Neural Information Processing Systems (NeurIPS), 2023
Fuzhao Xue
Yao Fu
Wangchunshu Zhou
Zangwei Zheng
Yang You
312
120
0
22 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Amélie Reymond
LRM
326
8
0
21 May 2023
Advancing underwater acoustic target recognition via adaptive data pruning and smoothness-inducing regularization
Yuan Xie
Tianyu Chen
Ji Xu
184
3
0
24 Apr 2023
Emergent and Predictable Memorization in Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raff
278
167
0
21 Apr 2023
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining
International Conference on Learning Representations (ICLR), 2023
Hyung Won Chung
Noah Constant
Xavier Garcia
Adam Roberts
Yi Tay
Sharan Narang
Orhan Firat
283
101
0
18 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
332
64
0
17 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
International Conference on Machine Learning (ICML), 2023
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
396
1,641
0
03 Apr 2023
Language Model Behavior: A Comprehensive Survey
International Conference on Computational Logic (ICCL), 2023
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
381
141
0
20 Mar 2023
Data Selection for Language Models via Importance Resampling
Neural Information Processing Systems (NeurIPS), 2023
Sang Michael Xie
Shibani Santurkar
Tengyu Ma
Abigail Z. Jacobs
559
279
0
06 Feb 2023
Cramming: Training a Language Model on a Single GPU in One Day
International Conference on Machine Learning (ICML), 2022
Jonas Geiping
Tom Goldstein
MoE
274
103
0
28 Dec 2022
Training Trajectories of Language Models Across Scales
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mengzhou Xia
Mikel Artetxe
Chunting Zhou
Xi Lin
Ramakanth Pasunuru
Danqi Chen
Luke Zettlemoyer
Ves Stoyanov
AIFin
LRM
269
71
0
19 Dec 2022
The Stack: 3 TB of permissively licensed source code
Denis Kocetkov
Raymond Li
Loubna Ben Allal
Jia Li
Chenghao Mou
...
Sean M. Hughes
Thomas Wolf
Dzmitry Bahdanau
Leandro von Werra
H. D. Vries
245
410
0
20 Nov 2022
Galactica: A Large Language Model for Science
Ross Taylor
Marcin Kardas
Guillem Cucurull
Thomas Scialom
Anthony Hartshorn
Elvis Saravia
Andrew Poulton
Viktor Kerkez
Robert Stojnic
ELM
ReLM
396
937
0
16 Nov 2022
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
262
78
0
30 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
314
73
0
20 Oct 2022
Deduplicating Training Data Mitigates Privacy Risks in Language Models
International Conference on Machine Learning (ICML), 2022
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILM
MU
577
366
0
14 Feb 2022