ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.10487
  4. Cited By
Scaling Laws and Interpretability of Learning from Repeated Data

Scaling Laws and Interpretability of Learning from Repeated Data

21 May 2022
Danny Hernandez
Tom B. Brown
Tom Conerly
Nova Dassarma
Dawn Drain
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
T. Henighan
Tristan Hume
Scott R. Johnston
Benjamin Mann
C. Olah
Catherine Olsson
Dario Amodei
Nicholas Joseph
Jared Kaplan
Sam McCandlish
ArXiv (abs)PDFHTMLHuggingFace (1 upvotes)

Papers citing "Scaling Laws and Interpretability of Learning from Repeated Data"

46 / 96 papers shown
Poro 34B and the Blessing of Multilinguality
Poro 34B and the Blessing of Multilinguality
Risto Luukkonen
Jonathan Burdge
Elaine Zosa
Aarne Talman
Ville Komulainen
Vaino Hatanpaa
Peter Sarlin
S. Pyysalo
AI4CE
316
19
0
02 Apr 2024
Bailong: Bilingual Transfer Learning based on QLoRA and Zip-tie
  Embedding
Bailong: Bilingual Transfer Learning based on QLoRA and Zip-tie Embedding
Lung-Chuan Chen
Zong-Ru Li
ALM
273
1
0
01 Apr 2024
ROME: Memorization Insights from Text, Logits and Representation
ROME: Memorization Insights from Text, Logits and Representation
Bo Li
Qing Xia Zhao
Lijie Wen
242
7
0
01 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent
  on Language Models
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
358
63
0
29 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALMLM&MAELM
855
789
0
09 Feb 2024
Scaling Laws for Downstream Task Performance of Large Language Models
Scaling Laws for Downstream Task Performance of Large Language ModelsInternational Conference on Learning Representations (ICLR), 2024
Berivan Isik
Natalia Ponomareva
Hussein Hazimeh
Dimitris Paparas
Sergei Vassilvitskii
Sanmi Koyejo
319
49
0
06 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Yongfeng Zhang
AI4CE
296
14
0
02 Feb 2024
Rethinking Interpretability in the Era of Large Language Models
Rethinking Interpretability in the Era of Large Language Models
Chandan Singh
J. Inala
Michel Galley
Rich Caruana
Jianfeng Gao
LRMAI4CE
300
112
0
30 Jan 2024
Generative Deduplication For Socia Media Data Selection
Generative Deduplication For Socia Media Data SelectionConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xianming Li
Yuqun Zhang
254
3
0
11 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language
  Model Systems
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
ZuJie Wen
Ke Xu
Qi Li
337
101
0
11 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xiaoyan Cai
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
465
125
0
04 Jan 2024
A Survey on Large Language Model (LLM) Security and Privacy: The Good,
  the Bad, and the Ugly
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the UglyHigh-Confidence Computing (HC), 2023
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILMELM
624
950
0
04 Dec 2023
Data Management For Large Language Models: A Survey
Data Management For Large Language Models: A Survey
Zige Wang
Wanjun Zhong
Yufei Wang
Qi Zhu
Fei Mi
Baojun Wang
Lifeng Shang
Xin Jiang
Qun Liu
LM&MA
241
13
0
04 Dec 2023
The Disagreement Problem in Faithfulness Metrics
The Disagreement Problem in Faithfulness Metrics
Brian Barr
Noah Fatsi
Leif Hancox-Li
Peter Richter
Daniel Proano
Caleb Mok
191
8
0
13 Nov 2023
A Survey on Hallucination in Large Language Models: Principles,
  Taxonomy, Challenges, and Open Questions
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRMHILM
458
1,998
0
09 Nov 2023
Data Factors for Better Compositional Generalization
Data Factors for Better Compositional Generalization
Xiang Zhou
Yichen Jiang
Mohit Bansal
CoGeOOD
200
7
0
08 Nov 2023
The Universal Statistical Structure and Scaling Laws of Chaos and
  Turbulence
The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence
Noam Levi
Yaron Oz
AI4CE
282
1
0
02 Nov 2023
Skywork: A More Open Bilingual Foundation Model
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
275
121
0
30 Oct 2023
To grok or not to grok: Disentangling generalization and memorization on
  corrupted algorithmic datasets
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Darshil Doshi
Aritra Das
Tianyu He
Andrey Gromov
OOD
295
19
0
19 Oct 2023
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large
  Language Models in 167 Languages
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 LanguagesInternational Conference on Language Resources and Evaluation (LREC), 2023
Thuat Nguyen
Chien Van Nguyen
Viet Dac Lai
Hieu Man
Nghia Trung Ngo
Franck Dernoncourt
Ryan Rossi
Thien Huu Nguyen
247
160
0
17 Sep 2023
Explainability for Large Language Models: A Survey
Explainability for Large Language Models: A SurveyACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
500
710
0
02 Sep 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on
  Language, Multimodal, and Scientific GPT Models
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Ran Bi
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MASyDa
199
14
0
27 Aug 2023
Considerations for health care institutions training large language
  models on electronic health records
Considerations for health care institutions training large language models on electronic health records
Weipeng Zhou
Danielle Bitterman
Majid Afshar
Timothy A. Miller
LM&MA
97
0
0
24 Aug 2023
D4: Improving LLM Pretraining via Document De-Duplication and
  Diversification
D4: Improving LLM Pretraining via Document De-Duplication and DiversificationNeural Information Processing Systems (NeurIPS), 2023
Kushal Tirumala
Daniel Simig
Armen Aghajanyan
Ari S. Morcos
SyDa
194
151
0
23 Aug 2023
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding
  Models
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
Michael Gunther
Louis Milliken
Jonathan Geuter
Georgios Mastrapas
Bo Wang
Han Xiao
RALM
329
46
0
20 Jul 2023
The semantic landscape paradigm for neural networks
The semantic landscape paradigm for neural networks
Shreyas Gokhale
304
3
0
18 Jul 2023
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Beyond Implicit Bias: The Insignificance of SGD Noise in Online LearningInternational Conference on Machine Learning (ICML), 2023
Nikhil Vyas
Depen Morwani
Rosie Zhao
Gal Kaplun
Sham Kakade
Boaz Barak
MLT
271
7
0
14 Jun 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora
  with Web Data, and Web Data Only
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
425
890
0
01 Jun 2023
Scaling Data-Constrained Language Models
Scaling Data-Constrained Language ModelsNeural Information Processing Systems (NeurIPS), 2023
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
703
329
0
25 May 2023
Selective Pre-training for Private Fine-tuning
Selective Pre-training for Private Fine-tuning
Da Yu
Sivakanth Gopi
Janardhan Kulkarni
Zinan Lin
Saurabh Naik
Tomasz Religa
Jian Yin
Huishuai Zhang
422
25
0
23 May 2023
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-CrisisNeural Information Processing Systems (NeurIPS), 2023
Fuzhao Xue
Yao Fu
Wangchunshu Zhou
Zangwei Zheng
Yang You
312
120
0
22 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large
  Language Models
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language ModelsInternational Conference on Language Resources and Evaluation (LREC), 2023
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Amélie Reymond
LRM
326
8
0
21 May 2023
Advancing underwater acoustic target recognition via adaptive data
  pruning and smoothness-inducing regularization
Advancing underwater acoustic target recognition via adaptive data pruning and smoothness-inducing regularization
Yuan Xie
Tianyu Chen
Ji Xu
184
3
0
24 Apr 2023
Emergent and Predictable Memorization in Large Language Models
Emergent and Predictable Memorization in Large Language ModelsNeural Information Processing Systems (NeurIPS), 2023
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
278
167
0
21 Apr 2023
UniMax: Fairer and more Effective Language Sampling for Large-Scale
  Multilingual Pretraining
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual PretrainingInternational Conference on Learning Representations (ICLR), 2023
Hyung Won Chung
Noah Constant
Xavier Garcia
Adam Roberts
Yi Tay
Sharan Narang
Orhan Firat
283
101
0
18 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoEALM
332
64
0
17 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and
  Scaling
Pythia: A Suite for Analyzing Large Language Models Across Training and ScalingInternational Conference on Machine Learning (ICML), 2023
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
396
1,641
0
03 Apr 2023
Language Model Behavior: A Comprehensive Survey
Language Model Behavior: A Comprehensive SurveyInternational Conference on Computational Logic (ICCL), 2023
Tyler A. Chang
Benjamin Bergen
VLMLRMLM&MA
381
141
0
20 Mar 2023
Data Selection for Language Models via Importance Resampling
Data Selection for Language Models via Importance ResamplingNeural Information Processing Systems (NeurIPS), 2023
Sang Michael Xie
Shibani Santurkar
Tengyu Ma
Abigail Z. Jacobs
559
279
0
06 Feb 2023
Cramming: Training a Language Model on a Single GPU in One Day
Cramming: Training a Language Model on a Single GPU in One DayInternational Conference on Machine Learning (ICML), 2022
Jonas Geiping
Tom Goldstein
MoE
274
103
0
28 Dec 2022
Training Trajectories of Language Models Across Scales
Training Trajectories of Language Models Across ScalesAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Mengzhou Xia
Mikel Artetxe
Chunting Zhou
Xi Lin
Ramakanth Pasunuru
Danqi Chen
Luke Zettlemoyer
Ves Stoyanov
AIFinLRM
269
71
0
19 Dec 2022
The Stack: 3 TB of permissively licensed source code
The Stack: 3 TB of permissively licensed source code
Denis Kocetkov
Raymond Li
Loubna Ben Allal
Jia Li
Chenghao Mou
...
Sean M. Hughes
Thomas Wolf
Dzmitry Bahdanau
Leandro von Werra
H. D. Vries
245
410
0
20 Nov 2022
Galactica: A Large Language Model for Science
Galactica: A Large Language Model for Science
Ross Taylor
Marcin Kardas
Guillem Cucurull
Thomas Scialom
Anthony Hartshorn
Elvis Saravia
Andrew Poulton
Viktor Kerkez
Robert Stojnic
ELMReLM
396
937
0
16 Nov 2022
A Solvable Model of Neural Scaling Laws
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
262
78
0
30 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Transcending Scaling Laws with 0.1% Extra ComputeConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
314
73
0
20 Oct 2022
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Deduplicating Training Data Mitigates Privacy Risks in Language ModelsInternational Conference on Machine Learning (ICML), 2022
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILMMU
577
366
0
14 Feb 2022
Previous
12
Page 2 of 2