Scaling Laws and Interpretability of Learning from Repeated Data

21 May 2022


Papers citing "Scaling Laws and Interpretability of Learning from Repeated Data"


Poro 34B and the Blessing of Multilinguality

316

02 Apr 2024

Bailong: Bilingual Transfer Learning based on QLoRA and Zip-tie Embedding

Lung-Chuan Chen

Zong-Ru Li

ALM

273

01 Apr 2024

ROME: Memorization Insights from Text, Logits and Representation

Bo Li

Qing Xia Zhao

Lijie Wen

242

01 Mar 2024

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

358

29 Feb 2024

Large Language Models: A Survey

855

789

09 Feb 2024

Scaling Laws for Downstream Task Performance of Large Language Models

International Conference on Learning Representations (ICLR), 2024

319

06 Feb 2024

On Catastrophic Inheritance of Large Foundation Models

Hao Chen

Bhiksha Raj

Xing Xie

Yongfeng Zhang

AI4CE

296

02 Feb 2024

Rethinking Interpretability in the Era of Large Language Models

300

112

30 Jan 2024

Generative Deduplication For Socia Media Data Selection

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Xianming Li

Yuqun Zhang

254

11 Jan 2024

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

...

Qi Li

337

101

11 Jan 2024

Understanding LLMs: A Comprehensive Overview from Training to Inference

...

Tuo Zhang

Tianming Liu

465

125

04 Jan 2024

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

High-Confidence Computing (HC), 2023

624

950

04 Dec 2023

Data Management For Large Language Models: A Survey

Lifeng Shang

Xin Jiang

Qun Liu

LM&MA

241

04 Dec 2023

The Disagreement Problem in Faithfulness Metrics

191

13 Nov 2023

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

...

458

1,998

09 Nov 2023

Data Factors for Better Compositional Generalization

Xiang Zhou

Yichen Jiang

Mohit Bansal

CoGe OOD

200

08 Nov 2023

The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence

Noam Levi

Yaron Oz

AI4CE

282

02 Nov 2023

Skywork: A More Open Bilingual Foundation Model

...

275

121

30 Oct 2023

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

295

19 Oct 2023

CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

International Conference on Language Resources and Evaluation (LREC), 2023

247

160

17 Sep 2023

Explainability for Large Language Models: A Survey

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023

Haiyan Zhao

Hanjie Chen

Fan Yang

Ninghao Liu

500

710

02 Sep 2023

Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models

Wei Zhang

199

27 Aug 2023

Considerations for health care institutions training large language models on electronic health records

Danielle Bitterman

24 Aug 2023

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Neural Information Processing Systems (NeurIPS), 2023

194

151

23 Aug 2023

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

Bo Wang

329

20 Jul 2023

The semantic landscape paradigm for neural networks

Shreyas Gokhale

304

18 Jul 2023

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

International Conference on Machine Learning (ICML), 2023

271

14 Jun 2023

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

Guilherme Penedo

Quentin Malartic

Daniel Hesslow

Ruxandra-Aimée Cojocaru

425

890

01 Jun 2023

Scaling Data-Constrained Language Models

Neural Information Processing Systems (NeurIPS), 2023

703

329

25 May 2023

Selective Pre-training for Private Fine-tuning

422

23 May 2023

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis

Neural Information Processing Systems (NeurIPS), 2023

Yang You

312

120

22 May 2023

Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

International Conference on Language Resources and Evaluation (LREC), 2023

...

326

21 May 2023

Advancing underwater acoustic target recognition via adaptive data pruning and smoothness-inducing regularization

Yuan Xie

Tianyu Chen

Ji Xu

184

24 Apr 2023

Emergent and Predictable Memorization in Large Language Models

Neural Information Processing Systems (NeurIPS), 2023

278

167

21 Apr 2023

UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

International Conference on Learning Representations (ICLR), 2023

Sharan Narang

283

101

18 Apr 2023

The MiniPile Challenge for Data-Efficient Language Models

Jean Kaddour

MoE ALM

332

17 Apr 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

International Conference on Machine Learning (ICML), 2023

...

396

1,641

03 Apr 2023

Language Model Behavior: A Comprehensive Survey

International Conference on Computational Logic (ICCL), 2023

Tyler A. Chang

Benjamin Bergen

VLM LRM LM&MA

381

141

20 Mar 2023

Data Selection for Language Models via Importance Resampling

Neural Information Processing Systems (NeurIPS), 2023

559

279

06 Feb 2023

Cramming: Training a Language Model on a Single GPU in One Day

International Conference on Machine Learning (ICML), 2022

Jonas Geiping

Tom Goldstein

MoE

274

103

28 Dec 2022

Training Trajectories of Language Models Across Scales

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Luke Zettlemoyer

269

19 Dec 2022

The Stack: 3 TB of permissively licensed source code

...

245

410

20 Nov 2022

Galactica: A Large Language Model for Science

396

937

16 Nov 2022

A Solvable Model of Neural Scaling Laws

A. Maloney

Daniel A. Roberts

J. Sully

262

30 Oct 2022

Transcending Scaling Laws with 0.1% Extra Compute

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

...

314

20 Oct 2022

Deduplicating Training Data Mitigates Privacy Risks in Language Models

International Conference on Machine Learning (ICML), 2022

577

366

14 Feb 2022