ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.10770
  4. Cited By
Memorization Without Overfitting: Analyzing the Training Dynamics of
  Large Language Models

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

22 May 2022
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
    TDI
ArXivPDFHTML

Papers citing "Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models"

49 / 49 papers shown
Title
Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models
Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models
John Hawkins
ReLM
LRM
48
0
0
08 May 2025
Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting
Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting
Y. Hong
Xiaofei Yin
Xinzhong Wang
Yi Tu
Ya Guo
Sufeng Duan
Weiqiang Wang
Lingyong Fang
Depeng Wang
Huijia Zhu
CLL
89
0
0
27 Apr 2025
Controllable Unlearning for Image-to-Image Generative Models via $\varepsilon$-Constrained Optimization
Controllable Unlearning for Image-to-Image Generative Models via ε\varepsilonε-Constrained Optimization
Xiaohua Feng
Chao-Jun Chen
Yuyuan Li
L. Zhang
Longfei Li
Jun Zhou
Xiaolin Zheng
MU
68
0
0
20 Feb 2025
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Wenhao Wang
Adam Dziedzic
Grace C. Kim
Michael Backes
Franziska Boenisch
81
0
0
11 Feb 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
...
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
AAML
LRM
93
7
0
10 Feb 2025
Soup to go: mitigating forgetting during continual learning with model averaging
Soup to go: mitigating forgetting during continual learning with model averaging
Anat Kleiman
Gintare Karolina Dziugaite
Jonathan Frankle
Sham Kakade
Mansheej Paul
MoMe
CLL
KELM
51
0
0
09 Jan 2025
Are Large Language Models Memorizing Bug Benchmarks?
Are Large Language Models Memorizing Bug Benchmarks?
Daniel Ramos
Claudia Mamede
Kush Jain
Paulo Canelas
Catarina Gamboa
Claire Le Goues
PILM
ELM
94
6
0
20 Nov 2024
On Memorization of Large Language Models in Logical Reasoning
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie
Yangsibo Huang
Chiyuan Zhang
Da Yu
Xinyun Chen
Bill Yuchen Lin
Bo Li
Badih Ghazi
Ravi Kumar
LRM
45
20
0
30 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
28
4
0
24 Oct 2024
Continual Learning: Less Forgetting, More OOD Generalization via
  Adaptive Contrastive Replay
Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay
Hossein Rezaei
Mohammad Sabokrou
CLL
21
0
0
09 Oct 2024
Data Selection via Optimal Control for Language Models
Data Selection via Optimal Control for Language Models
Yuxian Gu
Li Dong
Hongning Wang
Y. Hao
Qingxiu Dong
Furu Wei
Minlie Huang
AI4CE
48
4
0
09 Oct 2024
How Much Can We Forget about Data Contamination?
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
43
1
0
04 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
69
7
0
03 Oct 2024
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
Atilla Akkus
Mingjie Li
Junjie Chu
Junjie Chu
Michael Backes
Sinem Sav
Sinem Sav
SILM
SyDa
35
1
0
12 Sep 2024
LLM-based multi-agent poetry generation in non-cooperative environments
LLM-based multi-agent poetry generation in non-cooperative environments
Ran Zhang
Steffen Eger
LLMAG
31
5
0
05 Sep 2024
Range Membership Inference Attacks
Range Membership Inference Attacks
Jiashu Tao
Reza Shokri
40
1
0
09 Aug 2024
Strong Copyright Protection for Language Models via Adaptive Model
  Fusion
Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
35
4
0
29 Jul 2024
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
Haoyu Tang
Ye Liu
Xukai Liu
Xukai Liu
Yanghai Zhang
Kai Zhang
Xiaofang Zhou
Enhong Chen
MU
67
3
0
25 Jul 2024
How Do Large Language Models Acquire Factual Knowledge During
  Pretraining?
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Hoyeon Chang
Jinho Park
Seonghyeon Ye
Sohee Yang
Youngkyung Seo
Du-Seong Chang
Minjoon Seo
KELM
33
30
0
17 Jun 2024
Quantifying In-Context Reasoning Effects and Memorization Effects in
  LLMs
Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs
Siyu Lou
Yuntian Chen
Xiaodan Liang
Liang Lin
Quanshi Zhang
32
2
0
20 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with
  Associative Memory
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
31
6
0
14 May 2024
To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning
  in Large Language Models
To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
George-Octavian Barbulescu
Peter Triantafillou
MU
29
16
0
06 May 2024
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
William Fleshman
Aleem Khan
Marc Marone
Benjamin Van Durme
CLL
KELM
44
3
0
12 Apr 2024
GP-MoLFormer: A Foundation Model For Molecular Generation
GP-MoLFormer: A Foundation Model For Molecular Generation
Jerret Ross
Brian M. Belgodere
Samuel C. Hoffman
Vijil Chenthamarakshan
Youssef Mroueh
Payel Das
Payel Das
31
5
0
04 Apr 2024
Fundamental Limits of Membership Inference Attacks on Machine Learning Models
Fundamental Limits of Membership Inference Attacks on Machine Learning Models
Eric Aubinais
Elisabeth Gassiat
Pablo Piantanida
MIACV
48
2
0
20 Oct 2023
Baichuan 2: Open Large-scale Language Models
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
66
701
0
19 Sep 2023
Generative Models as a Complex Systems Science: How can we make sense of
  large language model behavior?
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
23
13
0
31 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
28
25
0
20 Jul 2023
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft
  Prompting and Calibrated Confidence Estimation
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang
Jiaxin Wen
Minlie Huang
22
29
0
10 Jul 2023
On The Impact of Machine Learning Randomness on Group Fairness
On The Impact of Machine Learning Randomness on Group Fairness
Prakhar Ganesh
Hong Chang
Martin Strobel
Reza Shokri
FaML
16
30
0
09 Jul 2023
Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD
Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD
Anvith Thudi
Hengrui Jia
Casey Meehan
Ilia Shumailov
Nicolas Papernot
12
3
0
01 Jul 2023
Emergent and Predictable Memorization in Large Language Models
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
19
116
0
21 Apr 2023
Towards Generating Functionally Correct Code Edits from Natural Language
  Issue Descriptions
Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions
Sarah Fakhoury
Saikat Chakraborty
Madan Musuvathi
Shuvendu K. Lahiri
25
21
0
07 Apr 2023
Recognition, recall, and retention of few-shot memories in large
  language models
Recognition, recall, and retention of few-shot memories in large language models
A. Orhan
LRM
KELM
CLL
27
3
0
30 Mar 2023
Bounding Training Data Reconstruction in DP-SGD
Bounding Training Data Reconstruction in DP-SGD
Jamie Hayes
Saeed Mahloujifar
Borja Balle
AAML
FedML
21
39
0
14 Feb 2023
Finding Memo: Extractive Memorization in Constrained Sequence Generation
  Tasks
Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks
Vikas Raunak
Arul Menezes
30
13
0
24 Oct 2022
Understanding Transformer Memorization Recall Through Idioms
Understanding Transformer Memorization Recall Through Idioms
Adi Haviv
Ido Cohen
Jacob Gidron
R. Schuster
Yoav Goldberg
Mor Geva
24
48
0
07 Oct 2022
Measuring Forgetting of Memorized Training Examples
Measuring Forgetting of Memorized Training Examples
Matthew Jagielski
Om Thakkar
Florian Tramèr
Daphne Ippolito
Katherine Lee
...
Eric Wallace
Shuang Song
Abhradeep Thakurta
Nicolas Papernot
Chiyuan Zhang
TDI
40
102
0
30 Jun 2022
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of
  Multilingual Language Models
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models
Terra Blevins
Hila Gonen
Luke Zettlemoyer
LRM
54
26
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,448
0
28 Jan 2022
Training Dynamics for Text Summarization Models
Training Dynamics for Text Summarization Models
Tanya Goyal
Jiacheng Xu
J. Li
Greg Durrett
59
29
0
15 Oct 2021
How BPE Affects Memorization in Transformers
How BPE Affects Memorization in Transformers
Eugene Kharitonov
Marco Baroni
Dieuwke Hupkes
161
32
0
06 Oct 2021
Word Acquisition in Neural Language Models
Word Acquisition in Neural Language Models
Tyler A. Chang
Benjamin Bergen
27
39
0
05 Oct 2021
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
237
590
0
14 Jul 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,774
0
24 Feb 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
267
1,812
0
14 Dec 2020
When is Memorization of Irrelevant Training Data Necessary for
  High-Accuracy Learning?
When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?
Gavin Brown
Mark Bun
Vitaly Feldman
Adam D. Smith
Kunal Talwar
245
80
0
11 Dec 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,453
0
23 Jan 2020
Language Models as Knowledge Bases?
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
406
2,584
0
03 Sep 2019
1