1906.05271
Does Learning Require Memorization? A Short Tale about a Long Tail
12 June 2019
Vitaly Feldman
TDI
Papers citing
"Does Learning Require Memorization? A Short Tale about a Long Tail"
50 / 98 papers shown
DMRL: Data- and Model-aware Reward Learning for Data Extraction
Zhiqiang Wang
Ruoxi Cheng
28
0
0
07 May 2025
Measuring Déjà vu Memorization Efficiently
Narine Kokhlikyan
Bargav Jayaraman
Florian Bordes
Chuan Guo
Kamalika Chaudhuri
30
1
0
08 Apr 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min-Bin Lin
Ye Wang
DiffM
TDI
166
43
0
21 Feb 2025
Hallucination, Monofacts, and Miscalibration: An Empirical Investigation
Muqing Miao
Michael Kearns
59
0
0
11 Feb 2025
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Wenhao Wang
Adam Dziedzic
Grace C. Kim
Michael Backes
Franziska Boenisch
86
0
0
11 Feb 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
...
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
AAML
LRM
95
7
0
10 Feb 2025
FairDropout: Using Example-Tied Dropout to Enhance Generalization of Minority Groups
Géraldin Nanfack
Eugene Belilovsky
59
0
0
10 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
115
3
0
06 Feb 2025
The Silent Majority: Demystifying Memorization Effect in the Presence of Spurious Correlations
Chenyu You
Haocheng Dai
Yifei Min
Jasjeet Sekhon
S. Joshi
James S. Duncan
60
2
0
01 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
90
12
0
31 Dec 2024
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
Yuqing Nie
Chong Wang
K. Wang
Guoai Xu
Guosheng Xu
Haoyu Wang
OffRL
130
1
0
11 Oct 2024
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Tianjian Li
Haoran Xu
Weiting Tan
Kenton Murray
Daniel Khashabi
35
1
0
06 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
71
7
0
03 Oct 2024
A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models
Geonhee Kim
Marco Valentino
André Freitas
LRM
AI4CE
28
7
0
16 Aug 2024
Range Membership Inference Attacks
Jiashu Tao
Reza Shokri
42
1
0
09 Aug 2024
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth
Alvin Deng
Kyle O'Brien
Jyothir S V
Mohammad Aflah Khan
...
Jacob Ray Fuehne
Stella Biderman
Tracy Ke
Katherine Lee
Naomi Saphra
55
12
0
25 Jun 2024
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Hoyeon Chang
Jinho Park
Seonghyeon Ye
Sohee Yang
Youngkyung Seo
Du-Seong Chang
Minjoon Seo
KELM
37
30
0
17 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
38
1
0
05 Jun 2024
Delving into Differentially Private Transformer
Youlong Ding
Xueyang Wu
Yining Meng
Yonggang Luo
Hao Wang
Weike Pan
29
5
0
28 May 2024
On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding
Ken Ziyu Liu
Pura Peetathawatchai
Berivan Isik
Sanmi Koyejo
43
4
0
27 May 2024
When does compositional structure yield compositional generalization? A kernel theory
Samuel Lippl
Kim Stachenfeld
NAI
CoGe
73
5
0
26 May 2024
Data Reconstruction: When You See It and When You Don't
Edith Cohen
Haim Kaplan
Yishay Mansour
Shay Moran
Kobbi Nissim
Uri Stemmer
Eliad Tsfadia
AAML
42
2
0
24 May 2024
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen
Namgi Han
Yusuke Miyao
38
1
0
19 May 2024
To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
George-Octavian Barbulescu
Peter Triantafillou
MU
31
16
0
06 May 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
31
26
0
29 Feb 2024
Investigating Data Contamination for Pre-training Language Models
Minhao Jiang
Ken Ziyu Liu
Ming Zhong
Rylan Schaeffer
Siru Ouyang
Jiawei Han
Sanmi Koyejo
33
63
0
11 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
13
16
0
08 Dec 2023
Understanding (Un)Intended Memorization in Text-to-Image Generative Models
Ali Naseh
Jaechul Roh
Amir Houmansadr
DiffM
20
6
0
06 Dec 2023
Calibrated Language Models Must Hallucinate
Adam Tauman Kalai
Santosh Vempala
HILM
22
75
0
24 Nov 2023
Fundamental Limits of Membership Inference Attacks on Machine Learning Models
Eric Aubinais
Elisabeth Gassiat
Pablo Piantanida
MIACV
48
2
0
20 Oct 2023
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework
Imdad Ullah
Najm Hassan
S. Gill
Basem Suleiman
T. Ahanger
Zawar Shah
Junaid Qadir
S. Kanhere
35
16
0
19 Oct 2023
On the Over-Memorization During Natural, Robust and Catastrophic Overfitting
Runqi Lin
Chaojian Yu
Bo Han
Tongliang Liu
22
7
0
13 Oct 2023
Samplable Anonymous Aggregation for Private Federated Data Analysis
Kunal Talwar
Shan Wang
Audra McMillan
Vojta Jina
Vitaly Feldman
...
Congzheng Song
Karl Tarbe
Sebastian Vogt
L. Winstrom
Shundong Zhou
FedML
30
13
0
27 Jul 2023
Privacy-Utility Trade-offs in Neural Networks for Medical Population Graphs: Insights from Differential Privacy and Graph Structure
Tamara T. Mueller
Maulik Chevli
Ameya Daigavane
Daniel Rueckert
Georgios Kaissis
25
0
0
13 Jul 2023
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang
Jiaxin Wen
Minlie Huang
30
29
0
10 Jul 2023
Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses
G. Buzaglo
Niv Haim
Gilad Yehudai
Gal Vardi
Yakir Oz
Yaniv Nikankin
Michal Irani
26
10
0
04 Jul 2023
Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD
Anvith Thudi
Hengrui Jia
Casey Meehan
Ilia Shumailov
Nicolas Papernot
20
3
0
01 Jul 2023
Memory-Query Tradeoffs for Randomized Convex Optimization
X. Chen
Binghui Peng
34
6
0
21 Jun 2023
Understanding the Effect of the Long Tail on Neural Network Compression
Harvey Dam
Vinu Joseph
Aditya Bhaskara
G. Gopalakrishna
Saurav Muralidharan
M. Garland
21
2
0
09 Jun 2023
Adaptive Conformal Regression with Jackknife+ Rescaled Scores
N. Deutschmann
Mattia Rigotti
María Rodríguez Martínez
21
10
0
31 May 2023
How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features
Simone Bombari
Marco Mondelli
AAML
19
4
0
20 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
74
1,142
0
17 May 2023
AI Model Disgorgement: Methods and Choices
Alessandro Achille
Michael Kearns
Carson Klingenberg
Stefano Soatto
MU
20
11
0
07 Apr 2023
On Differential Privacy and Adaptive Data Analysis with Bounded Space
Itai Dinur
Uri Stemmer
David P. Woodruff
Samson Zhou
16
12
0
11 Feb 2023
Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation
Noel Loo
Ramin Hasani
Mathias Lechner
Alexander Amini
Daniela Rus
DD
26
5
0
02 Feb 2023
Pathologies of Predictive Diversity in Deep Ensembles
Taiga Abe
E. Kelly Buchanan
Geoff Pleiss
John P. Cunningham
UQCV
38
13
0
01 Feb 2023
Extracting Training Data from Diffusion Models
Nicholas Carlini
Jamie Hayes
Milad Nasr
Matthew Jagielski
Vikash Sehwag
Florian Tramèr
Borja Balle
Daphne Ippolito
Eric Wallace
DiffM
63
569
0
30 Jan 2023
Context-Aware Differential Privacy for Language Modeling
M. H. Dinh
Ferdinando Fioretto
25
2
0
28 Jan 2023
Leveraging Unlabeled Data to Track Memorization
Mahsa Forouzesh
Hanie Sedghi
Patrick Thiran
NoLa
TDI
30
3
0
08 Dec 2022