2110.02782
How BPE Affects Memorization in Transformers
6 October 2021
Eugene Kharitonov
Marco Baroni
Dieuwke Hupkes
Papers citing "How BPE Affects Memorization in Transformers"
26 / 26 papers shown
Canonical Autoregressive Generation
Ivi Chatzi
N. C. Benz
Stratis Tsirtsis
Manuel Gomez Rodriguez
131
1
0
06 Jun 2025
Mitigating Memorization in LLMs using Activation Steering
Manan Suri
Nishit Anand
Amisha Bhaskar
LLMSV
298
6
0
08 Mar 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
201
7
0
21 Jan 2025
On the Privacy Risk of In-context Learning
Haonan Duan
Adam Dziedzic
Mohammad Yaghini
Nicolas Papernot
Franziska Boenisch
SILM
PILM
264
54
0
15 Nov 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
520
22
0
03 Oct 2024
Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications
Till Speicher
Mohammad Aflah Khan
Qinyuan Wu
Vedant Nanda
Soumi Das
Bishwamittra Ghosh
Krishna P. Gummadi
Evimaria Terzi
211
7
0
27 Jul 2024
Bag of Lies: Robustness in Continuous Pre-training BERT
I. Gevers
Walter Daelemans
202
1
0
14 Jun 2024
Memorization in deep learning: A survey
Jiaheng Wei
Yanjun Zhang
Leo Yu Zhang
Ming Ding
Chao Chen
Kok-Leong Ong
Jun Zhang
Yang Xiang
265
15
0
06 Jun 2024
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments
Anton Schäfer
Haiqin Yang
Thomas Hofmann
Tiago Pimentel
Imanol Schlag
345
4
0
11 Apr 2024
On the Effect of (Near) Duplicate Subwords in Language Modelling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Anton Schäfer
Thomas Hofmann
Imanol Schlag
Tiago Pimentel
229
4
0
09 Apr 2024
DP-TabICL: In-Context Learning with Differentially Private Tabular Data
BigData Congress [Services Society] (BSS), 2024
Alycia N. Carey
Karuna Bhaila
Kennedy Edemacu
Xintao Wu
283
9
0
08 Mar 2024
ROME: Memorization Insights from Text, Logits and Representation
Bo Li
Qing Xia Zhao
Lijie Wen
211
5
0
01 Mar 2024
Conversation Reconstruction Attack Against GPT Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Junjie Chu
Zeyang Sha
Michael Backes
Yang Zhang
SILM
87
1
0
05 Feb 2024
Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation
Verna Dankers
Ivan Titov
Dieuwke Hupkes
206
5
0
09 Nov 2023
MoPe: Model Perturbation-based Privacy Attacks on Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Marvin Li
Jason Wang
Jeffrey G. Wang
Seth Neel
AAML
231
26
0
22 Oct 2023
Optimized Tokenization for Transcribed Error Correction
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Tomer Wullach
Shlomo E. Chazan
172
0
0
16 Oct 2023
Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Haonan Duan
Adam Dziedzic
Nicolas Papernot
Franziska Boenisch
AAML
237
87
0
24 May 2023
Recognition, recall, and retention of few-shot memories in large language models
A. Orhan
LRM
KELM
CLL
140
3
0
30 Mar 2023
Language Model Behavior: A Comprehensive Survey
Computational Linguistics, 2024
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
324
136
0
20 Mar 2023
Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhijun Wang
Xuebo Liu
Min Zhang
330
11
0
23 Nov 2022
Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Vikas Raunak
Arul Menezes
151
14
0
24 Oct 2022
Recitation-Augmented Language Models
International Conference on Learning Representations (ICLR), 2022
Zhiqing Sun
Xuezhi Wang
Yi Tay
Yiming Yang
Denny Zhou
RALM
629
76
0
04 Oct 2022
A Mixture-of-Expert Approach to RL-based Dialogue Management
International Conference on Learning Representations (ICLR), 2022
Yinlam Chow
Azamat Tulepbergenov
Ofir Nachum
Moonkyung Ryu
Mohammad Ghavamzadeh
Craig Boutilier
MoE
254
16
0
31 May 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
307
237
0
22 May 2022
Do Language Models Plagiarize?
The Web Conference (WWW), 2022
Jooyoung Lee
Thai Le
Jinghui Chen
Dongwon Lee
314
95
0
15 Mar 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
291
193
0
20 Dec 2021