2405.15523
Mosaic Memory: Fuzzy Duplication in Copyright Traps for Large Language Models
24 May 2024
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
Papers citing
"Mosaic Memory: Fuzzy Duplication in Copyright Traps for Large Language Models"
7 / 7 papers shown
Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim
Yang Li
Evangelia Spiliopoulou
Jie Ma
Miguel Ballesteros
William Yang Wang
MIALM
85
3
2
10 Oct 2024
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse
Patrick Fernandes
Nuno M. Guerreiro
António Loison
Duarte M. Alves
...
François Yvon
André F.T. Martins
Gautier Viaud
Céline Hudelot
Pierre Colombo
39
33
0
01 Feb 2024
WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data
Jingtan Wang
Xinyang Lu
Zitong Zhao
Zhongxiang Dai
Chuan-Sheng Foo
See-Kiong Ng
K. H. Low
WaLM
50
9
0
01 Oct 2023
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
Fuzhao Xue
Yao Fu
Wangchunshu Zhou
Zangwei Zheng
Yang You
79
74
0
22 May 2023
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
234
447
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Úlfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020