Data Portraits: Recording Foundation Model Training Data

Neural Information Processing Systems (NeurIPS), 2023

Papers citing "Data Portraits: Recording Foundation Model Training Data"

Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
24 Oct 2024

What's New in My Data? Novelty Exploration via Contrastive Generation
International Conference on Learning Representations (ICLR), 2024
18 Oct 2024

Improving governance outcomes through AI documentation: Bridging theory and practice
International Conference on Human Factors in Computing Systems (CHI), 2024
13 Sep 2024

AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
11 Sep 2024

Oasis: Data Curation and Assessment System for Pretraining of Large Language Models
International Joint Conference on Artificial Intelligence (IJCAI), 2023
21 Nov 2023

NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
27 Oct 2023

Position: Key Claims in LLM Research Have a Long Tail of Footnotes
International Conference on Machine Learning (ICML), 2023
14 Aug 2023
"According to ...": Prompting Language Models Improves Quoting from
  Pre-Training Data
"According to ...": Prompting Language Models Improves Quoting from Pre-Training DataConference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
22 May 2023

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
17 May 2023

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
28 Apr 2023
