2405.13954
Cited By
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
22 May 2024
Sang Keun Choe
Hwijeen Ahn
Juhan Bae
Kewen Zhao
Minsoo Kang
Youngseog Chung
Adithya Pratapa
W. Neiswanger
Emma Strubell
Teruko Mitamura
Jeff Schneider
Eduard Hovy
Roger C. Grosse
Eric P. Xing
TDI
Papers citing "What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions"
9 / 9 papers shown
Title
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation
Yanzhou Pan
Huawei Lin
Yide Ran
Jiamin Chen
Xiaodong Yu
Weijie Zhao
Denghui Zhang
Zhaozhuo Xu
40
0
0
02 Mar 2025
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li
Zichun Yu
Chenyan Xiong
SyDa
27
1
0
18 Oct 2024
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
43
1
0
04 Oct 2024
Fast Training Dataset Attribution via In-Context Learning
Milad Fotouhi
M. T. Bahadori
Oluwaseyi Feyisetan
P. Arabshahi
David Heckerman
31
0
0
14 Aug 2024
LESS: Selecting Influential Data for Targeted Instruction Tuning
Mengzhou Xia
Sadhika Malladi
Suchin Gururangan
Sanjeev Arora
Danqi Chen
80
185
0
06 Feb 2024
Making Scalable Meta Learning Practical
Sang Keun Choe
Sanket Vaibhav Mehta
Hwijeen Ahn
W. Neiswanger
Pengtao Xie
Emma Strubell
Eric P. Xing
41
14
0
09 Oct 2023
Data Banzhaf: A Robust Data Valuation Framework for Machine Learning
Jiachen T. Wang
R. Jia
FedML
TDI
50
94
0
30 May 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,986
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020