Datamodels: Predicting Predictions from Training Data

1 February 2022
Andrew Ilyas
Sung Min Park
Logan Engstrom
Guillaume Leclerc
Aleksander Madry
    TDI
arXiv:2202.00622

Papers citing "Datamodels: Predicting Predictions from Training Data"

50 / 136 papers shown
Efficiently Learning Branching Networks for Multitask Algorithmic Reasoning
Dongyue Li
Zhenshuo Zhang
Minxuan Duan
Edgar Dobriban
Hongyang R. Zhang
89
0
0
30 Nov 2025
AssayMatch: Learning to Select Data for Molecular Activity Models
Vincent Fan
Regina Barzilay
95
1
0
20 Nov 2025
Rethinking Data Value: Asymmetric Data Shapley for Structure-Aware Valuation in Data Markets and Machine Learning Pipelines
Xi Zheng
Yinghui Huang
Xiangyu Chang
Ruoxi Jia
Yong Tan
100
0
0
17 Nov 2025
Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation
Zhenshuo Zhang
Minxuan Duan
Youran Ye
Hongyang R. Zhang
OffRL
415
1
0
16 Nov 2025
Error Estimate and Convergence Analysis for Data Valuation
Zhangyong Liang
Huanhuan Gao
Ji Zhang
73
0
0
09 Nov 2025
Nonparametric Data Attribution for Diffusion Models
Yutian Zhao
C. Du
Xiaosen Zheng
Tianyu Pang
Min Lin
TDI, DiffM
225
0
0
16 Oct 2025
What Is The Performance Ceiling of My Classifier? Utilizing Category-Wise Influence Functions for Pareto Frontier Analysis
Shahriar Kabir Nahin
Wenxiao Xiao
Joshua Liu
Anshuman Chhabra
Hongfu Liu
TDI
222
0
0
04 Oct 2025
Train on Validation (ToV): Fast data selection with applications to fine-tuning
Ayush Jain
Andrea Montanari
Eren Sasoglu
184
1
0
01 Oct 2025
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Sebastian Bordt
Martin Pawelczyk
CLL
186
1
0
27 Sep 2025
Exploring Training Data Attribution under Limited Access Constraints
Shiyuan Zhang
Junwei Deng
Juhan Bae
Jiaqi W. Ma
TDI
270
0
0
16 Sep 2025
Coresets from Trajectories: Selecting Data via Correlation of Loss Differences
M. Nagaraj
Deepak Ravikumar
Kaushik Roy
233
2
0
27 Aug 2025
Understanding Data Influence with Differential Approximation
Haoru Tan
Sitong Wu
Xiuzhe Wu
Wang Wang
Bo Zhao
Zeke Xie
Gui-Song Xia
Xiaojuan Qi
TDI
273
1
0
20 Aug 2025
Efficiently Verifiable Proofs of Data Attribution
Ari Karchmer
Seth Neel
Martin Pawelczyk
TDI
346
1
0
14 Aug 2025
Integrated Influence: Data Attribution with Baseline
Linxiao Yang
Xinyu Gu
Liang Sun
TDI
189
0
0
07 Aug 2025
WSS-CL: Weight Saliency Soft-Guided Contrastive Learning for Efficient Machine Unlearning Image Classification
Thang Duc Tran
Thai Hoang Le
MU
129
0
0
06 Aug 2025
COLLAGE: Adaptive Fusion-based Retrieval for Augmented Policy Learning
Sateesh Kumar
Shivin Dass
Georgios Pavlakos
Roberto Martín-Martín
143
1
0
02 Aug 2025
SourceSplice: Source Selection for Machine Learning Tasks
Ambarish Singh
Romila Pradhan
121
0
0
29 Jul 2025
Better Training Data Attribution via Better Inverse Hessian-Vector Products
Andrew Wang
Elisa Nguyen
Runshi Yang
Juhan Bae
Sheila A. McIlraith
Roger C. Grosse
TDI
330
2
0
19 Jul 2025
Effective Data Pruning through Score Extrapolation
Sebastian Schmidt
Prasanga Dhungel
Christoffer Löffler
Bjorn Nieth
Stephan Günnemann
Leo Schwinn
SyDa
324
2
0
10 Jun 2025
Learning to Weight Parameters for Training Data Attribution
Shuangqi Li
Hieu M. Le
Aoxiang Fan
Mathieu Salzmann
TDI, DiffM
388
1
0
06 Jun 2025
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang
Jielin Qiu
Wenhao Ding
Zuxin Liu
James Tompkin
Mengdi Xu
Mengzhou Xia
Zhengzhong Tu
Laixi Shi
Jiacheng Zhu
OffRL
394
14
0
30 May 2025
Daunce: Data Attribution through Uncertainty Estimation
Xingyuan Pan
Chenlu Ye
Joseph Melkonian
Jiaqi W. Ma
Tong Zhang
TDI, UQCV
172
1
0
29 May 2025
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
Hadi Askari
Shivanshu Gupta
Fei Wang
Anshuman Chhabra
Muhao Chen
TDI
412
4
0
27 May 2025
Enhancing Training Data Attribution with Representational Optimization
W. Sun
Haokun Liu
Nikhil Kandpal
Colin Raffel
Yiming Yang
TDI
466
0
0
24 May 2025
Small-to-Large Generalization: Data Influences Models Consistently Across Scale
Alaa Khaddaj
Logan Engstrom
Aleksander Madry
TDI, AI4CE
281
1
0
22 May 2025
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
Chenlin Ming
Chendi Qu
Mengzhang Cai
Qizhi Pei
Zhuoshi Pan
Yu Li
Xiaoming Duan
Lijun Wu
Bin Wang
208
3
0
19 May 2025
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Shivin Dass
Alaa Khaddaj
Logan Engstrom
Aleksander Madry
Andrew Ilyas
Roberto Martín-Martín
350
8
0
14 May 2025
MAGIC: Near-Optimal Data Attribution for Deep Learning
Andrew Ilyas
Logan Engstrom
TDI
360
5
0
23 Apr 2025
Learning to Attribute with Attention
Benjamin Cohen-Wang
Yung-Sung Chuang
Aleksander Madry
312
5
0
18 Apr 2025
Representational Similarity via Interpretable Visual Concepts
International Conference on Learning Representations (ICLR), 2025
Neehar Kondapaneni
Oisin Mac Aodha
Pietro Perona
DRL
984
3
0
19 Mar 2025
Finding the Muses: Identifying Coresets through Loss Trajectories
M. Nagaraj
Deepak Ravikumar
Efstathia Soufleri
Kaushik Roy
315
0
0
12 Mar 2025
A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness
Nathan G. Drenkow
Mathias Unberath
OOD
341
0
0
04 Mar 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Neural Information Processing Systems (NeurIPS), 2024
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
461
16
0
21 Feb 2025
Privacy-Preserving Dataset Combination
Keren Fuentes
Mimee Xu
Irene Chen
357
0
0
09 Feb 2025
SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning
Jingruo Sun
Zachary Frangella
Madeleine Udell
218
2
0
28 Jan 2025
Most Influential Subset Selection: Challenges, Promises, and Beyond
Neural Information Processing Systems (NeurIPS), 2024
Yuzheng Hu
Pingbang Hu
Han Zhao
Jiaqi W. Ma
TDI
488
21
0
10 Jan 2025
Towards Data Governance of Frontier AI Models
Jason Hausenloy
Duncan McClements
Madhavendra Thakur
454
2
0
05 Dec 2024
A Versatile Influence Function for Data Attribution with Non-Decomposable Loss
Junwei Deng
Weijing Tang
Jiaqi W. Ma
TDI
305
0
0
02 Dec 2024
TAROT: Targeted Data Selection via Optimal Transport
Lan Feng
Fan Nie
Yuejiang Liu
Alexandre Alahi
OT
557
2
0
30 Nov 2024
Delta-Influence: Unlearning Poisons via Influence Functions
Wenjie Li
Jiawei Li
Christian Schroeder de Witt
Amartya Sanyal
MU, TDI
429
9
0
20 Nov 2024
Loss-to-Loss Prediction: Scaling Laws for All Datasets
David Brandfonbrener
Nikhil Anand
Nikhil Vyas
Eran Malach
Sham Kakade
292
12
0
19 Nov 2024
One Sample Fits All: Approximating All Probabilistic Values Simultaneously and Efficiently
Neural Information Processing Systems (NeurIPS), 2024
Weida Li
Yaoliang Yu
220
6
0
31 Oct 2024
Attribute-to-Delete: Machine Unlearning via Datamodel Matching
Kristian Georgiev
Roy Rinberg
Sung Min Park
Shivam Garg
Andrew Ilyas
Aleksander Madry
Seth Neel
MU
267
10
0
30 Oct 2024
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
International Conference on Learning Representations (ICLR), 2024
Jinxu Lin
Linwei Tao
Minjing Dong
Chang Xu
TDI
435
11
0
24 Oct 2024
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Lester James V. Miranda
Yizhong Wang
Yanai Elazar
Sachin Kumar
Valentina Pyatkin
Faeze Brahman
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
435
20
0
24 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
International Conference on Learning Representations (ICLR), 2024
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
307
16
0
22 Oct 2024
Influential Language Data Selection via Gradient Trajectory Pursuit
Zhiwei Deng
Tao Li
Yang Li
213
1
0
22 Oct 2024
Active Fourier Auditor for Estimating Distributional Properties of ML Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ayoub Ajarra
Bishwamittra Ghosh
Debabrota Basu
MLAU
353
4
0
10 Oct 2024
dattri: A Library for Efficient Data Attribution
Neural Information Processing Systems (NeurIPS), 2024
Junwei Deng
Ting-Wei Li
Shiyuan Zhang
Shixuan Liu
Yijun Pan
Hao Huang
Xinhe Wang
Pingbang Hu
Xingjian Zhang
Jiaqi W. Ma
TDI
170
13
0
06 Oct 2024
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
452
10
0
04 Oct 2024