arXiv: 2202.00622
Datamodels: Predicting Predictions from Training Data


1 February 2022
Andrew Ilyas
Sung Min Park
Logan Engstrom
Guillaume Leclerc
A. Madry
    TDI

Papers citing "Datamodels: Predicting Predictions from Training Data"

50 / 113 papers shown
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
S. Dass
Alaa Khaddaj
Logan Engstrom
Aleksander Madry
Andrew Ilyas
Roberto Martin-Martin
13
0
0
14 May 2025
MAGIC: Near-Optimal Data Attribution for Deep Learning
Andrew Ilyas
Logan Engstrom
TDI
37
0
0
23 Apr 2025
Learning to Attribute with Attention
Benjamin Cohen-Wang
Yung-Sung Chuang
Aleksander Madry
25
0
0
18 Apr 2025
Representational Similarity via Interpretable Visual Concepts
Neehar Kondapaneni
Oisin Mac Aodha
Pietro Perona
DRL
121
0
0
19 Mar 2025
Finding the Muses: Identifying Coresets through Loss Trajectories
M. Nagaraj
Deepak Ravikumar
Efstathia Soufleri
Kaushik Roy
36
0
0
12 Mar 2025
A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness
Nathan G. Drenkow
Mathias Unberath
OOD
76
0
0
04 Mar 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
126
2
0
21 Feb 2025
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang
Tessa Han
Usha Bhalla
Hima Lakkaraju
FAtt
147
0
0
17 Feb 2025
Privacy-Preserving Dataset Combination
Keren Fuentes
Mimee Xu
Irene Chen
36
0
0
09 Feb 2025
SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning
Jingruo Sun
Zachary Frangella
Madeleine Udell
31
0
0
28 Jan 2025
Most Influential Subset Selection: Challenges, Promises, and Beyond
Yuzheng Hu
Pingbang Hu
Han Zhao
Jiaqi W. Ma
TDI
136
2
0
10 Jan 2025
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Lester James Validad Miranda
Yizhong Wang
Yanai Elazar
Sachin Kumar
Valentina Pyatkin
Faeze Brahman
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
45
8
0
08 Jan 2025
Towards Data Governance of Frontier AI Models
Jason Hausenloy
Duncan McClements
Madhavendra Thakur
67
1
0
05 Dec 2024
A Versatile Influence Function for Data Attribution with Non-Decomposable Loss
Junwei Deng
Weijing Tang
Jiaqi W. Ma
TDI
124
0
0
02 Dec 2024
TAROT: Targeted Data Selection via Optimal Transport
Lan Feng
Fan Nie
Yuejiang Liu
Alexandre Alahi
OT
125
1
0
30 Nov 2024
Delta-Influence: Unlearning Poisons via Influence Functions
Wenjie Li
Jiawei Li
Christian Schroeder de Witt
Ameya Prabhu
Amartya Sanyal
TDI
MU
92
0
0
20 Nov 2024
Loss-to-Loss Prediction: Scaling Laws for All Datasets
David Brandfonbrener
Nikhil Anand
Nikhil Vyas
Eran Malach
Sham Kakade
77
3
0
19 Nov 2024
One Sample Fits All: Approximating All Probabilistic Values Simultaneously and Efficiently
Weida Li
Yaoliang Yu
37
1
0
31 Oct 2024
Attribute-to-Delete: Machine Unlearning via Datamodel Matching
Kristian Georgiev
Roy Rinberg
Sung Min Park
Shivam Garg
Andrew Ilyas
Aleksander Madry
Seth Neel
MU
38
3
0
30 Oct 2024
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
Jinxu Lin
Linwei Tao
Minjing Dong
Chang Xu
TDI
36
2
0
24 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
33
1
0
22 Oct 2024
Influential Language Data Selection via Gradient Trajectory Pursuit
Zhiwei Deng
Tao Li
Yang Li
24
1
0
22 Oct 2024
Active Fourier Auditor for Estimating Distributional Properties of ML Models
Ayoub Ajarra
Bishwamittra Ghosh
Debabrota Basu
MLAU
44
0
0
10 Oct 2024
$\texttt{dattri}$: A Library for Efficient Data Attribution
Junwei Deng
Ting-Wei Li
Shiyuan Zhang
Shixuan Liu
Yijun Pan
Hao Huang
Xinhe Wang
Pingbang Hu
Xingjian Zhang
Jiaqi W. Ma
TDI
21
3
0
06 Oct 2024
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
43
1
0
04 Oct 2024
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
David Grangier
Simin Fan
Skyler Seto
Pierre Ablin
36
3
0
30 Sep 2024
Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI
Elisa Nguyen
Johannes Bertram
Evgenii Kortukov
Jean Y. Song
Seong Joon Oh
TDI
367
2
0
25 Sep 2024
Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity
Dongyue Li
Aneesh Sharma
Hongyang R. Zhang
67
1
0
09 Sep 2024
Improving Pretraining Data Using Perplexity Correlations
Tristan Thrush
Christopher Potts
Tatsunori Hashimoto
32
17
0
09 Sep 2024
ContextCite: Attributing Model Generation to Context
Benjamin Cohen-Wang
Harshay Shah
Kristian Georgiev
Aleksander Madry
LRM
30
18
0
01 Sep 2024
Enhancing High-Energy Particle Physics Collision Analysis through Graph Data Attribution Techniques
A. Verdone
A. Devoto
C. Sebastiani
J. Carmignani
M. D’Onofrio
S. Giagu
S. Scardapane
M. Panella
35
0
0
20 Jul 2024
Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers
Alex Oesterling
Usha Bhalla
Suresh Venkatasubramanian
Himabindu Lakkaraju
36
1
0
11 Jul 2024
Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection
Saachi Jain
Kimia Hamidieh
Kristian Georgiev
Andrew Ilyas
Marzyeh Ghassemi
Aleksander Madry
35
2
0
24 Jun 2024
LayerMatch: Do Pseudo-labels Benefit All Layers?
Chaoqi Liang
Guanglei Yang
Lifeng Qiao
Zitong Huang
Hongliang Yan
Yunchao Wei
W. Zuo
36
0
0
20 Jun 2024
Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation
Bjorn Nieth
Thomas Altstidl
Leo Schwinn
Björn Eskofier
AAML
32
2
0
19 Jun 2024
Data Shapley in One Training Run
Jiachen T. Wang
Prateek Mittal
Dawn Song
Ruoxi Jia
TDI
27
7
0
16 Jun 2024
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training
David Brandfonbrener
Hanlin Zhang
Andreas Kirsch
Jonathan Richard Schwarz
Sham Kakade
26
7
0
15 Jun 2024
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
Zichun Yu
Spandan Das
Chenyan Xiong
34
24
0
10 Jun 2024
Zyda: A 1.3T Dataset for Open Language Modeling
Yury Tokpanov
Beren Millidge
Paolo Glorioso
Jonathan Pilault
Adam Ibrahim
James Whittington
Quentin Anthony
35
2
0
04 Jun 2024
Scaling Laws for the Value of Individual Data Points in Machine Learning
Ian Covert
Wenlong Ji
Tatsunori Hashimoto
James Y. Zou
TDI
37
8
0
30 May 2024
Efficient Ensembles Improve Training Data Attribution
Junwei Deng
Ting-Wei Li
Shichang Zhang
Jiaqi Ma
TDI
25
2
0
27 May 2024
Training Data Attribution via Approximate Unrolled Differentiation
Juhan Bae
Wu Lin
Jonathan Lorraine
Roger C. Grosse
TDI
MU
49
12
0
20 May 2024
LMD3: Language Model Data Density Dependence
John Kirchenbauer
Garrett Honke
Gowthami Somepalli
Jonas Geiping
Daphne Ippolito
Katherine Lee
Tom Goldstein
David Andre
35
6
0
10 May 2024
Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits
Jiachen T. Wang
Tianji Yang
James Y. Zou
Yongchan Kwon
Ruoxi Jia
TDI
31
9
0
06 May 2024
Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models
Anshuman Chhabra
Bo Li
Jian Chen
Prasant Mohapatra
Hongfu Liu
TDI
18
0
0
06 May 2024
Multigroup Robustness
Lunjia Hu
Charlotte Peale
Judy Hanwen Shen
OOD
21
1
0
01 May 2024
Distilled Datamodel with Reverse Gradient Matching
Jingwen Ye
Ruonan Yu
Songhua Liu
Xinchao Wang
DD
41
3
0
22 Apr 2024
Incremental Residual Concept Bottleneck Models
Chenming Shang
Shiji Zhou
Hengyuan Zhang
Xinzhe Ni
Yujiu Yang
Yuwang Wang
34
14
0
13 Apr 2024
94% on CIFAR-10 in 3.29 Seconds on a Single GPU
Keller Jordan
VLM
16
5
0
30 Mar 2024
Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes
Uri Y. Hacohen
Adi Haviv
Shahar Sarfaty
Bruria Friedman
N. Elkin-Koren
Roi Livni
Amit H. Bermano
AILaw
34
7
0
26 Mar 2024