2107.07075
Deep Learning on a Data Diet: Finding Important Examples Early in Training
15 July 2021
Mansheej Paul
Surya Ganguli
Gintare Karolina Dziugaite
Papers citing "Deep Learning on a Data Diet: Finding Important Examples Early in Training"
50 / 70 papers shown
When Dynamic Data Selection Meets Data Augmentation
S. M. I. Simon X. Yang
Peng Ye
F. Shen
Dongzhan Zhou
24
0
0
02 May 2025
2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization
Mengyang Li
Zhong Zhang
27
0
0
10 Apr 2025
Geometric Median Matching for Robust k-Subset Selection from Noisy Data
Anish Acharya
Sujay Sanghavi
Alexandros G. Dimakis
Inderjit S Dhillon
AAML
57
0
0
01 Apr 2025
Severing Spurious Correlations with Data Pruning
Varun Mulchandani
Jung-Eun Kim
144
0
0
24 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu-Xi Cheng
VLM
59
1
0
17 Mar 2025
A Large-Scale Study on Video Action Dataset Condensation
Yang Chen
Sheng Guo
Bo Zheng
Limin Wang
DD
77
2
0
13 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
163
0
0
10 Mar 2025
Diversity-Oriented Data Augmentation with Large Language Models
Zaitian Wang
Jinghan Zhang
Xinhao Zhang
Kunpeng Liu
Pengfei Wang
Yuanchun Zhou
78
1
0
17 Feb 2025
LiveVal: Time-aware Data Valuation via Adaptive Reference Points
Jie Xu
Zihan Wu
Cong Wang
Xiaohua Jia
AI4TS
46
0
0
14 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
115
3
0
06 Feb 2025
On Learning Representations for Tabular Data Distillation
Inwon Kang
Parikshit Ram
Yi Zhou
Horst Samulowitz
O. Seneviratne
DD
64
0
0
23 Jan 2025
Geometric Median (GM) Matching for Robust Data Pruning
Anish Acharya
Inderjit S Dhillon
Sujay Sanghavi
AAML
59
0
0
20 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
90
12
0
31 Dec 2024
Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification
Zi Yang
Haojin Yang
Soumajit Majumder
Jorge M. Cardoso
Guillermo Gallego
MoMe
VLM
93
1
0
13 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
146
0
0
01 Dec 2024
Unsupervised Replay Strategies for Continual Learning with Limited Data
Anthony Bazhenov
Pahan Dewasurendra
G. Krishnan
Jean Erik Delanois
CLL
24
0
0
21 Oct 2024
Accelerating Deep Learning with Fixed Time Budget
Muhammad Asif Khan
R. Hamila
Hamid Menouar
28
0
0
03 Oct 2024
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
S. Joshi
Jiayi Ni
Baharan Mirzasoleiman
DD
69
2
0
03 Oct 2024
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Tianchi Xie
Jiangning Zhu
Guozu Ma
Minzhi Lin
Wei Chen
Weikai Yang
Shixia Liu
28
0
0
03 Oct 2024
Targeted synthetic data generation for tabular data via hardness characterization
Tommaso Ferracci
Leonie Goldmann
Anton Hinel
Francesco Sanna Passino
135
0
0
01 Oct 2024
Unsupervised Domain Adaptation Via Data Pruning
Andrea Napoli
Paul White
36
1
0
18 Sep 2024
Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review
Neha Prakriya
Jui-Nan Yen
Cho-Jui Hsieh
Jason Cong
KELM
AI4CE
LRM
31
1
0
10 Sep 2024
Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Quang H. Nguyen
Nguyen Ngoc-Hieu
The-Anh Ta
Thanh Nguyen-Tang
Kok-Seng Wong
Hoang Thanh-Tung
Khoa D. Doan
AAML
33
2
0
15 Jul 2024
CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning
Huaiguang Cai
FedML
TDI
56
1
0
17 Jun 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
43
8
0
16 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
66
1
0
12 Jun 2024
Diversified Batch Selection for Training Acceleration
Feng Hong
Yueming Lyu
Jiangchao Yao
Ya Zhang
Ivor W. Tsang
Yanfeng Wang
34
4
0
07 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
51
0
0
03 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
35
24
0
30 May 2024
SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching
Yongmin Lee
Hye Won Chung
29
6
0
28 May 2024
ATOM: Attention Mixer for Efficient Dataset Distillation
Samir Khaki
A. Sajedi
Kai Wang
Lucy Z. Liu
Y. Lawryshyn
Konstantinos N. Plataniotis
47
3
0
02 May 2024
Is Adversarial Training with Compressed Datasets Effective?
Tong Chen
Raghavendra Selvan
AAML
52
0
0
08 Feb 2024
Generative Deduplication For Socia Media Data Selection
Xianming Li
Jing Li
29
2
0
11 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
Revisiting Knowledge Distillation under Distribution Shift
Songming Zhang
Ziyu Lyu
Xiaofeng Chen
29
1
0
25 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
18
16
0
08 Dec 2023
A Negative Result on Gradient Matching for Selective Backprop
Lukas Balles
Cédric Archambeau
Giovanni Zappella
29
0
0
08 Dec 2023
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang
Jiawei Du
Yunsong Li
Weiying Xie
Joey Tianyi Zhou
37
7
0
22 Nov 2023
DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection
Devleena Das
Vivek Khetan
21
0
0
25 Oct 2023
Generalizing Medical Image Representations via Quaternion Wavelet Networks
Luigi Sigillo
Eleonora Grassucci
A. Uncini
Danilo Comminiello
MedIm
25
5
0
16 Oct 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
19
3
0
15 Oct 2023
D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
A. Maharana
Prateek Yadav
Mohit Bansal
21
28
0
11 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
24
1
0
10 Oct 2023
Uncovering Neural Scaling Laws in Molecular Representation Learning
Dingshuo Chen
Yanqiao Zhu
Jieyu Zhang
Yuanqi Du
Zhixun Li
Qiang Liu
Shu Wu
Liang Wang
32
16
0
15 Sep 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLM
MLLM
24
56
0
23 Aug 2023
Dataset Distillation Meets Provable Subset Selection
M. Tukan
Alaa Maalouf
Margarita Osadchy
DD
29
4
0
16 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
20
41
0
12 Jul 2023
GIO: Gradient Information Optimization for Training Dataset Selection
Dante Everaert
Christopher Potts
21
3
0
20 Jun 2023
Sample-Level Weighting for Multi-Task Learning with Auxiliary Tasks
Emilie Grégoire
M. H. Chaudhary
Sam Verboven
24
1
0
07 Jun 2023
NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Jean-Michel Attendu
Jean-Philippe Corbeil
28
15
0
05 Jun 2023