Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2206.07137
Cited By
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
14 June 2022
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
Winnie Xu
Benedikt Höltgen
Aidan N. Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt"
50 / 112 papers shown
Title
Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws
Xiyuan Wei
Ming Lin
Fanjiang Ye
Fengguang Song
Liangliang Cao
My T. Thai
Tianbao Yang
LLMSV
24
0
0
10 May 2025
Teaching Models to Understand (but not Generate) High-risk Data
Ryan Yixiang Wang
Matthew Finlayson
Luca Soldaini
Swabha Swayamdipta
Robin Jia
62
0
0
05 May 2025
DONOD: Robust and Generalizable Instruction Fine-Tuning for LLMs via Model-Intrinsic Dataset Pruning
Jucheng Hu
S. M. I. Simon X. Yang
Dongzhan Zhou
Lijun Wu
29
0
0
21 Apr 2025
Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao
Haoran Xu
Amy Zhang
Weinan Zhang
Chenjia Bai
31
0
0
08 Apr 2025
PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity
Mustafa Burak Gurbuz
Xingyu Zheng
C. Dovrolis
OOD
38
0
0
07 Apr 2025
Token Weighting for Long-Range Language Modeling
Falko Helm
Nico Daheim
Iryna Gurevych
60
1
0
12 Mar 2025
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training
Toan Tran
Ruixuan Liu
Li Xiong
MU
41
0
0
27 Feb 2025
Mixtraining: A Better Trade-Off Between Compute and Performance
Zexin Li
Jiancheng Zhang
Yufei Li
Yinglun Zhu
Cong Liu
39
0
0
26 Feb 2025
Fair Foundation Models for Medical Image Analysis: Challenges and Perspectives
Dilermando Queiroz
Anderson Carlos
André Anjos
Lilian Berton
43
0
0
24 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
99
0
0
21 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
115
3
0
06 Feb 2025
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
Steven Feng
Shrimai Prabhumoye
Kezhi Kong
Dan Su
M. Patwary
M. Shoeybi
Bryan Catanzaro
67
2
0
18 Dec 2024
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin
Cheng-En Wu
Huanran Li
Jifan Zhang
Yu Hen Hu
Pedro Morgado
26
0
0
16 Nov 2024
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
Hsun-Yu Kuo
Yin-Hsiang Liao
Yu-Chieh Chao
Wei-Yun Ma
Pu-Jen Cheng
SyDa
45
2
0
28 Oct 2024
Annotation Efficiency: Identifying Hard Samples via Blocked Sparse Linear Bandits
Adit Jain
Soumyabrata Pal
Sunav Choudhary
Ramasuri Narayanam
Vikram Krishnamurthy
21
1
0
26 Oct 2024
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat
Veeranjaneyulu Sadhanala
Afshin Rostamizadeh
Ayan Chakrabarti
Wittawat Jitkrittum
...
Rakesh Shivanna
Sashank J. Reddi
A. Menon
Rohan Anil
Sanjiv Kumar
28
2
0
24 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
Influential Language Data Selection via Gradient Trajectory Pursuit
Zhiwei Deng
Tao Li
Yang Li
24
1
0
22 Oct 2024
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang
Allan Zhou
Zhili Feng
Sadhika Malladi
J. Zico Kolter
39
15
0
15 Oct 2024
Adapt-
∞
\infty
∞
: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
A. Maharana
Jaehong Yoon
Tianlong Chen
Mohit Bansal
29
0
0
14 Oct 2024
On the Generalization Properties of Deep Learning for Aircraft Fuel Flow Estimation Models
Gabriel Jarry
Ramon Dalmau
Philippe Very
Junzi Sun
AI4TS
18
0
0
10 Oct 2024
Data Selection via Optimal Control for Language Models
Yuxian Gu
Li Dong
Hongning Wang
Y. Hao
Qingxiu Dong
Furu Wei
Minlie Huang
AI4CE
48
4
0
09 Oct 2024
Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models
Yize Li
Yihua Zhang
Sijia Liu
Xue Lin
42
3
0
27 Sep 2024
Efficient Data Subset Selection to Generalize Training Across Models: Transductive and Inductive Networks
Eeshaan Jain
Tushar Nandy
Gaurav Aggarwal
Ashish Tendulkar
Rishabh K. Iyer
A. De
25
10
0
18 Sep 2024
Are Sparse Neural Networks Better Hard Sample Learners?
Q. Xiao
Boqian Wu
Lu Yin
Christopher Neil Gadzinski
Tianjin Huang
Mykola Pechenizkiy
D. Mocanu
33
1
0
13 Sep 2024
A framework for measuring the training efficiency of a neural architecture
Eduardo Cueto-Mendoza
John D. Kelleher
38
0
0
12 Sep 2024
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Joey Hejna
Chethan Bhateja
Yichen Jian
Karl Pertsch
Dorsa Sadigh
23
13
0
26 Aug 2024
P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for Optimizing LLM Training
Yingxuan Yang
Huayi Wang
Muning Wen
Weinan Zhang
41
0
0
10 Aug 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min-Bin Lin
MoE
67
36
1
01 Jul 2024
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Rui Pan
Jipeng Zhang
Xingyuan Pan
Renjie Pi
Xiaoyu Wang
Tong Zhang
45
5
0
28 Jun 2024
Dynamic Data Pruning for Automatic Speech Recognition
Q. Xiao
Pingchuan Ma
Adriana Fernandez-Lopez
Boqian Wu
Lu Yin
Stavros Petridis
Mykola Pechenizkiy
Maja Pantic
D. Mocanu
Shiwei Liu
23
1
0
26 Jun 2024
Data curation via joint example selection further accelerates multimodal learning
Talfan Evans
Nikhil Parthasarathy
Hamza Merzic
Olivier J. Hénaff
32
12
0
25 Jun 2024
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training
David Brandfonbrener
Hanlin Zhang
Andreas Kirsch
Jonathan Richard Schwarz
Sham Kakade
26
7
0
15 Jun 2024
Labeled Data Selection for Category Discovery
Bingchen Zhao
Nico Lang
Serge J. Belongie
Oisin Mac Aodha
26
3
0
07 Jun 2024
Diversified Batch Selection for Training Acceleration
Feng Hong
Yueming Lyu
Jiangchao Yao
Ya Zhang
Ivor W. Tsang
Yanfeng Wang
27
4
0
07 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
51
0
0
03 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
30
23
0
30 May 2024
Cascade-Aware Training of Language Models
Congchao Wang
Sean Augenstein
Keith Rush
Wittawat Jitkrittum
Harikrishna Narasimhan
A. S. Rawat
A. Menon
Alec Go
28
4
0
29 May 2024
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
Yiping Wang
Yifang Chen
Wendan Yan
Alex Fang
Wenjing Zhou
Kevin G. Jamieson
S. Du
32
7
0
29 May 2024
Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model
Chen Huang
Yang Deng
Wenqiang Lei
Jiancheng Lv
Ido Dagan
36
4
0
20 May 2024
Nonparametric Teaching of Implicit Neural Representations
Chen Zhang
Steven Tin Sui Luo
Jason Chun Lok Li
Yik-Chung Wu
Ngai Wong
38
2
0
17 May 2024
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
Feiyang Kang
H. Just
Yifan Sun
Himanshu Jahagirdar
Yuanzhi Zhang
Rongxing Du
Anit Kumar Sahu
Ruoxi Jia
54
17
0
05 May 2024
Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
46
55
0
11 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
37
60
0
25 Mar 2024
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain
A. Suggala
Pradeep Shenoy
OOD
OffRL
27
2
0
18 Mar 2024
Towards Optimal Learning of Language Models
Yuxian Gu
Li Dong
Y. Hao
Qingxiu Dong
Minlie Huang
Furu Wei
36
7
0
27 Feb 2024
Efficient Backpropagation with Variance-Controlled Adaptive Sampling
Ziteng Wang
Jianfei Chen
Jun Zhu
BDL
32
2
0
27 Feb 2024
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktaschel
Roberta Raileanu
SyDa
68
62
0
26 Feb 2024
Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
29
8
0
22 Feb 2024
Efficient data selection employing Semantic Similarity-based Graph Structures for model training
Roxana Petcu
Subhadeep Maji
19
0
0
22 Feb 2024
1
2
3
Next