LESS: Selecting Influential Data for Targeted Instruction Tuning
6 February 2024
Mengzhou Xia
Sadhika Malladi
Suchin Gururangan
Sanjeev Arora
Danqi Chen
ArXiv (abs) | PDF | HTML | HuggingFace (3 upvotes) | GitHub (1339★)

Papers citing "LESS: Selecting Influential Data for Targeted Instruction Tuning"

50 / 244 papers shown
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
363
5
0
24 Dec 2025
When unlearning is free: leveraging low influence points to reduce computational costs
Anat Kleiman
Robert Fisher
Ben Deaner
Udi Wieder
MU
341
0
0
04 Dec 2025
Mode-Conditioning Unlocks Superior Test-Time Scaling
Chen Henry Wu
Sachin Goyal
Aditi Raghunathan
VLM
215
4
0
30 Nov 2025
Bandit Guided Submodular Curriculum for Adaptive Subset Selection
Prateek Chanda
Prayas Agrawal
Saral Sureka
Lokesh Reddy Polu
Atharv Kshirsagar
Ganesh Ramakrishnan
308
0
0
28 Nov 2025
Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization
Yi Zhang
Che Liu
Xiancong Ren
Hanchu Ni
Yingji Zhang
...
Zenglin Xu
Bin Shen
Qifan Wang
Jian Tang
Xiaozhu Ju
VLM
221
2
0
20 Nov 2025
PrAda-GAN: A Private Adaptive Generative Adversarial Network with Bayes Network Structure
Ke Jia
Yuheng Ma
Yang Li
Feifei Wang
178
4
0
11 Nov 2025
Selecting Auxiliary Data via Neural Tangent Kernels for Low-Resource Domains
P. Wang
Hongcheng Liu
Yusheng Liao
Ziqing Fan
Yaxin Du
Shuo Tang
Y. Wang
Y Samuel Wang
170
2
0
10 Nov 2025
Sampling and Loss Weights in Multi-Domain Training
Mahdi Salmani
Pratik Worah
Meisam Razaviyayn
Vahab Mirrokni
NoLa
358
0
0
10 Nov 2025
In Good GRACEs: Principled Teacher Selection for Knowledge Distillation
IEEE computer architecture letters (CAL), 2025
A. Panigrahi
Bingbin Liu
Sadhika Malladi
Sham Kakade
Surbhi Goel
289
3
0
04 Nov 2025
Geometric Data Valuation via Leverage Scores
Rodrigo Mendoza-Smith
TDI, FedML
400
0
0
03 Nov 2025
LLM generation novelty through the lens of semantic similarity
Philipp Davydov
Ameya Prabhu
Matthias Bethge
Elisa Nguyen
Seong Joon Oh
TDI
509
0
1
31 Oct 2025
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
Zixuan Hu
Li Shen
Zhenyi Wang
Yongxian Wei
Dacheng Tao
AAML
202
7
0
31 Oct 2025
Data-Efficient RLVR via Off-Policy Influence Guidance
Erle Zhu
Dazhi Jiang
Y. Wang
X. Li
Jiale Cheng
...
Yilin Niu
A. Zeng
J. Tang
Shiyu Huang
Hongning Wang
OffRL
213
3
0
30 Oct 2025
Accumulative SGD Influence Estimation for Data Attribution
Yunxiao Shi
Shuo Yang
Yixin Su
Rui-Xun Zhang
Min Xu
TDI
326
0
0
30 Oct 2025
A Survey on Efficient Large Language Model Training: From Data-centric Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junyu Luo
Bohan Wu
Xiao Luo
Zhiping Xiao
Yiqiao Jin
...
Nan Yin
Yifan Wang
Jingyang Yuan
Wei Ju
Ming Zhang
201
9
0
29 Oct 2025
LimRank: Less is More for Reasoning-Intensive Information Reranking
Tingyu Song
Yilun Zhao
Siyue Zhang
Chen Zhao
Arman Cohan
RALM, ALM, LRM
428
1
0
27 Oct 2025
An Empirical Study of Sample Selection Strategies for Large Language Model Repair
Xuran Li
Jingyi Wang
KELM
171
0
0
23 Oct 2025
LM-mixup: Text Data Augmentation via Language Model based Mixup
Zhijie Deng
Zhouan Shen
Ling Li
Yao Zhou
Zhaowei Zhu
Yanji He
Wei Wang
Jiaheng Wei
148
0
0
23 Oct 2025
AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation
Xianyang Liu
Y. Liu
Shuai Wang
Hao Cheng
Andrew Estornell
Yuzhi Zhao
Jiaheng Wei
LRM
264
4
0
22 Oct 2025
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
Heming Zou
Yixiu Mao
Yun Qu
Qi Wang
Xiangyang Ji
259
3
0
19 Oct 2025
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Rahul Nadkarni
Yanai Elazar
Hila Gonen
Noah A. Smith
KELM
186
0
0
16 Oct 2025
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
Ling Zhang
Xianliang Yang
Juwon Yu
Park Cheonyoung
Lei Song
Jiang Bian
162
0
0
16 Oct 2025
Towards Understanding Valuable Preference Data for Large Language Model Alignment
Zizhuo Zhang
Qizhou Wang
Shanshan Ye
Jianing Zhu
Jiangchao Yao
Bo Han
Masashi Sugiyama
TDI, ALM
167
5
0
15 Oct 2025
On the Role of Preference Variance in Preference Optimization
Jiacheng Guo
Zihao Li
Jiahao Qiu
Yue Wu
Mengdi Wang
211
3
0
14 Oct 2025
Z0-Inf: Zeroth Order Approximation for Data Influence
Narine Kokhlikyan
Kamalika Chaudhuri
Saeed Mahloujifar
TDI
225
0
0
13 Oct 2025
MeTA-LoRA: Data-Efficient Multi-Task Fine-Tuning for Large Language Models
Bo Cheng
Xu Wang
Jinda Liu
Yi-Ju Chang
Yuan Wu
MoE, ALM
215
1
0
13 Oct 2025
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Subhodip Panda
Dhruv Tarsadiya
S. Sourav
Prathosh A.P.
Sai Praneeth Karimireddy
TDI
277
0
0
12 Oct 2025
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
Shaobo Wang
C. Wang
Wenjie Fu
Yue Min
Mingquan Feng
...
Kexin Yang
Xingzhang Ren
Fei Huang
Dayiheng Liu
Linfeng Zhang
183
0
0
12 Oct 2025
CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization
Yichen Yan
Ming Zhong
Qi Zhu
Xiaoling Gu
Jinpeng Chen
Huan Li
170
3
0
11 Oct 2025
Skill-Targeted Adaptive Training
Yinghui He
A. Panigrahi
Yong Lin
Sanjeev Arora
LRM
170
2
0
11 Oct 2025
How Reliable is Language Model Micro-Benchmarking?
Gregory Yauney
Shahzaib Saqib Warraich
Swabha Swayamdipta
ALM
278
1
0
09 Oct 2025
BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining
Jie Hao
Rui Yu
W. Zhang
Huixia Wang
Jie Xu
Mingrui Liu
362
0
0
07 Oct 2025
The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning
Alessandro Favero
PINN, GNN
294
4
0
07 Oct 2025
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
Ziyan Wang
Zheng Wang
Jie Fu
Xingwei Qu
Qi Cheng
Shengpu Tang
Minjia Zhang
Xiaoming Huo
LRM
290
2
0
05 Oct 2025
The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View
Xinhao Yao
Lu Yu
Xiaolin Hu
Fengwei Teng
Qing Cui
Jun Zhou
Yong Liu
LRM
313
6
0
05 Oct 2025
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
Nilay Naharas
Dang Nguyen
Nesihan Bulut
M. Bateni
Vahab Mirrokni
Baharan Mirzasoleiman
135
2
0
01 Oct 2025
Train on Validation (ToV): Fast data selection with applications to fine-tuning
Ayush Jain
Andrea Montanari
Eren Sasoglu
323
2
0
01 Oct 2025
Prompt Curriculum Learning for Efficient LLM Post-Training
Zhaolin Gao
Joongwon Kim
Wen Sun
Thorsten Joachims
Sid Wang
Richard Yuanzhe Pang
Liang Tan
205
15
0
01 Oct 2025
RL-Guided Data Selection for Language Model Finetuning
Animesh Jha
Harshit Gupta
Ananjan Nandi
OffRL
312
0
0
30 Sep 2025
Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
Yang Tang
Ruijie Liu
Yifan Wang
Shiyu Li
Xi Chen
175
0
0
30 Sep 2025
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
Shane Bergsma
Nolan Dey
Joel Hestness
235
0
0
29 Sep 2025
Lightweight and Robust Federated Data Valuation
Guojun Tang
Jiayu Zhou
Mohammad Mamun
Steve Drew
TDI, FedML
273
0
0
29 Sep 2025
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Shaobo Wang
Jiaming Wang
Jiajun Zhang
C. Wang
Yue Min
...
Huiqiang Jiang
Junyang Lin
Dayiheng Liu
Linfeng Zhang
212
6
0
28 Sep 2025
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Junming Yang
Ning Xu
Biao Liu
Shiqi Qiao
Xin Geng
158
2
0
27 Sep 2025
Uncovering Intrinsic Capabilities: A Paradigm for Data Curation in Vision-Language Models
Junjie Li
Ziao Wang
Jianghong Ma
Xiaofeng Zhang
240
0
0
27 Sep 2025
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
Zhe Li
Wei Zhao
Y. Li
Jun Sun
189
1
0
26 Sep 2025
Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data
Jiancheng Zhang
Yinglun Zhu
241
1
0
25 Sep 2025
TsqLoRA: Towards Sensitivity and Quality Low-Rank Adaptation for Efficient Fine-Tuning
Yu Chen
Yifei Han
Long Zhang
Yue Du
Bin Li
197
0
0
23 Sep 2025
Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
Mohammad Beigi
Ying Shen
Parshin Shojaee
Qifan Wang
Zichao Wang
Chandan K. Reddy
Ming Jin
Lifu Huang
LRM
146
4
0
20 Sep 2025
Toward Efficient Influence Function: Dropout as a Compression Tool
Yuchen Zhang
Mohammad Mohammadi Amiri
TDI
295
0
0
19 Sep 2025