2310.07931
Cited By
D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
11 October 2023
A. Maharana
Prateek Yadav
Mohit Bansal
Papers citing
"D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning"
34 / 34 papers shown
An Empirical Study of Sample Selection Strategies for Large Language Model Repair
Xuran Li
Jingyi Wang
KELM
84
0
0
23 Oct 2025
Unsupervised Active Learning via Natural Feature Progressive Framework
Yuxi Liu
Catherine Lalman
Yimin Yang
72
0
0
06 Oct 2025
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
Nilay Naharas
Dang Nguyen
Nesihan Bulut
M. Bateni
Vahab Mirrokni
Baharan Mirzasoleiman
84
0
0
01 Oct 2025
Vision Function Layer in Multimodal LLMs
Cheng Shi
Yizhou Yu
Sibei Yang
76
0
0
29 Sep 2025
Coresets from Trajectories: Selecting Data via Correlation of Loss Differences
M. Nagaraj
Deepak Ravikumar
Kaushik Roy
139
2
0
27 Aug 2025
Class-Proportional Coreset Selection for Difficulty-Separable Data
Elisa Tsai
Haizhong Zheng
A. Prakash
156
0
0
15 Jul 2025
Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection
Shivam Chandhok
Qian Yang
Oscar Manas
Kanishk Jain
Leonid Sigal
Aishwarya Agrawal
187
1
0
01 Jun 2025
X-Factor: Quality Is a Dataset-Intrinsic Property
Josiah D. Couch
Miao Li
Rima Arnaout
R. Arnaout
172
2
0
28 May 2025
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Jaehun Jung
Seungju Han
Ximing Lu
Skyler Hallinan
David Acuna
Shrimai Prabhumoye
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
Yejin Choi
SyDa
333
11
0
26 May 2025
Extending Dataset Pruning to Object Detection: A Variance-based Approach
Ryota Yagi
VLM
212
0
0
22 May 2025
When Dynamic Data Selection Meets Data Augmentation
Steve Yang
Peng Ye
Furao Shen
Dongzhan Zhou
203
4
0
02 May 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
288
2
0
14 Apr 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Weixiong Lin
Chen Ju
Haicheng Wang
Shengchao Hu
Shuai Xiao
...
Yuheng Jiao
Mingshuai Yao
Jinsong Lan
Qingwen Liu
Ying Chen
232
3
0
18 Mar 2025
MUSS: Multilevel Subset Selection for Relevance and Diversity
Vu Nguyen
Andrey Kan
264
0
0
14 Mar 2025
Finding the Muses: Identifying Coresets through Loss Trajectories
M. Nagaraj
Deepak Ravikumar
Efstathia Soufleri
Kaushik Roy
238
0
0
12 Mar 2025
Coreset Selection via LLM-based Concept Bottlenecks
Akshay Mehra
Trisha Mittal
Subhadra Gopalakrishnan
Joshua Kimball
248
0
0
23 Feb 2025
A CLIP-Powered Framework for Robust and Generalizable Data Selection
International Conference on Learning Representations (ICLR), 2024
Steve Yang
Peng Ye
Wanli Ouyang
Dongzhan Zhou
Furao Shen
381
12
0
15 Oct 2024
Adapt-∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
International Conference on Learning Representations (ICLR), 2024
A. Maharana
Jaehong Yoon
Tianlong Chen
Joey Tianyi Zhou
254
0
0
14 Oct 2024
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Computer Vision and Pattern Recognition (CVPR), 2024
Qiuheng Wang
Yukai Shi
Jiarong Ou
Ruoxin Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
357
79
0
10 Oct 2024
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
International Conference on Learning Representations (ICLR), 2024
Tianchi Xie
Jiangning Zhu
Guozu Ma
Minzhi Lin
Wei Chen
Weikai Yang
Shixia Liu
324
2
0
03 Oct 2024
Deep Model Interpretation with Limited Data: A Coreset-based Approach
Hamed Behzadi-Khormouji
José Oramas
SLR
239
0
0
01 Oct 2024
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
Neural Information Processing Systems (NeurIPS), 2024
Jiawei Du
Xin Zhang
Juncheng Hu
Wenxin Huang
Joey Tianyi Zhou
DD
323
20
0
26 Sep 2024
Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation
Shaobo Wang
Yantai Yang
Qilong Wang
Kaixin Li
Linfeng Zhang
Junchi Yan
DD
333
11
0
22 Aug 2024
P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for Optimizing LLM Training
Yingxuan Yang
Huayi Wang
Muning Wen
Weinan Zhang
188
0
0
10 Aug 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
275
10
0
11 Jul 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
252
20
0
16 Jun 2024
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
Yiping Wang
Yifang Chen
Wendan Yan
Alex Fang
Wenjing Zhou
Kevin Jamieson
S. Du
228
13
0
29 May 2024
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman
Stefan Lee
Scott D. Cohen
Kushal Kafle
VLM
127
8
0
24 Apr 2024
LongWanjuan: Towards Systematic Measurement for Long Text Quality
Kai Lv
Xiaoran Liu
Qipeng Guo
Hang Yan
Conghui He
Xipeng Qiu
Dahua Lin
139
9
0
21 Feb 2024
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
Ruibo Chen
Yihan Wu
Lichang Chen
Guodong Liu
Qi He
Tianyi Xiong
Chenxi Liu
Junfeng Guo
Heng-Chiao Huang
VLM
148
35
0
19 Feb 2024
Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning
Yiping Wang
Yifang Chen
Wendan Yan
Kevin Jamieson
S. Du
200
7
0
03 Feb 2024
Data Management For Large Language Models: A Survey
Zige Wang
Wanjun Zhong
Yufei Wang
Qi Zhu
Fei Mi
Baojun Wang
Lifeng Shang
Xin Jiang
Qun Liu
LM&MA
158
13
0
04 Dec 2023
Data Diversity Matters for Robust Instruction Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Alexander Bukharin
Tuo Zhao
292
66
0
21 Nov 2023
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCL
SSL
1.1K
4,602
0
17 Jun 2020