Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
arXiv: 2409.16986
25 September 2024
Chi Zhang, Huaping Zhong, Kuan Zhang, Chengliang Chai, Rui Wang, Xinlin Zhuang, Tianyi Bai, Jiantao Qiu, Lei Cao, Ju Fan, Ye Yuan, Guoren Wang, Conghui He
Papers citing "Harnessing Diversity for Important Data Selection in Pretraining Large Language Models" (4 of 4 papers shown)
LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning
Neha Prakriya, Zijian Ding, Yizhou Sun, Jason Cong
29 Apr 2025
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa, Kazusato Oko, Shoichiro Yamaguchi, Sosuke Kobayashi, Seiya Tokui, K. Hayashi, Daisuke Okanohara, Taiji Suzuki
24 Apr 2025
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Shizhe Diao, Yu Yang, Y. Fu, Xin Dong, Dan Su, ..., Hongxu Yin, M. Patwary, Yingyan, Jan Kautz, Pavlo Molchanov
17 Apr 2025
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
Tianyi Bai, Ling Yang, Zhen Hao Wong, Jiahui Peng, Xinlin Zhuang, ..., Lijun Wu, Jiantao Qiu, Wentao Zhang, Binhang Yuan, Conghui He
10 Oct 2024