ResearchTrend.AI

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

25 September 2024
Chi Zhang
Huaping Zhong
Kuan Zhang
Chengliang Chai
Rui Wang
Xinlin Zhuang
Tianyi Bai
Jiantao Qiu
Lei Cao
Ju Fan
Ye Yuan
Guoren Wang
Conghui He
    TDI

Papers citing "Harnessing Diversity for Important Data Selection in Pretraining Large Language Models"

4 / 4 papers shown
LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning
Neha Prakriya
Zijian Ding
Yizhou Sun
Jason Cong
16
0
0
29 Apr 2025
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi
Ryotaro Kawata
Naoki Nishikawa
Kazusato Oko
Shoichiro Yamaguchi
Sosuke Kobayashi
Seiya Tokui
K. Hayashi
Daisuke Okanohara
Taiji Suzuki
AI4CE
30
0
0
24 Apr 2025
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Shizhe Diao
Yu Yang
Y. Fu
Xin Dong
Dan Su
...
Hongxu Yin
M. Patwary
Yingyan
Jan Kautz
Pavlo Molchanov
33
0
0
17 Apr 2025
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
Tianyi Bai
Ling Yang
Zhen Hao Wong
Jiahui Peng
Xinlin Zhuang
...
Lijun Wu
Jiantao Qiu
Wentao Zhang
Binhang Yuan
Conghui He
LLMAG
23
1
0
10 Oct 2024