Neuronal Activation States as Sample Embeddings for Data Selection in Task-Specific Instruction Tuning

19 March 2025

Abstract

Task-specific instruction tuning enhances the performance of large language models (LLMs) on specialized tasks, yet efficiently selecting relevant data for this purpose remains a challenge. Inspired by neural coactivation in the human brain, we propose a novel data selection method called NAS, which leverages neuronal activation states as embeddings for samples in the feature space. Extensive experiments show that NAS outperforms classical data selection methods in terms of both effectiveness and robustness across different models, datasets, and selection ratios.
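As a rough illustration of the idea in the abstract, the sketch below treats each sample's pattern of active and inactive neurons as its embedding and keeps the candidates whose patterns best match a few target-task examples. The abstract does not specify how activation states are extracted or compared, so everything here is an assumption made for illustration: the zero threshold, the matching-fraction similarity, the max-over-targets scoring, and the function names `activation_state`, `state_similarity`, and `select_samples` are hypothetical, and the activations are plain lists of floats rather than values read from an actual LLM.

```python
from typing import List, Sequence


def activation_state(activations: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Binarize raw activations: 1 if the neuron fires above the threshold, else 0.

    The threshold of 0.0 is an assumption, not taken from the paper.
    """
    return [1 if a > threshold else 0 for a in activations]


def state_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of neurons whose on/off state agrees between two samples.

    This metric is an illustrative choice; the paper may use a different one.
    """
    if len(a) != len(b) or not a:
        raise ValueError("states must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_samples(
    candidate_acts: Sequence[Sequence[float]],
    target_acts: Sequence[Sequence[float]],
    ratio: float,
) -> List[int]:
    """Return indices of the candidates most similar to any target-task example."""
    target_states = [activation_state(t) for t in target_acts]
    scored = []
    for idx, acts in enumerate(candidate_acts):
        state = activation_state(acts)
        # Score each candidate by its best match among the target examples.
        score = max(state_similarity(state, t) for t in target_states)
        scored.append((score, idx))
    # Keep at least one sample, otherwise the requested fraction of the pool.
    k = max(1, round(ratio * len(candidate_acts)))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(idx for _, idx in scored[:k])


if __name__ == "__main__":
    # Toy activations for three candidate samples over four hypothetical neurons.
    candidates = [
        [0.5, -0.2, 0.9, -0.1],
        [-0.3, 0.4, -0.8, 0.2],
        [0.1, -0.5, 0.7, 0.3],
    ]
    targets = [[0.8, -0.1, 0.6, -0.4]]
    print(select_samples(candidates, targets, ratio=0.67))
```

In this toy setup the selection ratio controls how much of the candidate pool is kept, mirroring the "selection ratios" varied in the experiments; a real pipeline would obtain the activations from a forward pass through the model being tuned.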


@article{ma2025_2503.15573,
  title={Neuronal Activation States as Sample Embeddings for Data Selection in Task-Specific Instruction Tuning},
  author={Da Ma and Gonghu Shang and Zhi Chen and Libo Qin and Yijie Luo and Lei Pan and Shuai Fan and Lu Chen and Kai Yu},
  journal={arXiv preprint arXiv:2503.15573},
  year={2025}
}
