A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices

22 May 2025
Chen Gong
Rui Xing
Zhenzhe Zheng
Fan Wu
Main: 9 pages
11 figures
Bibliography: 2 pages
1 table
Appendix: 1 page
Abstract

The demand for machine learning (ML) model training on edge devices is escalating due to data privacy and personalized service needs. However, we observe that current on-device model training is hampered by the under-utilization of on-device data, due to low training throughput, limited storage and diverse data importance. To improve data resource utilization, we propose a two-stage data selection framework, Titan, to select the most important data batch from streaming data for model training with guaranteed efficiency and effectiveness. Specifically, in the first stage, Titan filters out a candidate dataset with potentially high importance in a coarse-grained manner. In the second stage of fine-grained selection, we propose a theoretically optimal data selection strategy to identify the data batch with the highest model performance improvement to current training round. To further enhance time-and-resource efficiency, Titan leverages a pipeline to co-execute data selection and model training, and avoids resource conflicts by exploiting idle computing resources. We evaluate Titan on real-world edge devices and three representative edge computing tasks with diverse models and data modalities. Empirical results demonstrate that Titan achieves up to 43% reduction in training time and 6.2% increase in final accuracy with minor system overhead, such as data processing delay, memory footprint and energy consumption.
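
The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how importance is measured in either stage, so `cheap_score` and `costly_score` are hypothetical placeholders, and plain top-k ranking stands in for the paper's theoretically optimal strategy, which is not given here. The function name and parameters are likewise assumptions made for illustration.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def two_stage_select(
    stream: Iterable[T],
    cheap_score: Callable[[T], float],   # hypothetical coarse importance proxy
    costly_score: Callable[[T], float],  # hypothetical fine-grained importance
    candidate_size: int,
    batch_size: int,
) -> List[T]:
    """Pick a training batch from a stream in two stages (illustrative only)."""
    # Stage 1 (coarse): keep only the candidate_size samples with the highest
    # cheap score. heapq.nlargest consumes the stream once and holds at most
    # candidate_size items, which suits limited on-device storage.
    candidates = heapq.nlargest(candidate_size, stream, key=cheap_score)

    # Stage 2 (fine): apply the more expensive score to the small candidate
    # set only, and return the batch_size best samples.
    return heapq.nlargest(batch_size, candidates, key=costly_score)


# Toy stream of (sample_id, cheap_score, costly_score) tuples.
toy_stream = [
    ("a", 0.9, 0.1),
    ("b", 0.8, 0.7),
    ("c", 0.2, 0.99),
    ("d", 0.7, 0.5),
    ("e", 0.1, 0.3),
]

batch = two_stage_select(
    toy_stream,
    cheap_score=lambda s: s[1],
    costly_score=lambda s: s[2],
    candidate_size=3,
    batch_size=2,
)
selected_ids = [s[0] for s in batch]
```

In this toy run, sample "c" has the highest fine-grained score but never reaches the second stage because the coarse filter drops it. That is the trade-off a two-stage design accepts in exchange for running the expensive score on only a few candidates.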

@article{gong2025_2505.16563,
  title={A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices},
  author={Chen Gong and Rui Xing and Zhenzhe Zheng and Fan Wu},
  journal={arXiv preprint arXiv:2505.16563},
  year={2025}
}