In the medical field, the limited availability of large-scale datasets and the labor-intensive annotation process hinder the performance of deep models. Diffusion-based generative augmentation approaches present a promising solution to this issue and have proven effective in advancing downstream medical recognition tasks. Nevertheless, existing methods lack sufficient semantic and sequential steerability for challenging video/3D sequence generation, and they neglect quality control of noisy synthesized samples, resulting in unreliable synthetic databases that severely limit the performance of downstream tasks. In this work, we present Ctrl-GenAug, a novel and general generative augmentation framework that enables highly semantic- and sequential-customized sequence synthesis and suppresses incorrectly synthesized samples, to aid medical sequence classification. Specifically, we first design a multimodal conditions-guided sequence generator that controllably synthesizes diagnosis-promoting samples; a sequential augmentation module is integrated to enhance the temporal/stereoscopic coherence of the generated samples. We then propose a noisy synthetic data filter that suppresses unreliable cases at the semantic and sequential levels. Extensive experiments on 3 medical datasets, using 11 networks trained under 3 paradigms, comprehensively demonstrate the effectiveness and generality of Ctrl-GenAug, particularly for underrepresented high-risk populations and out-of-domain conditions.
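To make the described pipeline concrete, below is a minimal Python sketch of the generate-then-filter loop the abstract outlines. Every name in it (generate_sequence, semantic_score, sequential_score, the threshold values) is a hypothetical stand-in with placeholder logic, not the paper's actual models or API; it only illustrates the control flow: conditionally generate candidate sequences, score each candidate at the semantic and sequential levels, keep only those passing both filters, and merge the survivors with the real training set.

```python
"""Minimal sketch of a Ctrl-GenAug-style augmentation loop (all names hypothetical)."""
from dataclasses import dataclass
from typing import Dict, List
import random

@dataclass
class Sequence:
    frames: List[List[float]]  # e.g., a video clip or 3D volume as a list of slices
    label: int                 # diagnostic class

def generate_sequence(conditions: dict) -> Sequence:
    """Stand-in for the multimodal conditions-guided sequence generator.
    `conditions` could hold text prompts, class labels, and image priors."""
    num_frames = conditions.get("num_frames", 8)
    frames = [[random.random() for _ in range(4)] for _ in range(num_frames)]
    return Sequence(frames=frames, label=conditions["label"])

def semantic_score(seq: Sequence) -> float:
    """Stand-in for a semantic-level check, e.g., agreement between a pretrained
    classifier's prediction and the conditioning label."""
    return random.uniform(0.0, 1.0)

def sequential_score(seq: Sequence) -> float:
    """Stand-in for a sequential-level check, e.g., temporal/stereoscopic
    coherence measured via feature similarity of adjacent frames."""
    return random.uniform(0.0, 1.0)

def augment_dataset(real_data: List[Sequence],
                    conditions_per_class: Dict[int, dict],
                    n_per_class: int = 100,
                    sem_thresh: float = 0.5,
                    seq_thresh: float = 0.5) -> List[Sequence]:
    """Generate candidates, filter noisy ones at both levels, merge with real data."""
    synthetic = []
    for label, conditions in conditions_per_class.items():
        for _ in range(n_per_class):
            cand = generate_sequence({**conditions, "label": label})
            # Keep a candidate only if it passes BOTH quality filters.
            if semantic_score(cand) >= sem_thresh and sequential_score(cand) >= seq_thresh:
                synthetic.append(cand)
    return list(real_data) + synthetic
```

The filtered, augmented set would then be used to train the downstream sequence classifier as usual; the actual scoring models and thresholds are specified in the paper, not here.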
@article{zhou2025_2409.17091,
  title={Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification},
  author={Xinrui Zhou and Yuhao Huang and Haoran Dou and Shijing Chen and Ao Chang and Jia Liu and Weiran Long and Jian Zheng and Erjiao Xu and Jie Ren and Ruobing Huang and Jun Cheng and Wufeng Xue and Dong Ni},
  journal={arXiv preprint arXiv:2409.17091},
  year={2025}
}