ForTIFAI: Fending Off Recursive Training Induced Failure for AI Models
Main: 9 pages, 14 figures, 16 tables; Bibliography: 2 pages; Appendix: 16 pages
Abstract
The increasing reliance on generative AI models has accelerated the rate at which synthetic data is produced, with some projections suggesting that most new data available for training could be machine-generated by 2030. This shift toward predominantly synthetic content presents a critical challenge: repeated training on synthetic data leads to a phenomenon known as model collapse, in which model performance degrades over successive generations of training, eventually rendering the models ineffective. Although prior studies have explored the causes and detection of model collapse, existing mitigation strategies remain limited.
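To make the recursive-training dynamic concrete, here is a minimal toy simulation in the spirit of the model-collapse literature: each "generation" fits a Gaussian to samples drawn from the previous generation's model, so every model after the first trains purely on synthetic data. The Gaussian setting, sample size, and generation count are illustrative assumptions of this sketch, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0 trains on "real" data: a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for gen in range(30):
    # "Train" a model: fit a Gaussian to the current data by maximum likelihood.
    mu, sigma = data.mean(), data.std()
    print(f"gen {gen:2d}: mu={mu:+.3f}, sigma={sigma:.3f}")
    # Each subsequent generation sees only samples from the previous model,
    # i.e., purely synthetic data with no fresh real examples mixed in.
    data = rng.normal(loc=mu, scale=sigma, size=50)
```

Because each generation re-estimates the distribution from a finite synthetic sample, estimation error compounds and the fitted variance tends to shrink toward zero, a simple analogue of the tail-loss and degradation that characterize model collapse.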
