Time series analysis is crucial for understanding the dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs), which enable generalized learning and the integration of contextual information. However, their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. Synthetic data emerge as a viable solution, addressing these challenges by offering scalable, unbiased, and high-quality alternatives. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs: it analyzes data generation strategies, examines their role in model pretraining, fine-tuning, and evaluation, and identifies future research directions.
@article{liu2025_2503.11411,
  title={Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models},
  author={Xu Liu and Taha Aksu and Juncheng Liu and Qingsong Wen and Yuxuan Liang and Caiming Xiong and Silvio Savarese and Doyen Sahoo and Junnan Li and Chenghao Liu},
  journal={arXiv preprint arXiv:2503.11411},
  year={2025}
}