TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation

Yuanzhi Liang
Xuanér Wu
Yirui Liu
Yijie Fang
Yizhen Fan
Ke Hao
Rui Li
Ruiying Liu
Ziqi Ni
Peng Yu
Yanbo Wang
Haibin Huang
Qizhen Weng
Chi Zhang
Xuelong Li
Main: 29 Pages
17 Figures
Bibliography: 6 Pages
7 Tables
Abstract

Post-training is the decisive step for converting a pretrained video generator into a production-oriented model that is instruction-following, controllable, and robust over long temporal horizons. This report presents a systematic post-training framework that organizes supervised policy shaping, reward-driven reinforcement learning, and preference-based refinement into a single stability-constrained optimization stack. The framework is designed around practical video-generation constraints, including high rollout cost, temporally compounding failure modes, and feedback that is heterogeneous, uncertain, and often weakly discriminative. By treating optimization as a staged, diagnostic-driven process rather than a collection of isolated tricks, the report summarizes a cohesive recipe for improving perceptual fidelity, temporal coherence, and prompt adherence while preserving the controllability established at initialization. The resulting framework provides a clear blueprint for building scalable post-training pipelines that remain stable, extensible, and effective in real-world deployment settings.
