TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation

7 February 2026

Yuanzhi Liang

Xuanér Wu

Yirui Liu

Yijie Fang

Yizhen Fan

Ke Hao

Rui Li

Ruiying Liu

Ziqi Ni

Peng Yu

Yanbo Wang

Haibin Huang

Qizhen Weng

Chi Zhang

Xuelong Li

VGen

Main: 29 Pages

17 Figures

Bibliography: 6 Pages

7 Tables

Abstract

Post-training is the decisive step for converting a pretrained video generator into a production-oriented model that is instruction-following, controllable, and robust over long temporal horizons. This report presents a systematic post-training framework that organizes supervised policy shaping, reward-driven reinforcement learning, and preference-based refinement into a single stability-constrained optimization stack. The framework is designed around practical video-generation constraints, including high rollout cost, temporally compounding failure modes, and feedback that is heterogeneous, uncertain, and often weakly discriminative. By treating optimization as a staged, diagnostic-driven process rather than a collection of isolated tricks, the report summarizes a cohesive recipe for improving perceptual fidelity, temporal coherence, and prompt adherence while preserving the controllability established at initialization. The resulting framework provides a clear blueprint for building scalable post-training pipelines that remain stable, extensible, and effective in real-world deployment settings.
