
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

Minh-Quan Le
Yuanzhi Zhu
Vicky Kalogeiton
Dimitris Samaras
Main: 8 pages · 12 figures · 6 tables · Bibliography: 4 pages · Appendix: 5 pages
Abstract

Recent video diffusion models can synthesize visually compelling clips, yet they often violate basic physical laws: objects float, accelerations drift, and collisions behave inconsistently, revealing a persistent gap between visual realism and physical realism. We propose NewtonRewards, the first physics-grounded post-training framework for video generation based on verifiable rewards. Instead of relying on human or VLM feedback, NewtonRewards extracts measurable proxies from generated videos using frozen utility models: optical flow serves as a proxy for velocity, while high-level appearance features serve as a proxy for mass. These proxies enable explicit enforcement of Newtonian structure through two complementary rewards: a Newtonian kinematic constraint enforcing constant-acceleration dynamics, and a mass conservation reward preventing trivial, degenerate solutions. We evaluate NewtonRewards on five Newtonian Motion Primitives (free fall, horizontal/parabolic throw, and ramp sliding down/up) using our newly constructed large-scale benchmark, NewtonBench-60K. Across all primitives, NewtonRewards consistently improves physical plausibility, motion smoothness, and temporal coherence over prior post-training methods on both visual and physics metrics. It further maintains strong performance under out-of-distribution shifts in height, speed, and friction. Our results show that physics-grounded verifiable rewards offer a scalable path toward physics-aware video generation.
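To make the kinematic-constraint idea concrete, here is a minimal sketch of how a verifiable constant-acceleration reward could be computed from per-frame velocity proxies. This is an illustration, not the paper's exact formulation: the function name `newton_kinematic_reward`, the reward shape `1 / (1 + penalty)`, and the use of a mean-squared "jerk" penalty are all assumptions; the velocities would come from an optical-flow estimator in practice.

```python
def newton_kinematic_reward(velocities):
    """Hypothetical sketch of a verifiable Newtonian kinematic reward.

    `velocities` is a list of per-frame (vx, vy) tuples, e.g. mean
    optical-flow vectors over the moving object (a velocity proxy).
    Under constant acceleration, successive velocity differences should
    be identical, so we penalize the mean squared change in acceleration
    ("jerk") across the clip.
    """
    # Per-step acceleration proxy: finite difference of velocities.
    accel = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(velocities, velocities[1:])]
    # Jerk: change in acceleration between consecutive steps.
    jerk = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(accel, accel[1:])]
    penalty = sum(jx * jx + jy * jy for jx, jy in jerk) / max(len(jerk), 1)
    # Reward lies in (0, 1]; 1 means perfectly constant acceleration.
    return 1.0 / (1.0 + penalty)
```

A free-fall clip whose vertical velocity grows linearly (v_y = g·t) scores the maximum reward, while a clip with a sudden, unphysical velocity jump scores lower; because the signal is computed directly from measurable quantities, it needs no human or VLM judgment.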
