TextMesh4D: High-Quality Text-to-4D Mesh Generation

1 July 2025

Sisi Dai

Xinxin Su

Boyan Wan

Ruizhen Hu

Kai Xu

Main:8 Pages

6 Figures

Bibliography:3 Pages

2 Tables

Abstract

Recent advancements in diffusion generative models have significantly advanced image, video, and 3D content creation from user-provided text prompts. However, the challenging problem of dynamic 3D content generation (text-to-4D) with diffusion guidance remains largely unexplored. In this paper, we introduce TextMesh4D, a novel framework for high-quality text-to-4D generation. Our approach leverages per-face Jacobians as a differentiable mesh representation and decomposes 4D generation into two stages: static object creation and dynamic motion synthesis. We further propose a flexibility-rigidity regularization term to stabilize Jacobian optimization under video diffusion priors, ensuring robust geometric performance. Experiments demonstrate that TextMesh4D achieves state-of-the-art results in terms of temporal consistency, structural fidelity, and visual realism. Moreover, TextMesh4D operates with low GPU memory overhead, requiring only a single 24 GB GPU, and thus offers a cost-effective yet high-quality solution for text-driven 4D mesh generation. The code will be released to facilitate future research in text-to-4D generation.
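
As a rough illustration of what "per-face Jacobians" means for a triangle mesh, the sketch below computes one 3x3 Jacobian per face from a rest pose and a deformed pose, using two edge vectors plus the unit normal as a local frame. This is a generic construction and not the TextMesh4D implementation: the function names (`face_frames`, `per_face_jacobians`, `rigidity_penalty`) are hypothetical, and the penalty shown is a simple orthogonality residual used as a stand-in, not the paper's flexibility-rigidity regularization term, whose exact form the abstract does not give.

```python
import numpy as np

def face_frames(vertices, faces):
    """Return (F, 3, 3) frames whose columns are two edges and the unit normal of each face."""
    v0 = vertices[faces[:, 0]]
    e1 = vertices[faces[:, 1]] - v0
    e2 = vertices[faces[:, 2]] - v0
    n = np.cross(e1, e2)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    return np.stack([e1, e2, n], axis=2)

def per_face_jacobians(rest_vertices, deformed_vertices, faces):
    """Per-face linear map J with J @ rest_frame == deformed_frame."""
    rest = face_frames(rest_vertices, faces)
    deformed = face_frames(deformed_vertices, faces)
    return deformed @ np.linalg.inv(rest)

def rigidity_penalty(jacobians):
    """Illustrative stand-in: squared Frobenius norm of J^T J - I per face (zero for rotations)."""
    residual = np.swapaxes(jacobians, 1, 2) @ jacobians - np.eye(3)
    return np.sum(residual ** 2, axis=(1, 2))

if __name__ == "__main__":
    rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    faces = np.array([[0, 1, 2]])
    # Rotate the triangle by 90 degrees about the z-axis.
    rotation = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    jac = per_face_jacobians(rest, rest @ rotation.T, faces)
    print(jac[0])
    print(rigidity_penalty(jac))
```

A pure rotation yields a Jacobian equal to the rotation matrix and a penalty of zero, while stretching a face produces a nonzero penalty. In an optimization setting the Jacobians would be the free variables and vertex positions would be recovered from them, typically by a least-squares (Poisson) solve; that step is omitted here.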

@article{dai2025_2506.24121,
  title={TextMesh4D: High-Quality Text-to-4D Mesh Generation},
  author={Sisi Dai and Xinxin Su and Boyan Wan and Ruizhen Hu and Kai Xu},
  journal={arXiv preprint arXiv:2506.24121},
  year={2025}
}
