InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing

Yebin Yang
Di Wen
Lei Qi
Weitong Kong
Junwei Zheng
Ruiping Liu
Yufan Chen
Chengzhi Wu
Kailun Yang
Yuqian Fu
Danda Pani Paudel
Luc Van Gool
Kunyu Peng
Main: 13 pages · 9 figures · 10 tables · Bibliography: 4 pages · Appendix: 11 pages
Abstract

Text-guided 3D motion editing has seen success in single-person scenarios, but its extension to multi-person settings remains underexplored due to limited paired data and the complexity of inter-person interactions. We introduce the task of multi-person 3D motion editing, where a target motion is generated from a source motion and a text instruction. To support this, we propose InterEdit3D, a new dataset with manual two-person motion change annotations, and a Text-guided Multi-human Motion Editing (TMME) benchmark. We present InterEdit, a synchronized classifier-free conditional diffusion model for TMME. It introduces Semantic-Aware Plan Token Alignment with learnable tokens to capture high-level interaction cues, and an Interaction-Aware Frequency Token Alignment strategy using DCT and energy pooling to model periodic motion dynamics. Experiments show that InterEdit improves text-to-motion consistency and edit fidelity, achieving state-of-the-art TMME performance. The dataset and code will be released at this https URL.
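The abstract does not specify how Interaction-Aware Frequency Token Alignment is implemented. As a rough, hypothetical illustration of the two named ingredients, DCT over the temporal axis followed by energy pooling into compact frequency tokens, here is a minimal sketch on a generic motion sequence; the function names, band count, and pooling choice are assumptions, not the authors' method:

```python
import math

def dct2(signal):
    # Type-II DCT: projects a time series onto cosine basis frequencies.
    n = len(signal)
    return [sum(signal[t] * math.cos(math.pi / n * (t + 0.5) * k)
                for t in range(n))
            for k in range(n)]

def frequency_energy_tokens(motion, num_bands=4):
    # motion: T x D list of per-frame features (e.g. joint coordinates).
    # Per channel: DCT over time, square coefficients (spectral energy),
    # pool the energy into `num_bands` equal frequency bands, and average
    # across channels -> one compact token per band.
    n_frames, n_dims = len(motion), len(motion[0])
    band_size = n_frames // num_bands
    tokens = [0.0] * num_bands
    for d in range(n_dims):
        coeffs = dct2([frame[d] for frame in motion])
        for k, c in enumerate(coeffs):
            tokens[min(k // band_size, num_bands - 1)] += c * c
    return [t / n_dims for t in tokens]

# A slowly oscillating motion concentrates energy in the lowest band,
# while a rapid oscillation peaks in a higher band.
T = 32
slow = [[math.cos(math.pi / T * (t + 0.5) * 2)] for t in range(T)]
fast = [[math.cos(math.pi / T * (t + 0.5) * 20)] for t in range(T)]
slow_tokens = frequency_energy_tokens(slow)
fast_tokens = frequency_energy_tokens(fast)
```

Such band-energy summaries are one plausible way to expose periodic motion structure (gait cycles, repeated gestures) to a diffusion model as a small set of frequency tokens.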
