InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing

Yebin Yang
Di Wen
Lei Qi
Weitong Kong
Junwei Zheng
Ruiping Liu
Yufan Chen
Chengzhi Wu
Kailun Yang
Yuqian Fu
Danda Pani Paudel
Luc Van Gool
Kunyu Peng
Main: 13 pages · 9 figures · 10 tables · Bibliography: 4 pages · Appendix: 11 pages
Abstract

Text-guided 3D motion editing has seen success in single-person scenarios, but its extension to multi-person settings remains underexplored due to limited paired data and the complexity of inter-person interactions. We introduce the task of multi-person 3D motion editing, where a target motion is generated from a source motion and a text instruction. To support this, we propose InterEdit3D, a new dataset with manual two-person motion change annotations, and a Text-guided Multi-human Motion Editing (TMME) benchmark. We present InterEdit, a synchronized classifier-free conditional diffusion model for TMME. It introduces Semantic-Aware Plan Token Alignment with learnable tokens to capture high-level interaction cues, and an Interaction-Aware Frequency Token Alignment strategy using DCT and energy pooling to model periodic motion dynamics. Experiments show that InterEdit improves text-to-motion consistency and edit fidelity, achieving state-of-the-art TMME performance. The dataset and code will be released at this https URL.
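The abstract does not specify how Interaction-Aware Frequency Token Alignment is implemented. As a rough, hypothetical illustration of the two named ingredients, DCT over the temporal axis followed by energy pooling into compact frequency tokens, here is a minimal sketch on a generic motion sequence; the function names, band count, and pooling choice are assumptions, not the authors' method:

```python
import math

def dct2(signal):
    # Type-II DCT: projects a time series onto cosine basis frequencies.
    n = len(signal)
    return [sum(signal[t] * math.cos(math.pi / n * (t + 0.5) * k)
                for t in range(n))
            for k in range(n)]

def frequency_energy_tokens(motion, num_bands=4):
    # motion: T x D list of per-frame features (e.g. joint coordinates).
    # Per channel: DCT over time, square coefficients (spectral energy),
    # pool the energy into `num_bands` equal frequency bands, and average
    # across channels -> one compact token per band.
    n_frames, n_dims = len(motion), len(motion[0])
    band_size = n_frames // num_bands
    tokens = [0.0] * num_bands
    for d in range(n_dims):
        coeffs = dct2([frame[d] for frame in motion])
        for k, c in enumerate(coeffs):
            tokens[min(k // band_size, num_bands - 1)] += c * c
    return [t / n_dims for t in tokens]

# A slowly oscillating motion concentrates energy in the lowest band,
# while a rapid oscillation peaks in a higher band.
T = 32
slow = [[math.cos(math.pi / T * (t + 0.5) * 2)] for t in range(T)]
fast = [[math.cos(math.pi / T * (t + 0.5) * 20)] for t in range(T)]
slow_tokens = frequency_energy_tokens(slow)
fast_tokens = frequency_energy_tokens(fast)
```

Such band-energy summaries are one plausible way to expose periodic motion structure (gait cycles, repeated gestures) to a diffusion model as a small set of frequency tokens.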
