Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens

Human bodily movements convey critical insights into action intentions and cognitive processes, yet existing multimodal systems primarily focus on understanding human motion via language, vision, and audio, and struggle to capture the dynamic forces and torques inherent in 3D motion. Inertial measurement units (IMUs) present a promising alternative, offering lightweight, wearable, and privacy-conscious motion sensing. However, processing streaming IMU data faces challenges such as wireless transmission instability, sensor noise, and drift, limiting its utility for long-term real-time motion capture (MoCap) and, more importantly, online motion analysis. To address these challenges, we introduce Mojito, an intelligent motion agent that integrates inertial sensing with large language models (LLMs) for interactive motion capture and behavioral analysis.
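The sensor noise and jitter mentioned above are commonly damped with simple streaming filters before any higher-level processing. As a minimal illustration (not the jitter-reduced tokenization used by Mojito, whose method the abstract does not detail), the sketch below applies an exponential moving average to a simulated noisy accelerometer channel:

```python
import random

def ema_filter(samples, alpha=0.1):
    """Exponential moving average: a simple streaming filter that damps
    high-frequency jitter in a sensor signal. Illustrative only --
    not the tokenization scheme proposed in the paper."""
    smoothed = []
    state = None
    for x in samples:
        # First sample initializes the state; afterwards blend new
        # readings with the running estimate.
        state = x if state is None else alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

# Simulate a noisy accelerometer channel: gravity (9.81 m/s^2) plus jitter.
random.seed(0)
raw = [9.81 + random.uniform(-0.5, 0.5) for _ in range(200)]
smooth = ema_filter(raw)

# After a warm-up period, the smoothed stream deviates far less from
# the true value than the raw one.
raw_dev = sum(abs(x - 9.81) for x in raw[50:]) / 150
smooth_dev = sum(abs(x - 9.81) for x in smooth[50:]) / 150
print(smooth_dev < raw_dev)  # → True
```

A real pipeline would trade the EMA's lag against its smoothing strength (via `alpha`), and heavier drift is typically handled by sensor fusion rather than low-pass filtering alone.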
@article{shan2025_2502.16175,
  title={Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens},
  author={Ziwei Shan and Yaoyu He and Chengfeng Zhao and Jiashen Du and Jingyan Zhang and Qixuan Zhang and Jingyi Yu and Lan Xu},
  journal={arXiv preprint arXiv:2502.16175},
  year={2025}
}