Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding

Human activity recognition (HAR) using inertial measurement units (IMUs) increasingly leverages large language models (LLMs), yet existing approaches focus on coarse activities such as walking or running. Our preliminary study indicates that pretrained LLMs fail catastrophically on fine-grained HAR tasks such as air-written letter recognition, achieving only near-random accuracy. In this work, we first bridge this gap for flat-surface writing scenarios: by fine-tuning LLMs with a self-collected dataset and few-shot learning, we achieved up to a 129x improvement on 2D data. To extend this to 3D scenarios, we designed an encoder-based pipeline that maps 3D data into 2D equivalents, preserving the spatiotemporal information needed for robust letter prediction. Our end-to-end pipeline achieves 78% accuracy in recognizing words of up to five letters in mid-air writing scenarios, establishing LLMs as viable tools for fine-grained HAR.
@article{xu2025_2504.02878,
  title={Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding},
  author={Lilin Xu and Kaiyuan Hou and Xiaofan Jiang},
  journal={arXiv preprint arXiv:2504.02878},
  year={2025}
}