LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors

Wearables generate rich motion data, yet current systems only classify what happened, failing to support natural questions about why it happened or what it means. We introduce LLaSA (Large Language and Sensor Assistant), a compact 13B model that enables ask-anything, open-ended question answering grounded in raw IMU data. LLaSA supports conversational, context-aware reasoning, explaining the causes of sensor-detected behaviors and answering free-form questions in real-world scenarios. It is tuned for scientific accuracy, coherence, and response reliability. To advance this new task of sensor-based QA, we release three large-scale datasets: SensorCaps, OpenSQA, and Tune-OpenSQA. Together, these resources define a new benchmark for sensor-language models. LLaSA consistently produces interpretable, causal answers and outperforms commercial LLMs across both public and real-world settings. Our code repository and datasets can be found at this https URL.
@article{imran2025_2406.14498,
  title={LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors},
  author={Sheikh Asif Imran and Mohammad Nur Hossain Khan and Subrata Biswas and Bashima Islam},
  journal={arXiv preprint arXiv:2406.14498},
  year={2025}
}