
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors

Abstract

Wearables generate rich motion data, yet current systems only classify what happened, failing to support natural questions about why it happened or what it means. We introduce LLaSA (Large Language and Sensor Assistant), a compact 13B model that enables ask-anything, open-ended question answering grounded in raw IMU data. LLaSA supports conversational, context-aware reasoning, explaining the causes of sensor-detected behaviors and answering free-form questions in real-world scenarios. It is tuned for scientific accuracy, coherence, and response reliability. To advance this new task of sensor-based QA, we release three large-scale datasets: SensorCaps, OpenSQA, and Tune-OpenSQA. Together, these resources define a new benchmark for sensor-language models. LLaSA consistently produces interpretable, causal answers and outperforms commercial LLMs across both public and real-world settings. Our code repository and datasets can be found at this https URL.

@article{imran2025_2406.14498,
  title={LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors},
  author={Sheikh Asif Imran and Mohammad Nur Hossain Khan and Subrata Biswas and Bashima Islam},
  journal={arXiv preprint arXiv:2406.14498},
  year={2025}
}