An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding

Abstract

The rapid advancement of Large Vision Language Models (LVLMs) offers the potential to surpass conventional labeling by generating richer, more detailed descriptions for on-device human behavior understanding (HBU) in low-resolution vision systems such as depth, thermal, and infrared cameras. However, existing LVLM approaches struggle to understand low-resolution data because they are designed primarily for high-resolution inputs such as RGB images. A quick fix is to caption a large amount of low-resolution data, but this requires significant labor-intensive annotation effort. In this paper, we propose Llambda, a novel labor-saving system designed to support low-resolution HBU. The core idea is to leverage limited labeled data and a large amount of unlabeled data to guide LLMs in generating informative captions, which can be combined with the raw data to effectively fine-tune LVLMs for understanding low-resolution videos in HBU. First, we propose a Contrastive-Oriented Data Labeler, which captures behavior-relevant information from long, low-resolution videos and generates high-quality pseudo labels for unlabeled data via contrastive learning. Second, we propose a Physical-Knowledge Guided Captioner, which applies spatial and temporal consistency checks to mitigate errors in the pseudo labels, thereby improving the LLMs' understanding of sequential data and enabling high-quality video captions. Finally, to ensure on-device deployability, we employ LoRA-based efficient fine-tuning to adapt LVLMs to low-resolution data. We evaluate Llambda on a region-scale real-world testbed and three distinct low-resolution datasets, and the experiments show that Llambda outperforms several state-of-the-art LVLM systems by up to 40.03% in average BERTScore.
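To make the pipeline concrete, the following is a minimal sketch of the two labeling steps the abstract describes: assigning pseudo labels to unlabeled clips by nearest class prototype in a contrastive embedding space, and a temporal-consistency check that suppresses isolated label flips. The function names, the prototype-based assignment, and the majority-vote smoothing are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def pseudo_label(unlabeled_emb, labeled_emb, labels, n_classes):
    """Assign each unlabeled clip the label of its nearest class prototype
    (cosine similarity) in a contrastive embedding space.

    This prototype rule is a simplification of contrastive pseudo-labeling;
    the paper's actual labeler may differ.
    """
    # Class prototypes: mean embedding of the labeled clips in each class.
    protos = np.stack([labeled_emb[labels == c].mean(axis=0)
                       for c in range(n_classes)])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    u = unlabeled_emb / np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
    # Cosine similarity to each prototype; pick the closest class.
    return (u @ protos.T).argmax(axis=1)

def temporal_smooth(frame_labels, window=3):
    """Majority-vote smoothing over a sliding window: a stand-in for the
    temporal-consistency check that removes physically implausible,
    single-frame behavior flips."""
    out = frame_labels.copy()
    half = window // 2
    for i in range(len(frame_labels)):
        seg = frame_labels[max(0, i - half): i + half + 1]
        out[i] = np.bincount(seg).argmax()
    return out
```

In this sketch, a lone outlier label in a run of identical labels is voted away by its neighbors, which is the intuition behind using temporal consistency to clean pseudo labels before they are handed to the captioner.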

@article{jiang2025_2505.01743,
  title={An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding},
  author={Siyang Jiang and Bufang Yang and Lilin Xu and Mu Yuan and Yeerzhati Abudunuer and Kaiwei Liu and Liekang Zeng and Hongkai Chen and Zhenyu Yan and Xiaofan Jiang and Guoliang Xing},
  journal={arXiv preprint arXiv:2505.01743},
  year={2025}
}