Toward Safe, Trustworthy and Realistic Augmented Reality User Experience

As augmented reality (AR) becomes increasingly integrated into everyday life, ensuring the safety and trustworthiness of its virtual content is critical. Our research addresses the risks of task-detrimental AR content, particularly that which obstructs critical information or subtly manipulates user perception. We developed two systems, ViDDAR and VIM-Sense, to detect such attacks using vision-language models (VLMs) and multimodal reasoning modules. Building on this foundation, we propose three future directions: automated, perceptually aligned quality assessment of virtual content; detection of multimodal attacks; and adaptation of VLMs for efficient and user-centered deployment on AR devices. Overall, our work aims to establish a scalable, human-aligned framework for safeguarding AR experiences and seeks feedback on perceptual modeling, multimodal AR content implementation, and lightweight model adaptation.
View on arXiv