Graphical user interface (GUI) automation agents are emerging as powerful tools that enable humans to accomplish increasingly complex tasks on smart devices. However, users often inadvertently omit key information when describing a task, which hinders agent performance because the current agent paradigm does not support immediate user intervention. To address this issue, we introduce a task that incorporates interactive information completion into GUI agents. We develop a dataset of GUI follow-up question-answer pairs, along with a method to benchmark this new capability. Our results show that agents able to ask GUI follow-up questions can fully recover their performance when faced with ambiguous user tasks.
@article{cheng2025_2503.24180,
  title={Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up},
  author={Ziming Cheng and Zhiyuan Huang and Junting Pan and Zhaohui Hou and Mingjie Zhan},
  journal={arXiv preprint arXiv:2503.24180},
  year={2025}
}