v1v2 (latest)

Towards Real-time Adaptation of Embodied Agent in Human-Robot Collaboration

30 November 2024

Shipeng Liu

Boshen Zhang

Zhehui Huang

ArXiv (abs)PDF HTML Github

Main:6 Pages

8 Figures

4 Tables

Appendix:7 Pages

Abstract

Large Language Models (LLMs) have opened transformative possibilities for human-robot collaboration. However, enabling real-time collaboration requires both low latency and robust reasoning, and most LLMs suffer from high latency. To address this gap, we first propose a fine-grained benchmark that explicitly assesses agents' proactive adaptability and temporal responsiveness in the Overcooked-AI environment. Based on evaluation results, we propose MonTA (Monitor-then-Adapt), a hierarchical framework inspired by cognitive science research. MonTA contains three key modules: a lightweight Monitor that operates at high frequency (7 Hz) to detect adaptation needs, and two proficient Adapters for subtask and path adaptation reasoning that provide instructions to humans at a lower frequency. Our results demonstrate that MonTA significantly outperforms baseline agents on our proposed benchmark, achieving superior performance across layouts with varying teaming fluency. User studies confirm the high reasonableness of adaptation plans and consistent language instructions provided by our framework to humans.

View on arXiv

Comments on this paper