Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?

Recent advancements in integrating large language models (LLMs) with tools have allowed the models to interact with real-world environments. However, these tool-augmented LLMs often encounter incomplete scenarios when users provide partial information or the necessary tools are unavailable. Recognizing and managing such scenarios is crucial for the reliability of LLMs, yet this problem remains understudied. This study examines whether LLMs can identify incomplete conditions and appropriately determine when to refrain from using tools. To quantitatively evaluate this capability, we construct a new benchmark dataset in which instances are systematically altered to simulate the ambiguous and incomplete conditions common in real-world interactions. Our experiments reveal that even state-of-the-art LLMs often struggle to identify these conditions, attempting to use tools without sufficient information or when the correct tool is unavailable. To better understand these limitations, we conduct a detailed behavioral analysis across various conditions, including implicit evaluation and scenarios where models receive feedback from previous tool invocations. Based on this analysis, we propose a novel prompting-based reasoning strategy that explicitly instructs models to assess the sufficiency of the provided information and the availability of suitable tools. Our approach significantly enhances the models' ability to recognize incomplete conditions, leading to more informed and contextually appropriate tool-use decisions. We believe our research contributes to advancing the reliability of LLMs, especially in real-world applications where incomplete or ambiguous information is common. Our dataset is available at this https URL.
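
To make the described prompting-based strategy concrete, the following is a minimal sketch of how such a sufficiency-and-availability check could be framed as a prompt. The preamble wording, tool schema, and function names are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of a prompt that asks the model to verify information
# sufficiency and tool availability before committing to a tool call.
# The wording and tool schema below are assumptions for illustration only.

from typing import List, Dict

REASONING_PREAMBLE = (
    "Before selecting a tool, answer the following:\n"
    "1. Does the user request contain every argument the candidate tool requires?\n"
    "2. Is a tool that can fulfil the request actually present in the tool list?\n"
    "If either answer is 'no', do NOT call a tool; instead explain what information "
    "or tool is missing."
)

def build_prompt(user_query: str, tools: List[Dict[str, str]]) -> str:
    """Compose a prompt that instructs the model to assess incomplete conditions
    (missing arguments or unavailable tools) before deciding on a tool call."""
    tool_lines = "\n".join(f"- {t['name']}: {t['description']}" for t in tools)
    return (
        f"Available tools:\n{tool_lines}\n\n"
        f"{REASONING_PREAMBLE}\n\n"
        f"User request: {user_query}"
    )

if __name__ == "__main__":
    tools = [
        {"name": "get_weather",
         "description": "Return the forecast for a given city and date."},
    ]
    # Incomplete condition: the user omits the date, so the model should abstain
    # from calling get_weather and ask for the missing information instead.
    print(build_prompt("What's the weather in Paris?", tools))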