ADAPT: Actively Discovering and Adapting to Preferences for any Task

Abstract

Assistive agents should be able to perform under-specified, long-horizon tasks while respecting user preferences. We introduce Actively Discovering and Adapting to Preferences for any Task (ADAPT) -- a benchmark designed to evaluate agents' ability to adhere to user preferences across various household tasks through active questioning. We then propose Reflection-DPO, a novel training approach for adapting large language models (LLMs) to the task of active questioning. Reflection-DPO finetunes a 'student' LLM to follow the actions of a privileged 'teacher' LLM, optionally asking a question to gather the information needed to better predict the teacher's action. We find that prior approaches that use state-of-the-art LLMs fail to sufficiently follow user preferences in ADAPT due to insufficient questioning and poor adherence to elicited preferences. In contrast, Reflection-DPO satisfies user preferences at a higher rate, outperforming a zero-shot chain-of-thought baseline by 6.1% on unseen users.
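
The abstract describes Reflection-DPO only at a high level. As an illustration (not the authors' released implementation), the sketch below shows how teacher-guided preference pairs could feed a standard DPO objective: the privileged teacher's action (or a clarifying question that makes that action predictable) is treated as the 'chosen' response, and the student's mismatching action as 'rejected'. The function name dpo_loss, the pairing rule, and all tensor values here are hypothetical assumptions.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy toward 'chosen' responses
    and away from 'rejected' ones, relative to a frozen reference model."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Hypothetical pair construction: 'chosen' = teacher's action or a useful
# clarifying question; 'rejected' = the student's own mismatching action.
policy_chosen = torch.tensor([-4.2, -3.1])    # log p_student(chosen | context)
policy_rejected = torch.tensor([-3.8, -3.5])  # log p_student(rejected | context)
ref_chosen = torch.tensor([-4.0, -3.3])       # same quantities under the reference model
ref_rejected = torch.tensor([-3.9, -3.4])

print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))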

@article{patel2025_2504.04040,
  title={ADAPT: Actively Discovering and Adapting to Preferences for any Task},
  author={Maithili Patel and Xavier Puig and Ruta Desai and Roozbeh Mottaghi and Sonia Chernova and Joanne Truong and Akshara Rai},
  journal={arXiv preprint arXiv:2504.04040},
  year={2025}
}