CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World

17 May 2025
Zoya Volovikova
Gregory Gorbov
Petr Kuderov
Aleksandr I. Panov
Alexey Skrynnik
Abstract

Following instructions in real-world conditions requires the ability to adapt to the world's volatility and entanglement: the environment is dynamic and unpredictable, instructions can be linguistically complex with diverse vocabulary, and the number of possible goals an agent may encounter is vast. Despite extensive research in this area, most studies are conducted in static environments with simple instructions and a limited vocabulary, making it difficult to assess agent performance in more diverse and challenging settings. To address this gap, we introduce CrafText, a benchmark for evaluating instruction following in a multimodal environment with diverse instructions and dynamic interactions. CrafText includes 3,924 instructions with 3,423 unique words, covering Localization, Conditional, Building, and Achievement tasks. Additionally, we propose an evaluation protocol that measures an agent's ability to generalize to novel instruction formulations and dynamically evolving task configurations, providing a rigorous test of both linguistic understanding and adaptive decision-making.

@article{volovikova2025_2505.11962,
  title={CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World},
  author={Zoya Volovikova and Gregory Gorbov and Petr Kuderov and Aleksandr I. Panov and Alexey Skrynnik},
  journal={arXiv preprint arXiv:2505.11962},
  year={2025}
}