
arXiv:2404.10179
Scaling Instructable Agents Across Many Simulated Worlds

13 March 2024
SIMA Team
Maria Abi Raad
Arun Ahuja
Catarina Barros
F. Besse
Andrew Bolt
Adrian Bolton
Bethanie Brownfield
Gavin Buttimore
Max Cant
Sarah Chakera
Stephanie C. Y. Chan
Jeff Clune
Adrian Collister
Vikki Copeman
Alex Cullum
Ishita Dasgupta
D. Cesare
Julia Di Trapani
Yani Donchev
Emma Dunleavy
Martin Engelcke
Ryan Faulkner
Frankie Garcia
C. Gbadamosi
Zhitao Gong
Lucy Gonzales
Kshitij Gupta
Karol Gregor
Arne Olav Hallingstad
Tim Harley
Sam Haves
Felix Hill
Ed Hirst
Drew A. Hudson
Jony Hudson
Steph Hughes-Fitt
Danilo Jimenez Rezende
Mimi Jasarevic
Laura Kampis
Rosemary Ke
Thomas Keck
Junkyung Kim
Oscar Knagg
Kavya Kopparapu
Andrew Kyle Lampinen
Shane Legg
Alexander Lerchner
Marjorie Limont
Yulan Liu
Maria Loks-Thompson
Joseph Marino
Kathryn Martin Cussons
Loic Matthey
S. Mcloughlin
Piermaria Mendolicchio
Hamza Merzic
Anna Mitenkova
Alexandre Moufarek
Valeria Oliveira
Yanko Oliveira
Hannah Openshaw
Renke Pan
Aneesh Pappu
Alex Platonov
Ollie Purkiss
David P. Reichert
John Reid
Pierre Harvey Richemond
Tyson Roberts
Giles Ruscoe
Jaume Sanchez Elias
Tasha Sandars
Daniel P. Sawyer
Tim Scholtes
Guy Simmons
Daniel Slater
Hubert Soyer
Heiko Strathmann
Peter Stys
Allison C. Tam
Denis Teplyashin
Tayfun Terzi
Davide Vercelli
Bojan Vujatovic
Marcus Wainwright
Jane X. Wang
Zhengdong Wang
Daan Wierstra
Duncan Williams
Nathaniel Wong
Sarah York
Nick Young
Abstract

Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions in order to carry out complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real time using a generic, human-like interface: the inputs are image observations and language instructions, and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments, while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.
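The abstract describes a generic, human-like agent interface: image observations plus a language instruction in, keyboard-and-mouse actions out. A minimal sketch of what such an interface might look like is shown below; the class and field names here are hypothetical illustrations of the interface shape, not the SIMA team's actual API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    """What the agent receives each step (hypothetical structure)."""
    pixels: bytes        # raw image frame from the environment
    instruction: str     # free-form language instruction

@dataclass
class Action:
    """Keyboard-and-mouse output, mirroring the human interface."""
    keys: List[str] = field(default_factory=list)  # keys pressed this step
    mouse_dx: float = 0.0                          # relative mouse movement
    mouse_dy: float = 0.0
    click: bool = False

class InstructableAgent:
    """Toy stub: maps (image, instruction) -> keyboard/mouse action."""
    def act(self, obs: Observation) -> Action:
        # A real agent would run a learned policy here; this stub only
        # illustrates the input/output shape described in the abstract.
        return Action()

agent = InstructableAgent()
action = agent.act(Observation(pixels=b"", instruction="open the door"))
```

Because the interface carries no environment-specific hooks, the same agent loop can, in principle, be pointed at any 3D environment that renders frames and accepts keyboard-and-mouse input.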
