From Next Token Prediction to (STRIPS) World Models -- Preliminary Results
Main: 7 pages, 3 figures, 6 tables; bibliography: 1 page; appendix: 10 pages
Abstract
We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next token prediction problem where the tokens are the actions, and an action may follow an action sequence if the hidden effects of the previous actions do not make a precondition of the action false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that these models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
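The validity criterion the abstract describes can be made concrete: an action sequence is valid if, starting from some initial state, each action's preconditions hold in the state produced by the add and delete effects of the actions before it. The following minimal sketch of that STRIPS semantics uses a hypothetical two-action example (`pick`/`drop` with made-up propositions); it illustrates the labeling of positive and negative sequences, not the paper's actual learning architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A propositional STRIPS action: preconditions, add and delete effects."""
    name: str
    pre: frozenset   # propositions that must hold before the action applies
    add: frozenset   # propositions made true by the action
    delete: frozenset  # propositions made false by the action


def is_valid_sequence(init_state, actions):
    """Return True iff each action's preconditions hold in the state
    produced by the hidden effects of the previous actions."""
    state = set(init_state)
    for a in actions:
        if not a.pre <= state:  # a precondition has been made false
            return False
        state -= a.delete
        state |= a.add
    return True


# Hypothetical example: pick makes "clear" false, so pick cannot follow pick.
pick = Action("pick", frozenset({"clear"}), frozenset({"holding"}), frozenset({"clear"}))
drop = Action("drop", frozenset({"holding"}), frozenset({"clear"}), frozenset({"holding"}))

print(is_valid_sequence({"clear"}, [pick, drop]))  # True: a positive sequence
print(is_valid_sequence({"clear"}, [pick, pick]))  # False: a negative sequence
```

Sequences labeled this way are exactly the positive and negative training examples the abstract refers to: the learner sees only the action tokens, while the states and effects remain hidden.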
