
Extracting Moore Machines from Transformers using Queries and Counterexamples

International Symposium on Intelligent Data Analysis (IDA), 2024
Main: 10 pages, 3 figures, 2 tables
Bibliography: 3 pages
Abstract

Fuelled by the popularity of the transformer architecture in deep learning, several works have investigated what formal languages a transformer can learn from data. Nonetheless, existing results remain hard to compare due to methodological differences. To address this, we use queries and counterexamples to construct finite state automata as high-level abstractions of transformers trained on regular languages. Concretely, we extract Moore machines, because many training tasks used in the literature can be mapped onto them. We demonstrate the usefulness of this approach by studying positive-only learning and the sequence accuracy measure in detail.
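
For readers unfamiliar with the term, a Moore machine is a finite state automaton in which every state carries an output label, so reading a word yields one output per prefix. The following is a minimal illustrative sketch in Python, not the extraction procedure of the paper. The class `MooreMachine`, its field names, and the parity example are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MooreMachine:
    """Hypothetical minimal Moore machine: outputs are attached to states."""
    initial: int
    transitions: dict  # maps (state, symbol) -> next state
    outputs: dict      # maps state -> output label

    def run(self, word):
        """Return the output of every visited state, starting with the initial one."""
        state = self.initial
        emitted = [self.outputs[state]]
        for symbol in word:
            state = self.transitions[(state, symbol)]
            emitted.append(self.outputs[state])
        return emitted

    def accepts(self, word):
        """Treat the machine as a recogniser: accept if the final output is 1."""
        return self.run(word)[-1] == 1


# Illustrative regular language: words over {a, b} with an even number of 'a's.
parity = MooreMachine(
    initial=0,
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    outputs={0: 1, 1: 0},
)

print(parity.run("aba"))     # one output per prefix of the word
print(parity.accepts("ab"))  # membership decision for the whole word
```

Reading only the last output gives a language-recognition task, while reading the whole output sequence gives a per-prefix labelling task, which is one way to see how different training setups can be mapped onto the same kind of machine.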
