
Out-of-distribution generalisation is hard: evidence from ARC-like tasks

Abstract

Out-of-distribution (OOD) generalisation is considered a hallmark of human and animal intelligence. To achieve OOD generalisation through composition, a system must discover the environment-invariant properties of experienced input-output mappings and transfer them to novel inputs. This can be realised if an intelligent system can identify appropriate, task-invariant, and composable input features, as well as the methods for composing them, allowing it to act not by interpolating between learnt data points but through the task-invariant composition of those features. We propose that, to confirm an algorithm does indeed learn compositional structures from data, it is not enough to test it on an OOD setup; one must also confirm that the features it identifies are indeed compositional. We showcase this by exploring two tasks with clearly defined OOD metrics that are not OOD-solvable by three commonly used neural networks: a Multi-Layer Perceptron (MLP), a Convolutional Neural Network (CNN), and a Transformer. In addition, we develop two novel network architectures imbued with inductive biases that allow them to succeed in OOD scenarios. We show that even with the correct biases and almost perfect OOD performance, an algorithm can still fail to learn the correct features for compositional generalisation.
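
To make the evaluation protocol concrete, the following is a minimal, hypothetical sketch (not the paper's actual tasks or code) of the kind of OOD split the abstract describes. The toy ARC-like rule is "shift the active cell one step to the right on a 1-D grid"; training examples cover only the left half of the grid, so the input features needed at test time carry no trained weights, and a plain MLP that interpolates between learnt data points fails on the unseen half.

import numpy as np
from sklearn.neural_network import MLPRegressor

GRID = 20  # hypothetical grid length

def make_examples(positions):
    # One-hot encode the active cell; the target is its shifted index.
    X = np.eye(GRID)[list(positions)]
    y = np.array([p + 1 for p in positions], dtype=float)
    return X, y

X_tr, y_tr = make_examples(range(0, GRID // 2))         # in-distribution half
X_te, y_te = make_examples(range(GRID // 2, GRID - 1))  # unseen (OOD) half

mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=5000, random_state=0)
mlp.fit(X_tr, y_tr)

print("train MAE:", np.abs(mlp.predict(X_tr) - y_tr).mean())  # near zero
print("OOD   MAE:", np.abs(mlp.predict(X_te) - y_te).mean())  # large: no composition

Note that this sketch only illustrates the OOD failure mode; per the abstract's argument, even a model that scored well here would still need its learnt features inspected for compositionality before any claim of compositional generalisation could be made.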

@article{samothrakis2025_2505.09716,
  title={Out-of-distribution generalisation is hard: evidence from ARC-like tasks},
  author={George Dimitriadis and Spyridon Samothrakis},
  journal={arXiv preprint arXiv:2505.09716},
  year={2025}
}