Training Transformers for Mesh-Based Simulations

Main: 13 Pages
21 Figures
Bibliography: 5 Pages
4 Tables
Appendix: 8 Pages
Abstract

Simulating physics using Graph Neural Networks (GNNs) is predominantly driven by message-passing architectures, which face challenges in scaling and efficiency, particularly when handling large, complex meshes. These architectures have inspired numerous enhancements, including multigrid approaches and K-hop aggregation (using neighbours of distance K), yet they often introduce significant complexity and have received limited in-depth investigation. In response to these challenges, we propose a novel Graph Transformer architecture that leverages the adjacency matrix as an attention mask. The proposed approach incorporates innovative augmentations, including Dilated Sliding Windows and Global Attention, to extend receptive fields without sacrificing computational efficiency. Through extensive experimentation, we evaluate model size, adjacency-matrix augmentations, positional encoding and K-hop configurations on challenging 3D computational fluid dynamics (CFD) datasets. We also train over 60 models to derive a scaling law between training FLOPs and parameters. The introduced models demonstrate remarkable scalability, running on meshes with up to 300k nodes and 3 million edges. Notably, the smallest model achieves parity with MeshGraphNet while being 7× faster and 6× smaller. The largest model surpasses the previous state-of-the-art by 38.8% on average and outperforms MeshGraphNet by 52% on the all-rollout RMSE, while having a similar training speed. Code and datasets are available at this https URL.
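To make the core idea concrete, the sketch below illustrates attention restricted by a mesh adjacency matrix, together with hypothetical dilated-sliding-window and global-attention augmentations of the mask. This is a minimal illustration under assumed names and shapes (build_attention_mask, window, dilation, global_nodes, and the use of the node-index ordering for the sliding window are all assumptions), not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of adjacency-masked attention with
# illustrative dilated-sliding-window and global-attention augmentations.
import torch
import torch.nn.functional as F


def build_attention_mask(adj, window=4, dilation=2, global_nodes=None):
    """Combine the mesh adjacency with extra connectivity patterns.

    adj: (N, N) boolean adjacency matrix of the mesh graph.
    window, dilation: dilated sliding window over the node ordering,
        connecting node i to i +/- k*dilation for k = 1..window (assumption).
    global_nodes: indices of nodes that attend to, and are attended by, all.
    """
    n = adj.shape[0]
    mask = adj.clone().bool()
    mask |= torch.eye(n, dtype=torch.bool)  # always allow self-attention

    # Dilated sliding window over the node index ordering.
    idx = torch.arange(n)
    for k in range(1, window + 1):
        offset = k * dilation
        mask |= (idx[:, None] - idx[None, :]).abs() == offset

    # Global attention for selected nodes.
    if global_nodes is not None:
        mask[global_nodes, :] = True
        mask[:, global_nodes] = True
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention; non-edges are blocked."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (N, N)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    n, d = 16, 8
    adj = torch.rand(n, n) < 0.2   # toy random "mesh" connectivity
    adj = adj | adj.T              # symmetrise
    mask = build_attention_mask(adj, window=2, dilation=3, global_nodes=[0])
    q = k = v = torch.randn(n, d)
    print(masked_attention(q, k, v, mask).shape)  # torch.Size([16, 8])
```

In this reading, the adjacency mask recovers local, MeshGraphNet-like receptive fields, while the window and global-node terms widen them without densifying the full attention matrix; how the paper actually parameterises these augmentations is described in its main text.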
