Training Transformers for Mesh-Based Simulations

Main: 13 Pages
21 Figures
Bibliography: 5 Pages
4 Tables
Appendix: 8 Pages
Abstract

Simulating physics using Graph Neural Networks (GNNs) is predominantly driven by message-passing architectures, which face challenges in scaling and efficiency, particularly when handling large, complex meshes. These architectures have inspired numerous enhancements, including multigrid approaches and K-hop aggregation (using neighbours of distance K), yet they often introduce significant complexity and have received limited in-depth investigation. In response to these challenges, we propose a novel Graph Transformer architecture that leverages the adjacency matrix as an attention mask. The proposed approach incorporates innovative augmentations, including Dilated Sliding Windows and Global Attention, to extend receptive fields without sacrificing computational efficiency. Through extensive experimentation, we evaluate model size, adjacency-matrix augmentations, positional encoding and K-hop configurations on challenging 3D computational fluid dynamics (CFD) datasets. We also train over 60 models to derive a scaling law between training FLOPs and parameters. The introduced models demonstrate remarkable scalability, running on meshes with up to 300k nodes and 3 million edges. Notably, the smallest model achieves parity with MeshGraphNet while being 7× faster and 6× smaller. The largest model surpasses the previous state-of-the-art by 38.8% on average and outperforms MeshGraphNet by 52% on the all-rollout RMSE, while having a similar training speed. Code and datasets are available at this https URL.
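To make the core idea concrete, the sketch below illustrates attention restricted by a mesh adjacency matrix, together with hypothetical dilated-sliding-window and global-attention augmentations of the mask. This is a minimal illustration under assumed names and shapes (build_attention_mask, window, dilation, global_nodes, and the use of the node-index ordering for the sliding window are all assumptions), not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of adjacency-masked attention with
# illustrative dilated-sliding-window and global-attention augmentations.
import torch
import torch.nn.functional as F


def build_attention_mask(adj, window=4, dilation=2, global_nodes=None):
    """Combine the mesh adjacency with extra connectivity patterns.

    adj: (N, N) boolean adjacency matrix of the mesh graph.
    window, dilation: dilated sliding window over the node ordering,
        connecting node i to i +/- k*dilation for k = 1..window (assumption).
    global_nodes: indices of nodes that attend to, and are attended by, all.
    """
    n = adj.shape[0]
    mask = adj.clone().bool()
    mask |= torch.eye(n, dtype=torch.bool)  # always allow self-attention

    # Dilated sliding window over the node index ordering.
    idx = torch.arange(n)
    for k in range(1, window + 1):
        offset = k * dilation
        mask |= (idx[:, None] - idx[None, :]).abs() == offset

    # Global attention for selected nodes.
    if global_nodes is not None:
        mask[global_nodes, :] = True
        mask[:, global_nodes] = True
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention; non-edges are blocked."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (N, N)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    n, d = 16, 8
    adj = torch.rand(n, n) < 0.2   # toy random "mesh" connectivity
    adj = adj | adj.T              # symmetrise
    mask = build_attention_mask(adj, window=2, dilation=3, global_nodes=[0])
    q = k = v = torch.randn(n, d)
    print(masked_attention(q, k, v, mask).shape)  # torch.Size([16, 8])
```

In this reading, the adjacency mask recovers local, MeshGraphNet-like receptive fields, while the window and global-node terms widen them without densifying the full attention matrix; how the paper actually parameterises these augmentations is described in its main text.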
