
Weisfeiler and Lehman Go Measurement Modeling: Probing the Validity of the WL Test

Main: 14 pages
5 figures
Bibliography: 4 pages
5 tables
Appendix: 29 pages
Abstract

The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can possibly distinguish as non-isomorphic to those distinguishable by the k-dimensional Weisfeiler-Lehman (k-WL) test. In this paper, we uncover misalignments between practitioners' conceptualizations of expressive power and k-WL through a systematic analysis of the reliability and validity of k-WL. We further conduct a survey (n = 18) of practitioners to surface their conceptualizations of expressive power and their assumptions about k-WL. In contrast to practitioners' opinions, our analysis (which draws on graph theory and benchmark auditing) reveals that k-WL does not guarantee isometry, can be irrelevant to real-world graph tasks, and may not promote generalization or trustworthiness. We argue for extensional definitions and measurement of expressive power based on benchmarks; we further contribute guiding questions for constructing such benchmarks, which we argue is critical for progress in graph machine learning.
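To make the WL-based yardstick concrete, below is a minimal sketch of the 1-dimensional case (color refinement), not the authors' code and not a general k-WL implementation. The function name `wl_distinguishes` and the adjacency-dict graph format are hypothetical choices for illustration. Both graphs are refined with a shared color palette, and the test reports them as distinguishable as soon as their color histograms differ. The usage example shows a standard limitation: 1-WL cannot tell two disjoint triangles from a 6-cycle, because every node in both graphs has degree 2.

```python
from collections import Counter


def wl_distinguishes(g1, g2, rounds=None):
    """Hypothetical helper: return True if 1-WL color refinement separates g1 and g2.

    Graphs are undirected adjacency dicts {node: [neighbors, ...]} (an assumed format).
    Returning False means "1-WL cannot tell them apart", not "they are isomorphic".
    """
    if len(g1) != len(g2):
        return True  # initial uniform colorings already have different histograms
    if rounds is None:
        # Refinement on both graphs together stabilizes within this many rounds.
        rounds = len(g1) + len(g2)

    # Every node starts with the same color.
    c1 = {v: 0 for v in g1}
    c2 = {v: 0 for v in g2}

    def refine(g, colors, palette):
        # New color = id of (own color, sorted multiset of neighbor colors).
        new = {}
        for v in g:
            signature = (colors[v], tuple(sorted(colors[u] for u in g[v])))
            new[v] = palette.setdefault(signature, len(palette))
        return new

    for _ in range(rounds):
        # Shared palette so a color id means the same signature in both graphs.
        palette = {}
        c1 = refine(g1, c1, palette)
        c2 = refine(g2, c2, palette)
        if Counter(c1.values()) != Counter(c2.values()):
            return True
    return False


# Two disjoint triangles vs. a 6-cycle: non-isomorphic, yet 1-WL sees no difference.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(wl_distinguishes(two_triangles, hexagon))  # → False
```

The shared palette is the key design choice here: refining each graph with its own palette would let identical color ids mean different neighborhoods, making histogram comparison meaningless.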
