
ViSymRe: Vision Multimodal Symbolic Regression

Main: 17 pages; Bibliography: 4 pages; Appendix: 15 pages; 18 figures; 25 tables
Abstract

Extracting interpretable equations from observational datasets to describe complex natural phenomena is one of the core goals of artificial intelligence. This field is known as symbolic regression (SR). In recent years, Transformer-based paradigms have become a new trend in SR, addressing the well-known problem of inefficient search. However, the modal heterogeneity between datasets and equations often hinders the convergence and generalization of these models. In this paper, we propose ViSymRe, a Vision Symbolic Regression framework, to explore how the visual modality can improve the performance of Transformer-based SR paradigms. To address the difficulty of training a visual SR model in high-dimensional scenarios, we present Multi-View Random Slicing (MVRS). By projecting multivariate equations into 2-D space using random affine transformations, MVRS avoids common defects of high-dimensional visualization, such as variable degradation, missing non-linear interactions, and exponentially increasing sampling complexity, so that ViSymRe can be trained at low computational cost. To support dataset-only deployment of ViSymRe, we design a dual-vision pipeline architecture based on generative techniques. It reconstructs visual features directly from the datasets via an auxiliary Visual Decoder and automatically suppresses the attention weights of reconstruction noise through a proposed Biased Cross-Attention feature fusion module, ensuring that subsequent processes are not affected by noisy modalities. Ablation studies demonstrate that the visual modality improves model convergence and various SR metrics. Furthermore, evaluation results on mainstream benchmarks indicate that ViSymRe achieves competitive performance compared to baselines, particularly in low-complexity and rapid-inference scenarios.
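
To make the idea behind MVRS more concrete, the following is a minimal sketch of slicing a multivariate function along random affine lines so that each slice becomes a 2-D curve. It is not the authors' implementation: the abstract does not specify how the affine transformations are sampled or how the views are rendered, so the Gaussian sampling of the direction `a` and offset `b`, the single slicing parameter `t`, and the names `random_slices` and `example_equation` are all assumptions made for illustration.

```python
import math
import random


def random_slices(f, dim, n_views=4, n_points=64, t_min=-1.0, t_max=1.0, seed=0):
    """Return n_views curves, each a list of (t, y) pairs.

    Each view restricts f to a random affine line x(t) = a * t + b in
    R^dim, turning a dim-variate function into a curve that can be drawn
    in 2-D. The Gaussian sampling of a and b is an assumption of this sketch.
    """
    rng = random.Random(seed)
    step = (t_max - t_min) / (n_points - 1)
    ts = [t_min + i * step for i in range(n_points)]
    views = []
    for _ in range(n_views):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random direction
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random offset
        curve = [(t, f([a_k * t + b_k for a_k, b_k in zip(a, b)])) for t in ts]
        views.append(curve)
    return views


def example_equation(x):
    # Hypothetical 3-variable target with a non-linear interaction term.
    return x[0] * x[1] + math.sin(x[2])


views = random_slices(example_equation, dim=3)
```

Because every variable moves along each random line at once, an interaction term such as `x[0] * x[1]` still shapes each curve, and the number of sampled points grows with the number of views rather than exponentially with the dimension.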
