
Mathematical Derivation Graphs: A Relation Extraction Task in STEM Manuscripts

Main: 10 pages, Bibliography: 1 page, Appendix: 18 pages, 11 figures, 4 tables

Abstract

Recent advances in natural language processing (NLP), particularly with the emergence of large language models (LLMs), have significantly enhanced the field of textual analysis. However, while these developments have yielded substantial progress in analyzing natural language text, applying analysis to mathematical equations and their relationships within texts has produced mixed results. This paper takes the initial steps in expanding the problem of relation extraction towards understanding the dependency relationships between mathematical expressions in STEM articles. The authors construct the Mathematical Derivation Graphs Dataset (MDGD), sourced from a random sampling of the arXiv corpus, containing an analysis of 107 published STEM manuscripts with over 2000 manually labeled inter-equation dependency relationships, resulting in a new object referred to as a derivation graph that summarizes the mathematical content of the manuscript. The authors exhaustively evaluate analytical and machine learning (ML) based models to assess their capability to identify and extract the derivation relationships for each article and compare the results with the ground truth. The authors show that the best tested LLMs achieve $F_1$ scores of $\sim 45\%$ to $52\%$, and attempt to improve their performance by combining them with analytic algorithms and other methods.
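
As a rough illustration of how a predicted derivation graph might be scored against manually labeled ground truth, the sketch below computes an edge-level $F_1$ score. It assumes a derivation graph is stored as a mapping from each equation label to the set of equations derived from it, and that a predicted edge counts as correct only if the same directed edge appears in the ground truth. The abstract does not specify the paper's exact scoring procedure, so the representation, the function names `edges` and `edge_f1`, and the toy labels `eq1` to `eq4` are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed representation: equation label -> set of equations derived from it.
Graph = Dict[str, Set[str]]
Edge = Tuple[str, str]  # (source equation, derived equation)


def edges(graph: Graph) -> Set[Edge]:
    """Flatten an adjacency mapping into a set of directed edges."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts}


def edge_f1(truth: Graph, predicted: Graph) -> float:
    """Edge-level F1: harmonic mean of precision and recall over directed edges."""
    true_edges, pred_edges = edges(truth), edges(predicted)
    true_positives = len(true_edges & pred_edges)
    if true_positives == 0:
        # Also covers empty graphs, avoiding division by zero.
        return 0.0
    precision = true_positives / len(pred_edges)
    recall = true_positives / len(true_edges)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels, not taken from the dataset).
truth = {"eq1": {"eq2", "eq3"}, "eq2": {"eq4"}}
predicted = {"eq1": {"eq2"}, "eq3": {"eq4"}}
score = edge_f1(truth, predicted)
```

In this toy case one of the two predicted edges is correct (precision 1/2) and one of the three true edges is recovered (recall 1/3), so the score is 0.4.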
