HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics

Main: 7 pages, 7 figures, 7 tables; bibliography: 3 pages
Abstract

Semantic parsing of long documents remains challenging due to quadratic growth in pairwise composition and memory requirements. We introduce \textbf{Hierarchical Segment-Graph Memory (HSGM)}, a novel framework that decomposes an input of length $N$ into $M$ meaningful segments, constructs \emph{Local Semantic Graphs} on each segment, and extracts compact \emph{summary nodes} to form a \emph{Global Graph Memory}. HSGM supports \emph{incremental updates} -- only newly arrived segments incur local graph construction and summary-node integration -- while \emph{Hierarchical Query Processing} locates relevant segments via top-$K$ retrieval over summary nodes and then performs fine-grained reasoning within their local graphs. Theoretically, HSGM reduces worst-case complexity from $O(N^2)$ to $O\!\left(N\,k + (N/k)^2\right)$, with segment size $k \ll N$, and we derive Frobenius-norm bounds on the approximation error introduced by node summarization and sparsification thresholds. Empirically, on three benchmarks -- long-document AMR parsing, segment-level semantic role labeling (OntoNotes), and legal event extraction -- HSGM achieves a \emph{2--4$\times$ inference speedup}, a \emph{$>60\%$ reduction} in peak memory, and \emph{$\ge 95\%$} of baseline accuracy. Our approach unlocks scalable, accurate semantic modeling for ultra-long texts, enabling real-time and resource-constrained NLP applications.
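The complexity claim above can be sanity-checked with a back-of-the-envelope sketch. The snippet below compares the baseline cost $O(N^2)$ against HSGM's $O(Nk + (N/k)^2)$ for hypothetical values of $N$ and $k$ (the values and cost functions are illustrative assumptions, not taken from the paper):

```python
def baseline_cost(n: int) -> int:
    # Full pairwise composition over all N tokens: O(N^2).
    return n * n

def hsgm_cost(n: int, k: int) -> int:
    # M = N/k segments, each with a local graph of O(k^2) work,
    # giving N*k total, plus a global graph over M summary nodes: O((N/k)^2).
    m = n // k
    return n * k + m * m

# Hypothetical document length and segment size.
n, k = 100_000, 100
print(baseline_cost(n))  # 10_000_000_000
print(hsgm_cost(n, k))   # 11_000_000
```

For these values, HSGM's cost is roughly three orders of magnitude below the baseline, consistent with the claimed asymptotic reduction when $k \ll N$.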
