Large Language Model-based Augmentation for Imbalanced Node Classification on Text-Attributed Graphs

Node classification on graphs often suffers from class imbalance, leading to biased predictions and significant risks in real-world applications. While data-centric solutions have been explored, they largely overlook Text-Attributed Graphs (TAGs) and the potential of using rich textual semantics to improve the classification of minority nodes. Given this gap, we propose Large Language Model-based Augmentation on Text-Attributed Graphs (LA-TAG), a novel framework that leverages Large Language Models (LLMs) to handle imbalanced node classification. Specifically, we develop prompting strategies inspired by interpolation to synthesize textual node attributes. Additionally, to effectively integrate synthetic nodes into the graph structure, we introduce a textual link predictor that connects the generated nodes to the original graph, preserving structural and contextual information. Experiments across various datasets and evaluation metrics demonstrate that LA-TAG outperforms existing textual augmentation and graph imbalance learning methods, emphasizing the efficacy of our approach in addressing class imbalance in TAGs.
View on arXiv