ResearchTrend.AI
GraphGen+: Advancing Distributed Subgraph Generation and Graph Learning On Industrial Graphs

8 March 2025
Yue Jin
Yongchao Liu
Chuntao Hong
    GNN
Abstract

Graph-based computations are crucial in a wide range of applications, where graphs can scale to trillions of edges. To enable efficient training on such large graphs, mini-batch subgraph sampling is commonly used, allowing training without loading the entire graph into memory. However, existing solutions face significant trade-offs: online subgraph generation, as seen in frameworks like DGL and PyG, is limited to a single machine, resulting in severe performance bottlenecks, while offline precomputed subgraphs, as in GraphGen, improve sampling efficiency but introduce large storage overhead and high I/O costs during training. To address these challenges, we propose GraphGen+, an integrated framework that synchronizes distributed subgraph generation with in-memory graph learning, eliminating the need for external storage while significantly improving efficiency. GraphGen+ achieves a 27× speedup in subgraph generation compared to conventional SQL-like methods and a 1.3× speedup over GraphGen. It supports training on 1 million nodes per iteration and removes the overhead associated with precomputed subgraphs, making it a scalable and practical solution for industry-scale graph learning.
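The mini-batch subgraph sampling the abstract refers to draws a small neighborhood around a batch of seed nodes each iteration, so the full graph never has to fit in memory. A minimal sketch of fan-out neighbor sampling in plain Python (the toy adjacency list, seed set, and fan-out below are illustrative assumptions, not from the paper):

```python
import random

def sample_subgraph(adj, seeds, fanout, num_hops, rng=random.Random(0)):
    """Hop-by-hop neighbor sampling: from each frontier node, keep at
    most `fanout` random neighbors, expanding for `num_hops` hops."""
    nodes = set(seeds)          # nodes included in the sampled subgraph
    edges = []                  # sampled (source, neighbor) edges
    frontier = list(seeds)
    for _ in range(num_hops):
        next_frontier = []
        for u in frontier:
            nbrs = adj.get(u, [])
            for v in rng.sample(nbrs, min(fanout, len(nbrs))):
                edges.append((u, v))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges

# Toy graph: node -> neighbor list
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
nodes, edges = sample_subgraph(adj, seeds=[0], fanout=2, num_hops=2)
```

Frameworks like DGL and PyG run this kind of sampler online during training; GraphGen+'s contribution is distributing this generation step and feeding the results directly into in-memory training rather than materializing them to storage.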

@article{jin2025_2503.06212,
  title={GraphGen+: Advancing Distributed Subgraph Generation and Graph Learning On Industrial Graphs},
  author={Yue Jin and Yongchao Liu and Chuntao Hong},
  journal={arXiv preprint arXiv:2503.06212},
  year={2025}
}