NetPress: Dynamically Generated LLM Benchmarks for Network Applications

3 June 2025
Yajie Zhou, Jiajun Ruan, Eric S. Wang, Sadjad Fouladi, Francis Y. Yan, Kevin Hsieh, Zaoxing Liu
Main: 10 pages · 8 figures · 3 tables · Bibliography: 5 pages · Appendix: 13 pages
Abstract

Despite growing interest in domain-specific benchmarking of large language models (LLMs) and agents, current evaluations remain limited to static, small-scale datasets, especially in high-stakes tasks like network operations that demand reliability for deployments. We present NetPress, an automated benchmark generation framework for evaluating LLM agents in network applications. NetPress introduces a unified abstraction with state and action, enabling dynamic generation of diverse query sets along with corresponding ground truths. At runtime, users can specify benchmark configurations to generate millions of queries on the fly. In addition to dynamic benchmark construction, NetPress integrates with network emulators to provide realistic environment feedback, supporting comprehensive evaluation across correctness, safety, and latency. We instantiate NetPress on three representative applications, revealing interesting fine-grained differences in agent behavior that static, correctness-only benchmarks often miss. NetPress moves LLM evaluation toward realistic, scalable testing in infrastructure-centric domains, helping close the gap between benchmark performance and real-world deployment readiness. Code is available at this https URL.
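To make the state-and-action abstraction concrete, below is a minimal Python sketch of how dynamic query generation with paired ground truths could work. All names here (Query, ToyConfigApp, generate_queries) are illustrative assumptions rather than the actual NetPress interface; consult the paper's code release for the real API.

```python
import random
from dataclasses import dataclass

@dataclass
class Query:
    prompt: str        # natural-language task handed to the LLM agent
    ground_truth: dict # expected post-action state, derived during generation

class ToyConfigApp:
    """Toy stand-in for a network application: state maps interfaces to MTUs."""

    def initial_state(self, rng: random.Random) -> dict:
        # Randomize the starting configuration so every generated query is fresh.
        return {f"eth{i}": rng.choice([1500, 9000]) for i in range(3)}

    def sample_action(self, state: dict, rng: random.Random) -> tuple[str, dict]:
        # Pick one valid action; applying it to the state yields the ground truth.
        iface = rng.choice(sorted(state))
        target = rng.choice([1500, 9000])
        prompt = f"Set the MTU of {iface} to {target}. Current config: {state}."
        return prompt, {**state, iface: target}

def generate_queries(app, n: int, seed: int = 0) -> list[Query]:
    """Dynamically generate n (query, ground truth) pairs at benchmark time."""
    rng = random.Random(seed)
    queries = []
    for _ in range(n):
        state = app.initial_state(rng)
        prompt, truth = app.sample_action(state, rng)
        queries.append(Query(prompt, truth))
    return queries

if __name__ == "__main__":
    for q in generate_queries(ToyConfigApp(), n=3):
        print(q.prompt)
```

In the full framework, an agent's proposed actions for each generated query would additionally be replayed against a network emulator, so that evaluation covers not just correctness against the ground truth but also safety and latency, as the abstract describes.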

@article{zhou2025_2506.03231,
  title={NetPress: Dynamically Generated LLM Benchmarks for Network Applications},
  author={Yajie Zhou and Jiajun Ruan and Eric S. Wang and Sadjad Fouladi and Francis Y. Yan and Kevin Hsieh and Zaoxing Liu},
  journal={arXiv preprint arXiv:2506.03231},
  year={2025}
}