SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

26 August 2024
Daoguang Zan
Zhirong Huang
Ailun Yu
Shaoxin Lin
Yifan Shi
Wei Liu
Dong Chen
Zongshuai Qi
Hao Yu
Lei Yu
Dezhi Ran
Muhan Zeng
Bo Shen
Pan Bian
Guangtai Liang
Bei Guan
Pengjie Huang
Tao Xie
Yongji Wang
Qianxiang Wang
Abstract

GitHub issue resolving is a critical software engineering task that has recently drawn significant attention from both industry and academia. For this task, SWE-bench was released to evaluate the issue-resolving capabilities of large language models (LLMs), but it has so far covered only Python. Supporting more programming languages is also important, given strong demand from industry. As a first step toward multilingual support, we have developed a Java version of SWE-bench, called SWE-bench-java. We have publicly released the dataset, along with the corresponding Docker-based evaluation environment and leaderboard, which will be continuously maintained and updated in the coming months. To verify the reliability of SWE-bench-java, we implement the classic SWE-agent method and use it to test several powerful LLMs on the benchmark. Because developing a high-quality multilingual benchmark is time-consuming and labor-intensive, we welcome contributions through pull requests or collaboration to accelerate its iteration and refinement, paving the way for fully automated programming.