Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions

Main: 9 pages · Appendix: 16 pages · Bibliography: 4 pages · 15 figures · 3 tables
Abstract
In this short report, we present an automated pipeline tailored to the genomics domain and introduce Genome-Bench, a new benchmark constructed from over a decade of scientific forum discussions on genome engineering. Our pipeline transforms raw interactions into a reinforcement-learning-friendly multiple-choice question format, yielding 3,000+ high-quality question–answer pairs spanning foundational biology, experimental troubleshooting, tool usage, and beyond. To our knowledge, this is the first end-to-end pipeline for teaching LLMs to reason from scientific discussions, with promising potential for generalization to scientific domains beyond biology.
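To make the "reinforcement-learning-friendly multiple-choice" format concrete, here is a minimal sketch of what one benchmark record and its verifiable reward might look like. The field names, the example question, and the `reward` helper are assumptions for illustration, not the paper's actual schema:

```python
from dataclasses import dataclass


@dataclass
class MCQRecord:
    """One multiple-choice item distilled from a forum thread (hypothetical schema)."""
    question: str
    choices: dict[str, str]  # option label -> option text
    answer: str              # label of the correct option, e.g. "B"


def reward(record: MCQRecord, model_choice: str) -> float:
    """Binary reward for RL fine-tuning: 1.0 for the correct label, else 0.0.

    Multiple-choice answers are automatically checkable, which is what makes
    this format convenient for reinforcement learning.
    """
    return 1.0 if model_choice == record.answer else 0.0


# Hypothetical example item in the style of a genome-engineering Q&A.
example = MCQRecord(
    question="A CRISPR knockout shows no editing; what is the most likely first thing to check?",
    choices={
        "A": "The cell culture medium brand",
        "B": "Guide RNA design and delivery efficiency",
        "C": "The microscope calibration",
        "D": "The plasmid prep date",
    },
    answer="B",
)

print(reward(example, "B"))  # correct choice
print(reward(example, "A"))  # incorrect choice
```

A binary, automatically checkable reward like this is the usual reason multiple-choice formats pair well with RL training loops.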
View on arXiv

@article{yin2025_2505.19501,
  title={Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning},
  author={Ming Yin and Yuanhao Qu and Ling Yang and Le Cong and Mengdi Wang},
  journal={arXiv preprint arXiv:2505.19501},
  year={2025}
}