
Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning

Main: 9 pages · Bibliography: 4 pages · Appendix: 16 pages · 15 figures · 3 tables
Abstract

We investigate how to teach large language models (LLMs) to perform scientific reasoning by leveraging expert discussions as a learning signal. Focusing on the genomics domain, we develop an automated pipeline to extract trainable data and introduce Genome-Bench, a new benchmark constructed from over a decade of scientific forum discussions on genome engineering. Our pipeline transforms raw interactions into a reinforcement learning-friendly multiple-choice question (MCQ) format, yielding more than 3,000 high-quality question-answer pairs spanning foundational biology, experimental troubleshooting, tool usage, and beyond. We fine-tune an LLM using RL with a rule-based reward signal derived from the synthetic MCQ dataset to enhance domain-specific reasoning. Our results show that reinforcement learning from scientific discussions improves model performance on Genome-Bench by over 15% relative to the base model, narrowing the gap between open-source LLMs and expert-level reasoning. To our knowledge, this is the first end-to-end pipeline for teaching LLMs to reason from scientific discussions, with promising potential for generalization to scientific domains beyond biology.
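The abstract does not spell out the reward implementation; as a minimal sketch, a rule-based reward for MCQ-style RL fine-tuning can be as simple as extracting the model's chosen option and string-matching it against the gold label. The `<answer>` tag convention, function names, and binary reward values below are illustrative assumptions, not details taken from the paper.

```python
import re

# Hypothetical answer-tag convention: the model is prompted to end its
# reasoning with "<answer>X</answer>", where X is an option letter.
ANSWER_PATTERN = re.compile(r"<answer>\s*([A-E])\s*</answer>", re.IGNORECASE)


def extract_choice(completion: str) -> str | None:
    """Pull the selected option letter (A-E) from a completion, if present."""
    match = ANSWER_PATTERN.search(completion)
    return match.group(1).upper() if match else None


def mcq_reward(completion: str, gold: str) -> float:
    """Binary rule-based reward: 1.0 if the chosen option matches the
    gold answer, 0.0 otherwise (including malformed or missing answers)."""
    choice = extract_choice(completion)
    return 1.0 if choice == gold.strip().upper() else 0.0


if __name__ == "__main__":
    sample = "The guide RNA likely self-dimerizes, so... <answer>C</answer>"
    print(mcq_reward(sample, "C"))  # 1.0
    print(mcq_reward("No tagged answer given.", "C"))  # 0.0
```

A binary, exactly verifiable reward like this is what makes the MCQ reformulation "RL-friendly": it needs no learned reward model and cannot be gamed by fluent but incorrect explanations.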

@article{yin2025_2505.19501,
  title={Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning},
  author={Ming Yin and Yuanhao Qu and Ling Yang and Le Cong and Mengdi Wang},
  journal={arXiv preprint arXiv:2505.19501},
  year={2025}
}