ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

Abstract

Academic writing requires both coherent text generation and precise citation of relevant literature. Although recent Retrieval-Augmented Generation (RAG) systems have significantly improved factual accuracy in general-purpose text generation, their capacity to adequately support professional academic writing remains limited. In this work, we introduce ScholarCopilot, a unified framework designed to enhance existing large language models for generating professional academic articles with accurate and contextually relevant citations. ScholarCopilot dynamically determines when to retrieve scholarly references by generating a retrieval token [RET], and then uses its representation to look up relevant citations from a database. The retrieved references are fed into the model to augment the generation process. We jointly optimize both the generation and citation tasks within a single framework to increase efficiency. Trained on 500K papers from arXiv, our model achieves a top-1 retrieval accuracy of 40.1% on our evaluation dataset, outperforming baselines such as E5-Mistral-7B-Instruct (15.0%) and BM25 (9.8%). On a dataset of 1,000 academic writing samples, ScholarCopilot scores 16.2/25 in generation quality (measured across relevance, coherence, academic rigor, completeness, and innovation), surpassing models with 10x more parameters such as Qwen-2.5-72B-Instruct (15.8/25). Human studies further show ScholarCopilot's superior performance in citation recall, writing efficiency, and overall user experience, validating the effectiveness of our approach.
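The retrieval-token mechanism described above can be illustrated with a minimal sketch. All names here (the token stream format, `retrieve_citation`, the `citation_db` layout, and the use of a raw dot product as the similarity score) are illustrative assumptions for exposition, not the paper's actual API or training setup; in the real system the [RET] token's hidden representation comes from the language model itself.

```python
# Hypothetical sketch: interleaving generation with citation lookup.
# Whenever the model emits a [RET] token, its hidden-state vector is
# used as a query against an embedded citation database.

def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve_citation(query_vec, citation_db, top_k=1):
    """Rank database entries by similarity to the query vector and
    return the top_k matches (a stand-in for the learned retriever)."""
    ranked = sorted(citation_db, key=lambda e: -dot(query_vec, e["emb"]))
    return ranked[:top_k]

def generate_with_citations(token_stream, citation_db):
    """Walk a (token, hidden_state) stream; on each [RET] token, look up
    the best-matching citation and splice its key into the output."""
    output = []
    for token, hidden in token_stream:
        if token == "[RET]":
            best = retrieve_citation(hidden, citation_db)[0]
            output.append(f"[{best['key']}]")
        else:
            output.append(token)
    return " ".join(output)
```

In the actual framework, the retrieved reference text would also be fed back into the model's context to condition subsequent generation; this sketch only shows the lookup-and-splice step.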

@article{wang2025_2504.00824,
  title={ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations},
  author={Yubo Wang and Xueguang Ma and Ping Nie and Huaye Zeng and Zhiheng Lyu and Yuxuan Zhang and Benjamin Schneider and Yi Lu and Xiang Yue and Wenhu Chen},
  journal={arXiv preprint arXiv:2504.00824},
  year={2025}
}