Citation faithfulness detection is critical for enhancing retrieval-augmented generation (RAG) systems, yet large-scale Chinese datasets for this task are scarce. Existing dataset construction methods incur prohibitive costs because they require manually annotated negative samples. To address this, we introduce CiteCheck, the first large-scale Chinese dataset for citation faithfulness detection, built with a cost-effective two-stage manual annotation approach. This approach balances positive and negative samples while substantially reducing annotation expenses. CiteCheck comprises training and test splits. Experiments demonstrate that: (1) the test samples are highly challenging, with even state-of-the-art LLMs failing to achieve high accuracy; and (2) training data augmented with LLM-generated negative samples enables smaller models to attain strong performance through parameter-efficient fine-tuning. CiteCheck provides a robust foundation for advancing citation faithfulness detection in Chinese RAG systems, and the dataset is publicly available to facilitate further research.
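Citation faithfulness detection can be framed as binary classification: given a generated statement and the passage it cites, decide whether the passage supports the statement. The sketch below is illustrative only and is not the CiteCheck schema or evaluation code. The `CitationSample` fields and the `label_balance` and `accuracy` helpers are hypothetical names chosen for this example. It shows why a balanced test split matters: a detector that always answers "faithful" scores only 0.5, so accuracy cannot be inflated by guessing the majority class.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CitationSample:
    # Hypothetical fields for illustration; CiteCheck's actual format may differ.
    statement: str   # generated sentence that carries the citation
    cited_text: str  # passage the citation points to
    label: int       # 1 = faithful (passage supports statement), 0 = unfaithful


def label_balance(samples: List[CitationSample]) -> float:
    """Fraction of positive (faithful) samples; 0.5 means perfectly balanced."""
    if not samples:
        return 0.0
    return sum(s.label for s in samples) / len(samples)


def accuracy(samples: List[CitationSample],
             predict: Callable[[str, str], int]) -> float:
    """Accuracy of a binary detector predict(statement, cited_text) -> 0 or 1."""
    if not samples:
        return 0.0
    correct = sum(predict(s.statement, s.cited_text) == s.label for s in samples)
    return correct / len(samples)


# Toy data (invented for this example): one supported and one unsupported claim.
toy = [
    CitationSample("Paris is the capital of France.", "Paris is France's capital.", 1),
    CitationSample("Paris has 90 million residents.", "Paris is France's capital.", 0),
]


def always_faithful(statement: str, cited_text: str) -> int:
    # Trivial majority-style baseline that ignores its inputs.
    return 1


print(label_balance(toy))              # 0.5
print(accuracy(toy, always_faithful))  # 0.5
```

On an unbalanced split, say 90% positives, the same trivial baseline would reach 0.9 accuracy, which is one reason balancing positive and negative samples makes the test set a more informative benchmark.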
@article{xu2025_2502.10881,
  title={CiteCheck: Towards Accurate Citation Faithfulness Detection},
  author={Ziyao Xu and Shaohang Wei and Zhuoheng Han and Jing Jin and Zhe Yang and Xiaoguang Li and Haochen Tan and Zhijiang Guo and Houfeng Wang},
  journal={arXiv preprint arXiv:2502.10881},
  year={2025}
}