
Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation

Jiajun Shen
Tong Zhou
Yubo Chen
Delai Qiu
Shengping Liu
Kang Liu
Jun Zhao
Abstract

While hallucinations of large language models can be alleviated through retrieval-augmented generation and citation generation, how the model utilizes its internal knowledge remains opaque, and the trustworthiness of its generated answers remains questionable. In this work, we introduce the Context-Prior Augmented Citation Generation task, which requires models to generate citations that account for both external and internal knowledge while providing trustworthy references, along with five evaluation metrics covering three aspects: answer helpfulness, citation faithfulness, and trustworthiness. We introduce RAEL, a paradigm for this task, and design INTRALIGN, an integrated method comprising customized data generation and an alignment algorithm. Our experimental results show that our method achieves better cross-scenario performance than other baselines. Our extended experiments further reveal that retrieval quality, question types, and model knowledge have a considerable influence on the trustworthiness of citation generation.

@article{shen2025_2504.14856,
  title={Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation},
  author={Jiajun Shen and Tong Zhou and Yubo Chen and Delai Qiu and Shengping Liu and Kang Liu and Jun Zhao},
  journal={arXiv preprint arXiv:2504.14856},
  year={2025}
}