Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models

17 October 2024

Abstract

The integration of documents generated by large language models (LLMs) themselves (Self-Docs) alongside retrieved documents has emerged as a promising strategy for retrieval-augmented generation (RAG) systems. However, previous research has focused primarily on optimizing how Self-Docs are used, leaving their inherent properties underexplored. To bridge this gap, we first investigate the overall effectiveness of Self-Docs, identifying key factors that shape their contribution to RAG performance (RQ1). Building on these insights, we develop a taxonomy grounded in Systemic Functional Linguistics to compare the influence of various Self-Docs categories (RQ2) and explore strategies for combining them with external sources (RQ3). Our findings reveal which types of Self-Docs are most beneficial and offer practical guidelines for leveraging them to achieve significant improvements in knowledge-intensive question answering tasks.

@article{li2025_2410.13192,
  title={Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models},
  author={Jiatao Li and Xinyu Hu and Xunjian Yin and Xiaojun Wan},
  journal={arXiv preprint arXiv:2410.13192},
  year={2025}
}
