Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Main: 8 Pages
3 Figures
Bibliography: 3 Pages
27 Tables
Appendix: 24 Pages
Abstract

In retrieval-augmented generation (RAG) systems, integrating self-generated documents (SGDs) alongside retrieved content has emerged as a promising strategy for enhancing the performance of large language models (LLMs). However, previous research has focused primarily on optimizing the use of SGDs, leaving their inherent properties underexplored. This paper therefore conducts a comprehensive analysis of different types of SGDs, with experiments on various knowledge-intensive tasks. We develop a taxonomy of SGDs grounded in Systemic Functional Linguistics (SFL) to compare the influence of different SGD categories. Our findings offer key insights into which kinds of SGDs contribute most effectively to improving LLM performance. The results, together with further fusion methods based on SGD categories, also provide practical guidelines for making better use of SGDs to achieve significant advancements in knowledge-driven QA tasks with RAG.
