Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use

This paper presents a domain-specific implementation of Retrieval-Augmented Generation (RAG) tailored to the Fair Use Doctrine in U.S. copyright law. Motivated by the increasing prevalence of DMCA takedowns and the lack of accessible legal support for content creators, we propose a structured approach that combines semantic search with legal knowledge graphs and court citation networks to improve retrieval quality and reasoning reliability. Our prototype models legal precedents at the statutory factor level (e.g., purpose, nature, amount, market effect) and incorporates citation-weighted graph representations to prioritize doctrinally authoritative sources. We use Chain-of-Thought reasoning and interleaved retrieval steps to better emulate legal reasoning. Preliminary testing suggests this method improves doctrinal relevance in the retrieval process, laying groundwork for future evaluation and deployment of LLM-based legal assistance tools.
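To make the idea of citation-weighted retrieval concrete, the following is a minimal sketch of one way semantic similarity could be blended with an authority score derived from a citation graph. It is not the authors' implementation: the abstract does not say how citation weights are computed, so the use of PageRank, the blending weight `alpha`, the function names, and the placeholder case identifiers are all assumptions made for illustration. Factor-level modeling (purpose, nature, amount, market effect) is left out for brevity.

```python
import math


def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each case id to the list of case ids it cites.
    PageRank is an assumed stand-in for "citation weighting"; the paper
    may use a different measure.
    """
    nodes = set(citations) | {c for cited in citations.values() for c in cited}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            cited = citations.get(node, [])
            if cited:
                # A citing case passes its rank evenly to the cases it cites.
                share = damping * rank[node] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A case that cites nothing spreads its rank over all cases.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, case_vecs, citations, alpha=0.7):
    """Order cases by a blend of semantic similarity and citation authority.

    `alpha` (hypothetical) controls the trade-off: 1.0 is pure semantic
    search, 0.0 is pure authority.
    """
    authority = pagerank(citations)
    top = max(authority.values())
    scores = {}
    for case, vec in case_vecs.items():
        normalized_authority = authority.get(case, 0.0) / top
        scores[case] = alpha * cosine(query_vec, vec) + (1 - alpha) * normalized_authority
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder data: case_b and case_c both cite case_a.
citations = {"case_b": ["case_a"], "case_c": ["case_a"]}
case_vecs = {"case_a": [1.0, 0.0], "case_b": [1.0, 0.0], "case_c": [0.0, 1.0]}
ranking = rerank([1.0, 0.0], case_vecs, citations)
```

In this toy setup `case_a` and `case_b` are equally similar to the query, so the citation term breaks the tie in favor of the more frequently cited case. A real system would obtain the vectors from an embedding model and the graph from court citation data.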
@article{ho2025_2505.02164,
  title={Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use},
  author={Justin Ho and Alexandra Colby and William Fisher},
  journal={arXiv preprint arXiv:2505.02164},
  year={2025}
}