ResearchTrend.AI
A Reasoning-Focused Legal Retrieval Benchmark

6 May 2025
Lucia Zheng
Neel Guha
Javokhir Arifov
Sarah Zhang
Michal Skreta
Christopher D. Manning
Peter Henderson
Daniel E. Ho
AILaw · RALM · ELM
Abstract

As the legal community increasingly examines the use of large language models (LLMs) for various legal applications, legal AI developers have turned to retrieval-augmented LLMs ("RAG" systems) to improve system performance and robustness. An obstacle to the development of specialized RAG systems is the lack of realistic legal RAG benchmarks which capture the complexity of both legal retrieval and downstream legal question-answering. To address this, we introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our tasks correspond to real-world legal research tasks, and were produced through annotation processes which resemble legal research. We describe the construction of these benchmarks and the performance of existing retriever pipelines. Our results suggest that legal RAG remains a challenging application, thus motivating future research.
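For readers unfamiliar with the retrieve-then-read ("RAG") setup the abstract describes, below is a minimal sketch of the retrieval step. The corpus, query, and bag-of-words cosine scoring are illustrative assumptions for exposition only, not the retrievers or data the paper actually evaluates.

```python
# Minimal sketch of the retrieval step in a retrieval-augmented ("RAG")
# pipeline. Corpus, query, and scoring are illustrative assumptions.
import math
import re
from collections import Counter

def bow(text):
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the top-k passages most similar to the query."""
    q = bow(query)
    return sorted(corpus, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]

corpus = [
    "A landlord must return a security deposit within 30 days.",
    "Hearsay is an out-of-court statement offered for its truth.",
    "A tenant may withhold rent if the unit is uninhabitable.",
]
passages = retrieve("When must a landlord return the security deposit?", corpus)
# In a full RAG system, these passages would be inserted into the LLM's
# prompt as context before it answers the question.
```

The paper's point is precisely that such lexical-overlap retrieval falls short on legal questions, where the relevant authority often shares few surface terms with the query.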

View on arXiv
@article{zheng2025_2505.03970,
  title={A Reasoning-Focused Legal Retrieval Benchmark},
  author={Lucia Zheng and Neel Guha and Javokhir Arifov and Sarah Zhang and Michal Skreta and Christopher D. Manning and Peter Henderson and Daniel E. Ho},
  journal={arXiv preprint arXiv:2505.03970},
  year={2025}
}