arXiv:2405.20362
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
30 May 2024
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
Papers citing "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools"
49 / 49 papers shown
A Reasoning-Focused Legal Retrieval Benchmark
Lucia Zheng
Neel Guha
Javokhir Arifov
Sarah Zhang
Michal Skreta
Christopher D. Manning
Peter Henderson
Daniel E. Ho
AILaw
RALM
ELM
85
2
0
06 May 2025
Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models
Matthew Dahl
AILaw
ELM
45
0
0
05 May 2025
Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use
Justin Ho
Alexandra Colby
William Fisher
AILaw
34
0
0
04 May 2025
HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification
Bibek Paudel
Alexander Lyzhov
Preetam Joshi
Puneet Anand
HILM
46
0
0
09 Apr 2025
Causal Retrieval with Semantic Consideration
Hyunseo Shin
Wonseok Hwang
23
0
0
07 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
Minhu Park
Hongseok Oh
Eunkyung Choi
Wonseok Hwang
AILaw
RALM
ELM
112
0
0
02 Apr 2025
Tasks and Roles in Legal AI: Data Curation, Annotation, and Verification
Allison Koenecke
Jed Stiglitz
David Mimno
Matthew Wilkens
AILaw
ELM
77
0
0
02 Apr 2025
Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus
Claas Beger
Carl-Leander Henneking
37
0
0
29 Mar 2025
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration
Hong Qing Yu
Frank McQuade
46
1
0
14 Mar 2025
Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties
Eunkyung Choi
Young Jin Suh
H. Park
Wonseok Hwang
49
1
0
05 Mar 2025
Adaptively evaluating models with task elicitation
Davis Brown
Prithvi Balehannina
Helen Jin
Shreya Havaldar
Hamed Hassani
Eric Wong
ALM
ELM
86
0
0
03 Mar 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications
Adam Kovacs
Gábor Recski
37
2
0
24 Feb 2025
Towards Robust Legal Reasoning: Harnessing Logical LLMs in Law
Manuj Kant
Sareh Nabi
Manav Kant
Roland Scharrer
Megan Ma
Marzieh Nabi
AILaw
ELM
68
0
0
24 Feb 2025
AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County
Faiz Surani
Mirac Suzgun
Vyoma Raman
Christopher D. Manning
Peter Henderson
Daniel E. Ho
41
0
0
12 Feb 2025
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
Zongxi Li
Y. Li
Haoran Xie
S. J. Qin
66
0
0
03 Feb 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel
Masha Belyi
Atindriyo Sanyal
72
17
0
17 Jan 2025
ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting
Steven H. Wang
Maksim Zubkov
Kexin Fan
Sarah Harrell
Yuyang Sun
Wei Chen
Andreas Plesner
Roger Wattenhofer
AILaw
44
1
0
11 Jan 2025
Mind the Data Gap: Bridging LLMs to Enterprise Data Integration
Moe Kayali
Fabian Wenz
Nesime Tatbul
Çağatay Demiralp
44
2
0
31 Dec 2024
Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning
R. Krishnan
Piyush Khanna
Omesh Tickoo
HILM
64
1
0
03 Dec 2024
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment
Firdavs Nasriddinov
Rafal Kocielnik
Arushi Gupta
Cherine Yang
Elyssa Y. Wong
Anima Anandkumar
Andrew J. Hung
65
0
0
01 Dec 2024
A Social Outcomes and Priorities centered (SOP) Framework for AI policy
Mohak Shah
27
0
0
12 Nov 2024
Evaluating the Accuracy of Chatbots in Financial Literature
Orhan Erdem
Kristi Hassett
Feyzullah Egriboyun
31
0
0
11 Nov 2024
VERITAS: A Unified Approach to Reliability Evaluation
Rajkumar Ramamurthy
Meghana Arakkal Rajeev
Oliver Molenschot
James Y. Zou
Nazneen Rajani
HILM
31
1
0
05 Nov 2024
Responsible Retrieval Augmented Generation for Climate Decision Making from Documents
Matyas Juhasz
Kalyan Dutia
Henry Franks
Conor Delahunty
Patrick Fawbert Mills
Harrison Pim
29
1
0
31 Oct 2024
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
Mirac Suzgun
Tayfun Gur
Federico Bianchi
Daniel E. Ho
Thomas F. Icard
Dan Jurafsky
James Zou
29
1
0
28 Oct 2024
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights
Odysseas S. Chlapanis
D. Galanis
Ion Androutsopoulos
AILaw
ELM
21
0
0
17 Oct 2024
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
ZhongXiang Sun
Xiaoxue Zang
Kai Zheng
Yang Song
Jun Xu
Xiao Zhang
Weijie Yu
Yang Song
Han Li
50
6
0
15 Oct 2024
Measuring the Groundedness of Legal Question-Answering Systems
Dietrich Trautmann
Natalia Ostapuk
Quentin Grail
Adrian Alan Pol
Guglielmo Bonifazi
Shang Gao
Martin Gajek
HILM
AILaw
ELM
19
0
0
11 Oct 2024
Developing a Pragmatic Benchmark for Assessing Korean Legal Language Understanding in Large Language Models
Yeeun Kim
Young Rok Choi
Eunkyung Choi
Jinhwan Choi
H. Park
Wonseok Hwang
ELM
AILaw
28
0
0
11 Oct 2024
Answering Questions in Stages: Prompt Chaining for Contract QA
Adam Roegiest
Radha Chitta
AILaw
ELM
24
0
0
09 Oct 2024
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Pengcheng Jiang
Cao Xiao
Minhao Jiang
Parminder Bhatia
Taha A. Kass-Hout
Jimeng Sun
Jiawei Han
RALM
AI4MH
39
4
0
06 Oct 2024
Exploring Language Model Generalization in Low-Resource Extractive QA
Saptarshi Sengupta
Wenpeng Yin
Preslav Nakov
Shreya Ghosh
Suhang Wang
23
0
0
27 Sep 2024
Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations
Abe Bohan Hou
William Jurayj
Nils Holzenberger
Andrew Blair-Stanek
Benjamin Van Durme
ELM
20
0
0
16 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Sacha Muller
António Loison
Bilel Omrani
Gautier Viaud
RALM
ELM
26
1
0
10 Sep 2024
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications
Rishi Kalra
Zekun Wu
Ayesha Gulley
Airlie Hilliard
Xin Guan
Adriano Soares Koshiyama
Philip C. Treleaven
RALM
AILaw
47
5
0
29 Aug 2024
Problem Solving Through Human-AI Preference-Based Cooperation
Subhabrata Dutta
Timo Kaufmann
Goran Glavas
Ivan Habernal
Kristian Kersting
Frauke Kreuter
Mira Mezini
Iryna Gurevych
Eyke Hüllermeier
Hinrich Schuetze
82
1
0
14 Aug 2024
CoverBench: A Challenging Benchmark for Complex Claim Verification
Alon Jacovi
Moran Ambar
Eyal Ben-David
Uri Shaham
Amir Feder
Mor Geva
Dror Marcus
Avi Caciularu
LMTD
45
3
0
06 Aug 2024
It Cannot Be Right If It Was Written by AI: On Lawyers' Preferences of Documents Perceived as Authored by an LLM vs a Human
Jakub Harasta
Tereza Novotná
Jaromír Šavelka
ELM
37
0
0
09 Jul 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation
Abe Bohan Hou
Orion Weller
Guanghui Qin
Eugene Yang
Dawn J Lawrie
Nils Holzenberger
Andrew Blair-Stanek
Benjamin Van Durme
AILaw
ELM
55
5
0
24 Jun 2024
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Zhepei Wei
Wei-Lin Chen
Yu Meng
RALM
53
12
0
19 Jun 2024
Natural Language Interaction with a Household Electricity Knowledge-based Digital Twin
Carolina Fortuna
Vid Hanžel
Blaž Bertalanič
38
0
0
03 Jun 2024
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
Masha Belyi
Robert Friel
Shuai Shao
Atindriyo Sanyal
HILM
RALM
59
5
0
03 Jun 2024
Empowering Air Travelers: A Chatbot for Canadian Air Passenger Rights
Maksym Taranukhin
Sahithya Ravi
Gabor Lukacs
E. Milios
Vered Shwartz
18
1
0
19 Mar 2024
Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models
Yuzhe Zhang
Yipeng Zhang
Yidong Gan
Lina Yao
Chen Wang
31
10
0
23 Feb 2024
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li
Jeffrey Flanigan
92
87
0
26 Dec 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
209
178
0
20 Oct 2023
Exploring the Practicality of Generative Retrieval on Dynamic Corpora
Soyoung Yoon
Chaeeun Kim
Hyunji Lee
Joel Jang
Sohee Yang
Minjoon Seo
11
3
0
27 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
206
2,232
0
22 Mar 2023
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
129
94
0
01 Jul 2022