Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

30 May 2024

Christopher D. Manning

Papers citing "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools"

Title
A Reasoning-Focused Legal Retrieval Benchmark Lucia Zheng Neel Guha Javokhir Arifov Sarah Zhang Michal Skreta Christopher D. Manning Peter Henderson Daniel E. Ho AILaw RALM ELM 85 2 0 06 May 2025
Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models Matthew Dahl AILaw ELM 45 0 0 05 May 2025
Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use Justin Ho Alexandra Colby William Fisher AILaw 34 0 0 04 May 2025
HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification Bibek Paudel Alexander Lyzhov Preetam Joshi Puneet Anand HILM 46 0 0 09 Apr 2025
Causal Retrieval with Semantic Consideration Hyunseo Shin Wonseok Hwang 23 0 0 07 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool Minhu Park Hongseok Oh Eunkyung Choi Wonseok Hwang AILaw RALM ELM 112 0 0 02 Apr 2025
Tasks and Roles in Legal AI: Data Curation, Annotation, and Verification Allison Koenecke Jed Stiglitz David Mimno Matthew Wilkens AILaw ELM 77 0 0 02 Apr 2025
Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus Claas Beger Carl-Leander Henneking 37 0 0 29 Mar 2025
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration Hong Qing Yu Frank McQuade 46 1 0 14 Mar 2025
Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties Eunkyung Choi Young Jin Suh H. Park Wonseok Hwang 49 1 0 05 Mar 2025
Adaptively evaluating models with task elicitation Davis Brown Prithvi Balehannina Helen Jin Shreya Havaldar Hamed Hassani Eric Wong ALM ELM 86 0 0 03 Mar 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications Adam Kovacs Gábor Recski 37 2 0 24 Feb 2025
Towards Robust Legal Reasoning: Harnessing Logical LLMs in Law Manuj Kant Sareh Nabi Manav Kant Roland Scharrer Megan Ma Marzieh Nabi AILaw ELM 68 0 0 24 Feb 2025
AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County Faiz Surani Mirac Suzgun Vyoma Raman Christopher D. Manning Peter Henderson Daniel E. Ho 41 0 0 12 Feb 2025
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering Zongxi Li Y. Li Haoran Xie S. J. Qin 66 0 0 03 Feb 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems Robert Friel Masha Belyi Atindriyo Sanyal 72 17 0 17 Jan 2025
ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting Steven H. Wang Maksim Zubkov Kexin Fan Sarah Harrell Yuyang Sun Wei Chen Andreas Plesner Roger Wattenhofer AILaw 44 1 0 11 Jan 2025
Mind the Data Gap: Bridging LLMs to Enterprise Data Integration Moe Kayali Fabian Wenz Nesime Tatbul Çağatay Demiralp 44 2 0 31 Dec 2024
Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning R. Krishnan Piyush Khanna Omesh Tickoo HILM 64 1 0 03 Dec 2024
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment Firdavs Nasriddinov Rafal Kocielnik Arushi Gupta Cherine Yang Elyssa Y. Wong Anima Anandkumar Andrew J. Hung 65 0 0 01 Dec 2024
A Social Outcomes and Priorities centered (SOP) Framework for AI policy Mohak Shah 27 0 0 12 Nov 2024
Evaluating the Accuracy of Chatbots in Financial Literature Orhan Erdem Kristi Hassett Feyzullah Egriboyun 31 0 0 11 Nov 2024
VERITAS: A Unified Approach to Reliability Evaluation Rajkumar Ramamurthy Meghana Arakkal Rajeev Oliver Molenschot James Y. Zou Nazneen Rajani HILM 31 1 0 05 Nov 2024
Responsible Retrieval Augmented Generation for Climate Decision Making from Documents Matyas Juhasz Kalyan Dutia Henry Franks Conor Delahunty Patrick Fawbert Mills Harrison Pim 29 1 0 31 Oct 2024
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models Mirac Suzgun Tayfun Gur Federico Bianchi Daniel E. Ho Thomas F. Icard Dan Jurafsky James Zou 29 1 0 28 Oct 2024
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights Odysseas S. Chlapanis D. Galanis Ion Androutsopoulos AILaw ELM 21 0 0 17 Oct 2024
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability ZhongXiang Sun Xiaoxue Zang Kai Zheng Yang Song Jun Xu Xiao Zhang Weijie Yu Yang Song Han Li 50 6 0 15 Oct 2024
Measuring the Groundedness of Legal Question-Answering Systems Dietrich Trautmann Natalia Ostapuk Quentin Grail Adrian Alan Pol Guglielmo Bonifazi Shang Gao Martin Gajek HILM AILaw ELM 19 0 0 11 Oct 2024
Developing a Pragmatic Benchmark for Assessing Korean Legal Language Understanding in Large Language Models Yeeun Kim Young Rok Choi Eunkyung Choi Jinhwan Choi H. Park Wonseok Hwang ELM AILaw 28 0 0 11 Oct 2024
Answering Questions in Stages: Prompt Chaining for Contract QA Adam Roegiest Radha Chitta AILaw ELM 24 0 0 09 Oct 2024
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval Pengcheng Jiang Cao Xiao Minhao Jiang Parminder Bhatia Taha A. Kass-Hout Jimeng Sun Jiawei Han RALM AI4MH 39 4 0 06 Oct 2024
Exploring Language Model Generalization in Low-Resource Extractive QA Saptarshi Sengupta Wenpeng Yin Preslav Nakov Shreya Ghosh Suhang Wang 23 0 0 27 Sep 2024
Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations Abe Bohan Hou William Jurayj Nils Holzenberger Andrew Blair-Stanek Benjamin Van Durme ELM 20 0 0 16 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Sacha Muller António Loison Bilel Omrani Gautier Viaud RALM ELM 26 1 0 10 Sep 2024
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications Rishi Kalra Zekun Wu Ayesha Gulley Airlie Hilliard Xin Guan Adriano Soares Koshiyama Philip C. Treleaven RALM AILaw 47 5 0 29 Aug 2024
Problem Solving Through Human-AI Preference-Based Cooperation Subhabrata Dutta Timo Kaufmann Goran Glavas Ivan Habernal Kristian Kersting Frauke Kreuter Mira Mezini Iryna Gurevych Eyke Hüllermeier Hinrich Schuetze 82 1 0 14 Aug 2024
CoverBench: A Challenging Benchmark for Complex Claim Verification Alon Jacovi Moran Ambar Eyal Ben-David Uri Shaham Amir Feder Mor Geva Dror Marcus Avi Caciularu LMTD 45 3 0 06 Aug 2024
It Cannot Be Right If It Was Written by AI: On Lawyers' Preferences of Documents Perceived as Authored by an LLM vs a Human Jakub Harasta Tereza Novotná Jaromír Šavelka ELM 37 0 0 09 Jul 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation Abe Bohan Hou Orion Weller Guanghui Qin Eugene Yang Dawn J Lawrie Nils Holzenberger Andrew Blair-Stanek Benjamin Van Durme AILaw ELM 55 5 0 24 Jun 2024
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales Zhepei Wei Wei-Lin Chen Yu Meng RALM 53 12 0 19 Jun 2024
Natural Language Interaction with a Household Electricity Knowledge-based Digital Twin Carolina Fortuna Vid Hanžel Blaž Bertalanič 38 0 0 03 Jun 2024
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost Masha Belyi Robert Friel Shuai Shao Atindriyo Sanyal HILM RALM 59 5 0 03 Jun 2024
Empowering Air Travelers: A Chatbot for Canadian Air Passenger Rights Maksym Taranukhin Sahithya Ravi Gabor Lukacs E. Milios Vered Shwartz 18 1 0 19 Mar 2024
Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models Yuzhe Zhang Yipeng Zhang Yidong Gan Lina Yao Chen Wang 31 10 0 23 Feb 2024
Task Contamination: Language Models May Not Be Few-Shot Anymore Changmao Li Jeffrey Flanigan 92 87 0 26 Dec 2023
Towards Understanding Sycophancy in Language Models Mrinank Sharma Meg Tong Tomasz Korbak D. Duvenaud Amanda Askell ... Oliver Rausch Nicholas Schiefer Da Yan Miranda Zhang Ethan Perez 209 178 0 20 Oct 2023
Exploring the Practicality of Generative Retrieval on Dynamic Corpora Soyoung Yoon Chaeeun Kim Hyunji Lee Joel Jang Sohee Yang Minjoon Seo 11 3 0 27 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4 Sébastien Bubeck Varun Chandrasekaran Ronen Eldan J. Gehrke Eric Horvitz ... Scott M. Lundberg Harsha Nori Hamid Palangi Marco Tulio Ribeiro Yi Zhang ELM AI4MH AI4CE ALM 206 2,232 0 22 Mar 2023
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset Peter Henderson M. Krass Lucia Zheng Neel Guha Christopher D. Manning Dan Jurafsky Daniel E. Ho AILaw ELM 129 94 0 01 Jul 2022