HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs

25 February 2024

Papers citing "HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs"

4 / 4 papers shown

Title
Adaptive Stress Testing Black-Box LLM Planners Neeloy Chakraborty John Pohovey Melkior Ornik Katherine Driggs-Campbell 23 0 0 08 May 2025
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization Catarina G. Belem Pouya Pezeskhpour Hayate Iso Seiji Maekawa Nikita Bhutani Estevam R. Hruschka HILM 65 1 0 17 Oct 2024
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,927 0 20 Apr 2018