DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models
Kedi Chen, Qin Chen, Jie Zhou, Yishen He, Liang He
arXiv 2403.00896 · 1 March 2024 · HILM
Papers citing "DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models" (12 of 12 shown)

  • Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning
    Dongmin Park, Zhaofang Qian, Guangxing Han, Ser-Nam Lim
    MLLM · 15 Mar 2024

  • Navigating Hallucinations for Reasoning of Unintentional Activities
    Shresth Grover, Vibhav Vineet, Y. S. Rawat
    LRM · 29 Feb 2024

  • Hallucination Detection and Hallucination Mitigation: An Investigation
    Junliang Luo, Tianyu Li, Di Wu, Michael R. M. Jenkin, Steve Liu, Gregory Dudek
    HILM, LLMAG · 16 Jan 2024

  • Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
    Aleksander Buszydlik, Karol Dobiczek, Michał Teodor Okoń, Konrad Skublicki, Philip Lippmann, Jie-jin Yang
    LRM, ReLM · 30 Dec 2023

  • Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
    Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng-Wei Zhang, Cheng Zhou, Xinbing Wang, Luoyi Fu
    HILM · 22 Nov 2023

  • Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs
    Xue-Yong Fu, Md Tahmid Rahman Laskar, Cheng-Hsiung Chen, TN ShashiBhushan
    HILM, ELM · 01 Nov 2023

  • Topic Shift Detection in Chinese Dialogues: Corpus and Benchmark
    Jian-Dong Lin, Yaxin Fan, Feng Jiang, Xiaomin Chu, Peifeng Li
    02 May 2023

  • The Internal State of an LLM Knows When It's Lying
    A. Azaria, Tom Michael Mitchell
    HILM · 26 Apr 2023

  • Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
    Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter
    ALM · 30 Apr 2021

  • A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
    Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, W. Dolan
    HILM · 18 Apr 2021

  • Adding Chit-Chat to Enhance Task-Oriented Dialogues
    Kai Sun, Seungwhan Moon, Paul A. Crook, Stephen Roller, Becka Silvert, Bing-Quan Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie
    24 Oct 2020

  • Language Models as Knowledge Bases?
    Fabio Petroni, Tim Rocktäschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
    KELM, AI4MH · 03 Sep 2019