Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks

28 May 2025
Wenhan Dong
Tianyi Hu
Jingyi Zheng
Zhen Sun
Yuemeng Zhao
Yule Liu
Xinlei He
Xinyi Huang
LRM · ELM
ArXiv (abs) · PDF · HTML
Main: 7 pages · 1 figure · Bibliography: 2 pages · 2 tables
Abstract

Multi-round incomplete-information tasks are crucial for evaluating the lateral thinking capabilities of large language models (LLMs). Current research relies primarily on multiple benchmarks and automated evaluation metrics to assess these abilities. However, our study reveals limitations of existing methods: they often yield misleading results that fail to uncover key issues, such as shortcut-taking behaviors, rigid patterns, and premature task termination. These issues obscure the true reasoning capabilities of LLMs and undermine the reliability of evaluations. To address these limitations, we propose a refined set of evaluation standards, including inspection of reasoning paths, diversified assessment metrics, and comparative analyses with human performance.
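To make the abstract's argument concrete, here is a minimal sketch of what a multi-round incomplete-information evaluation loop might look like when it records full reasoning paths rather than only final success rates. This is an illustrative assumption, not the authors' harness: `ask_model`, `judge_question`, and the threshold values are hypothetical stand-ins that a real evaluation would have to supply.

```python
# Illustrative sketch (not the authors' code): a multi-round evaluation loop
# for an incomplete-information lateral-thinking task, e.g. a situation puzzle
# where the model asks yes/no questions. Instead of scoring only the final
# answer, it keeps the full reasoning path and flags failure modes named in
# the abstract: shortcut-taking, premature termination, and rigid patterns.
# `ask_model` and `judge_question` are hypothetical stand-ins for an LLM
# wrapper and a ground-truth oracle.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Transcript:
    """Full reasoning path for one puzzle, kept for later inspection."""
    puzzle: str
    turns: List[dict] = field(default_factory=list)
    solved: bool = False
    flags: List[str] = field(default_factory=list)


def evaluate_puzzle(
    puzzle: str,
    solution: str,
    ask_model: Callable[[str, List[dict]], str],   # hypothetical LLM wrapper
    judge_question: Callable[[str, str], str],     # oracle: "yes"/"no"/"solved"
    max_rounds: int = 20,
    min_rounds_expected: int = 3,
) -> Transcript:
    t = Transcript(puzzle=puzzle)
    for round_no in range(max_rounds):
        question = ask_model(puzzle, t.turns)
        verdict = judge_question(question, solution)
        t.turns.append({"round": round_no, "question": question, "verdict": verdict})

        if verdict == "solved":
            t.solved = True
            # A "solve" with almost no information gathered suggests a
            # shortcut (e.g., a memorized puzzle) rather than lateral reasoning.
            if round_no + 1 < min_rounds_expected:
                t.flags.append("possible_shortcut")
            break

        if question.strip().lower() == "i give up":
            t.flags.append("premature_termination")
            break

    # Rigid-pattern check: several near-duplicate questions in one transcript.
    asked = [turn["question"].strip().lower() for turn in t.turns]
    if len(asked) - len(set(asked)) >= 2:
        t.flags.append("rigid_question_pattern")
    return t
```

The design point, consistent with the abstract's proposal, is that the transcript itself is the unit of evaluation: a solve rate alone cannot distinguish genuine lateral reasoning from a memorized answer or a lucky guess, whereas inspecting reasoning paths and tracking diversified metrics (flags per transcript, rounds used) can.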

@article{dong2025_2505.23843,
  title={Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks},
  author={Wenhan Dong and Tianyi Hu and Jingyi Zheng and Zhen Sun and Yuemeng Zhao and Yule Liu and Xinlei He and Xinyi Huang},
  journal={arXiv preprint arXiv:2505.23843},
  year={2025}
}