HILM: Hallucination in Language Models

Dedicated to studies investigating the causes, implications, and mitigation of hallucination, the phenomenon in which language models generate plausible but incorrect or nonsensical outputs.
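
To make the phenomenon concrete: a common family of detection methods covered in this community checks whether a model gives consistent answers when the same question is sampled repeatedly, on the intuition that a model that is guessing (hallucinating) tends to disagree with itself. Below is a minimal, hypothetical sketch of that idea, not the method of any particular paper listed here; `query_model` is a toy stand-in for an LLM API call, and real detectors compare samples semantically rather than by exact string match.

```python
import random
from collections import Counter


def query_model(prompt: str, temperature: float = 1.0) -> str:
    """Toy stand-in for an LLM API call so the sketch runs end to end.

    Swap in a real client (sampling at a high temperature) for actual use.
    """
    return random.choice(["Paris", "Paris", "Paris", "Lyon"])


def consistency_score(prompt: str, n_samples: int = 10) -> float:
    """Fraction of sampled answers that agree with the modal answer.

    Low agreement across samples is a common signal that the model is
    guessing, i.e. that its answer may be hallucinated.
    """
    answers = [query_model(prompt) for _ in range(n_samples)]
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / n_samples


def likely_hallucination(prompt: str, threshold: float = 0.5) -> bool:
    # Flag the answer when fewer than `threshold` of the samples agree.
    return consistency_score(prompt) < threshold


if __name__ == "__main__":
    q = "What is the capital of France?"
    print(f"agreement = {consistency_score(q):.2f}")
```

Sampling-based checks like this trade extra inference cost for a model-agnostic signal; many of the papers below instead use internal states, uncertainty quantification, or retrieval-backed verification toward the same end.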

All papers

50 of 1,186 papers shown
  • When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs · Shaowen Wang, Yiqi Dong, Ruinian Chang, Tansheng Zhu, Yuebo Sun, Kaifeng Lyu, Jian Li · HILM · 10 Nov 2025
  • When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare · Saeedeh Javadi, Sara Mirabi, Manan Gangar, Bahadorreza Ofoghi · RALM, HILM · 10 Nov 2025
  • Stress Testing Factual Consistency Metrics for Long-Document Summarization · Zain Muhammad Mujahid, Dustin Wright, Isabelle Augenstein · HILM · 10 Nov 2025
  • NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models · Kyuho Lee, Euntae Kim, Jinwoo Choi, Buru Chang · HILM · 09 Nov 2025
  • Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs · Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci · AAML, KELM, HILM · 08 Nov 2025
  • Stemming Hallucination in Language Models Using a Licensing Oracle · Simeon Emanuilov, Richard Ackermann · HILM · 08 Nov 2025
  • REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment · Priyanka Mudgal · HILM · 06 Nov 2025
  • HaluMem: Evaluating Hallucinations in Memory Systems of Agents · Ding Chen, Simin Niu, Kehang Li, Peng Liu, Xiangping Zheng, Bo Tang, X. Li, Feiyu Xiong, Zhiyu Li · LLMAG, HILM, VLM · 05 Nov 2025
  • PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise · Sapir Harary, Eran Hirsch, Aviv Slobodkin, David Wan, Mohit Bansal, Ido Dagan · HILM · 03 Nov 2025
  • VISTA Score: Verification In Sequential Turn-based Assessment · A. Lewis, Andrew Perrault, Eric Fosler-Lussier, Michael White · HILM · 30 Oct 2025
  • SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications · Emily Herron, Junqi Yin, Feiyi Wang · HILM, ELM · 29 Oct 2025
  • Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning · S. Churina, Niranjan Chebrolu, Kokil Jaidka · KELM, HILM, CLL · 29 Oct 2025
  • Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy · Junichiro Niimi · HILM, RALM · 29 Oct 2025
  • Challenging Multilingual LLMs: A New Taxonomy and Benchmark for Unraveling Hallucination in Translation · Xinwei Wu, Heng Liu, Jiang Zhou, Xiaohu Zhao, Linlong Xu, Longyue Wang, Weihua Luo, Kaifu Zhang · HILM · 28 Oct 2025
  • MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs · Yucheng Ning, Xixun Lin, Fang Fang, Yanan Cao · HILM · 27 Oct 2025
  • MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models · Xinming Wang, Jian Xu, Bin Yu, Sheng Lian, Hongzhu Yi, ..., Boran Wang, Hongming Yang, Han Hu, Xu-Yao Zhang, Cheng-Lin Liu · HILM, LRM · 27 Oct 2025
  • Multi-Modal Fact-Verification Framework for Reducing Hallucinations in Large Language Models · Piyushkumar Patel · HILM · 26 Oct 2025
  • Confabulations from ACL Publications (CAP): A Dataset for Scientific Hallucination Detection · Federica Gamba, Aman Sinha, Timothee Mickus, Raul Vazquez, Patanjali Bhamidipati, ..., Aryan Chandramania, Rohit Agarwal, Chuyuan Li, Ioana Buhnila, Radhika Mamidi · HILM · 25 Oct 2025
  • The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection · Qiang Ding, Lvzhou Luo, Yixuan Cao, Ping Luo · HILM · 24 Oct 2025
  • Embedding Trust: Semantic Isotropy Predicts Nonfactuality in Long-Form Text Generation · Dhrupad Bhardwaj, Julia Kempe, Tim G. J. Rudner · HILM · 24 Oct 2025
  • A Diagnostic Benchmark for Sweden-Related Factual Knowledge · Jenny Kunz · HILM · 24 Oct 2025
  • A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition · Venktesh V, Deepali Prabhu, Avishek Anand · HILM · 24 Oct 2025
  • Teaming LLMs to Detect and Mitigate Hallucinations · Demian Till, John Smeaton, Peter Haubrick, Gouse Saheb, Florian Graef, David Berman · HILM · 22 Oct 2025
  • JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation · F. Xu, Huixuan Zhang, Zhenliang Zhang, Jiahao Wang, Xiaojun Wan · HILM · 22 Oct 2025
  • HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy · Fan Xu, Xinyu Hu, Zhenghan Yu, Li Lin, Xu Zhang, Yang Zhang, Wei Zhou, Jinjie Gu, Xiaojun Wan · HILM · 22 Oct 2025
  • KoSimpleQA: A Korean Factuality Benchmark with an Analysis of Reasoning LLMs · Donghyeon Ko, Yeguk Jin, Kyubyung Chae, Byungwook Lee, Chansong Jo, Sookyo In, Jaehong Lee, Taesup Kim, Donghyun Kwak · HILM · 21 Oct 2025
  • JT-Safe: Intrinsically Enhancing the Safety and Trustworthiness of LLMs · Junlan Feng, Fanyu Meng, Chong Long, Pengyu Cong, Duqing Wang, ..., Z. Ren, Fan Yang, Na Wu, Di Jin, Chao Deng · HILM · 20 Oct 2025
  • Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations · Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman · OffRL, HILM, LRM · 20 Oct 2025
  • Annotation-Efficient Universal Honesty Alignment · Shiyu Ni, Keping Bi, Jiafeng Guo, Minghao Tang, Jingtong Wu, Zengxin Han, Xueqi Cheng · HILM · 20 Oct 2025
  • Hallucination Benchmark for Speech Foundation Models · Alkis Koudounas, Moreno La Quatra, Manuel Giollo, Sabato Marco Siniscalchi, Elena Baralis · HILM · 18 Oct 2025
  • MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering · Yingpeng Ning, Yuanyuan Sun, Ling Luo, Yanhua Wang, Yuchen Pan, Hongfei Lin · HILM · 16 Oct 2025
  • Counting Hallucinations in Diffusion Models · Shuai Fu, Jian Zhou, Qi Chen, Huang Jing, Huy Anh Nguyen, Xiaohan Liu, Zhixiong Zeng, Lin Ma, Quanshi Zhang, Qi Wu · DiffM, HILM · 15 Oct 2025
  • Teaching Language Models to Faithfully Express their Uncertainty · Bryan Eikema, Evgenia Ilia, José G. C. de Souza, Chrysoula Zerva, Wilker Aziz · HILM · 14 Oct 2025
  • Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models · Shihao Ji, Zihui Song, Jiajie Huang · HILM · 14 Oct 2025
  • Uncertainty Quantification for Hallucination Detection in Large Language Models: Foundations, Methodology, and Future Directions · Sungmin Kang, Yavuz Faruk Bakman, D. Yaldiz, Baturalp Buyukates, Salman Avestimehr · HILM · 14 Oct 2025
  • CPR: Mitigating Large Language Model Hallucinations with Curative Prompt Refinement (IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024) · Jung-Woo Shim, Yeong-Joon Ju, Ji-Hoon Park, Seong-Whan Lee · HILM, LRM · 14 Oct 2025
  • Hallucination Detection via Internal States and Structured Reasoning Consistency in Large Language Models · Yusheng Song, Lirong Qiu, Xi Zhang, Zhihao Tang · HILM, LRM · 13 Oct 2025
  • The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers · Saad Obaid ul Islam, Anne Lauscher, Goran Glavaš · HILM · 13 Oct 2025
  • Bolster Hallucination Detection via Prompt-Guided Data Augmentation · Wenyun Li, Zheng Zhang, Dongmei Jiang, Xiangyuan Lan · HILM · 13 Oct 2025
  • FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs · Yingjia Wan, Haochen Tan, Xiao Zhu, Xinyu Zhou, Z. Li, ..., Jiaqi Zeng, Yi Xu, Jianqiao Lu, Yinhong Liu, Zhijiang Guo · HILM, OffRL · 13 Oct 2025
  • Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering · Lorena Calvo-Bartolomé, Valérie Aldana, Karla Cantarero, Alonso Madroñal de Mesa, Jerónimo Arenas-García, Jordan L. Boyd-Graber · HILM · 13 Oct 2025
  • FactAppeal: Identifying Epistemic Factual Appeals in News Media · Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav · HILM · 12 Oct 2025
  • Detecting Hallucinations in Authentic LLM-Human Interactions · Yujie Ren, Niklas Gruhlke, Anne Lauscher · HILM · 12 Oct 2025
  • ConsistencyAI: A Benchmark to Assess LLMs' Factual Consistency When Responding to Different Demographic Groups · Peter Banyas, Shristi Sharma, Alistair Simmons, Atharva Vispute · HILM · 11 Oct 2025
  • On the Entity-Level Alignment in Crosslingual Consistency · Yihong Liu, Mingyang Wang, François Yvon, Hinrich Schütze · HILM · 11 Oct 2025
  • Large Language Model Sourcing: A Survey · Liang Pang, Kangxi Wu, Sunhao Dai, Zihao Wei, Zenghao Duan, ..., Xiang Li, Zhiyi Yin, Jun Xu, Huawei Shen, Xueqi Cheng · HILM · 11 Oct 2025
  • Large Language Models Do NOT Really Know What They Don't Know · C. Cheang, Hou Pong Chan, Wenxuan Zhang, Yang Deng · HILM · 10 Oct 2025
  • Enhancing Faithfulness in Abstractive Summarization via Span-Level Fine-Tuning · Sicong Huang, Qianqi Yan, Shengze Wang, Ian Lane · HILM · 10 Oct 2025
  • Revisiting Hallucination Detection with Effective Rank-based Uncertainty · Rui Wang, Zeming Wei, Guanzhang Yue, Meng Sun · UQCV, HILM · 09 Oct 2025
  • The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs · Omar Mahmoud, Ali Khalil, B. L. Semage, Thommen George Karimpanal, Santu Rana · HILM · 09 Oct 2025