
Inference to the Best Explanation in Large Language Models

Abstract

While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts of Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features, including consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked to select the most plausible causal explanation amongst competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈27% above random), improving upon a GPT 3.5-as-a-Judge baseline (≈ +17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variations, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.
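The abstract names four IBE criteria (consistency, parsimony, coherence, uncertainty) but does not spell out how they are aggregated into a single plausibility judgment. The Python sketch below illustrates one possible selection step, assuming each criterion has already been scored in [0, 1] and that the scores are combined by a simple weighted sum; the feature names, weights, and helper functions are illustrative assumptions, not the paper's actual implementation.

from dataclasses import dataclass

@dataclass
class ExplanationFeatures:
    """Hypothetical per-explanation scores for the IBE criteria named in the abstract.

    All fields are assumed to be normalised to [0, 1], with higher meaning
    'more consistent', 'more parsimonious', etc. The actual feature
    extractors and scales are those defined in the paper.
    """
    consistency: float   # logical consistency of the explanation with the question
    parsimony: float     # inverse of structural/linguistic complexity
    coherence: float     # internal coherence of the reasoning chain
    certainty: float     # 1 minus linguistic uncertainty (hedging markers, etc.)

def ibe_score(f: ExplanationFeatures,
              weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Combine the four criteria into one plausibility score.

    An equal-weighted sum is an assumption for illustration only; the paper
    may combine the features differently.
    """
    w_cons, w_pars, w_coh, w_cert = weights
    return (w_cons * f.consistency
            + w_pars * f.parsimony
            + w_coh * f.coherence
            + w_cert * f.certainty)

def select_best_explanation(candidates: dict[str, ExplanationFeatures]) -> str:
    """Return the candidate explanation with the highest combined IBE score."""
    return max(candidates, key=lambda name: ibe_score(candidates[name]))

if __name__ == "__main__":
    # Toy example: two competing causal explanations for the same CQA instance.
    candidates = {
        "explanation_a": ExplanationFeatures(consistency=1.0, parsimony=0.8,
                                             coherence=0.9, certainty=0.7),
        "explanation_b": ExplanationFeatures(consistency=0.6, parsimony=0.9,
                                             coherence=0.5, certainty=0.8),
    }
    print(select_best_explanation(candidates))  # -> explanation_a

In IBE-Eval itself, the individual criteria are derived from explicit logical and linguistic analyses of the generated explanations; the equal weighting above is only a placeholder for whatever aggregation the framework uses.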

@article{dalal2025_2402.10767,
  title={Inference to the Best Explanation in Large Language Models},
  author={Dhairya Dalal and Marco Valentino and André Freitas and Paul Buitelaar},
  journal={arXiv preprint arXiv:2402.10767},
  year={2025}
}