Does Using Counterfactual Help LLMs Explain Textual Importance in Classification?

5 October 2025

Nelvin Tan

Main: 4 pages

2 figures

Bibliography: 1 page

2 tables

Appendix: 3 pages

Abstract

Large language models (LLMs) are becoming useful in many domains due to their impressive abilities, which arise from large training datasets and large model sizes. More recently, they have been shown to be very effective in textual classification tasks, which creates a need to explain the LLMs' decisions. Given the practical constraints that LLMs are often black boxes and LLM calls are expensive, we study how incorporating counterfactuals into LLM reasoning affects the LLM's ability to identify the top words that contributed to its classification decision. To this end, we introduce a framework called the decision changing rate, which helps us quantify the importance of the top words in classification. Our experimental results show that using counterfactuals can be helpful.
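The abstract names the decision changing rate but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of texts whose predicted label changes once the reported top words are removed. The function names, the word-removal strategy, and the `toy_classifier` standing in for an LLM call are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def remove_words(text: str, words: Sequence[str]) -> str:
    """Drop every whitespace-separated token that matches one of `words` (case-insensitive)."""
    drop = {w.lower() for w in words}
    return " ".join(tok for tok in text.split() if tok.lower() not in drop)


def decision_changing_rate(
    texts: Sequence[str],
    top_words: Sequence[Sequence[str]],
    classify: Callable[[str], str],
) -> float:
    """Assumed definition: share of texts whose label flips after removing their top words."""
    if len(texts) != len(top_words):
        raise ValueError("texts and top_words must have the same length")
    if not texts:
        return 0.0
    changed = 0
    for text, words in zip(texts, top_words):
        original_label = classify(text)
        perturbed_label = classify(remove_words(text, words))
        if perturbed_label != original_label:
            changed += 1
    return changed / len(texts)


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a black-box LLM classifier."""
    return "positive" if "great" in text.lower().split() else "negative"


if __name__ == "__main__":
    texts = ["The movie was great", "The plot was dull"]
    top_words = [["great"], ["dull"]]
    print(decision_changing_rate(texts, top_words, toy_classifier))
```

Under this assumed reading, a higher rate would suggest that the words flagged as important really do drive the classifier's decision. Each text costs two classifier calls here, which matters when every call is an expensive LLM query.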
