339
v1v2 (latest)

Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties

Juheon Kang
Won Hur
Hun Park
Wonseok Hwang
Main:7 Pages
13 Figures
Bibliography:4 Pages
17 Tables
Appendix:14 Pages
Abstract

How capable are large language models (LLMs) in the domain of taxation? Although numerous studies have explored the legal domain, research dedicated to taxation remains scarce. Moreover, the datasets used in these studies are either simplified, failing to reflect the real-world complexities, or not released as open-source. To address this gap, we introduce PLAT, a new benchmark designed to assess the ability of LLMs to predict the legitimacy of additional tax penalties. PLAT comprises 300 examples: (1) 100 binary-choice questions, (2) 100 multiple-choice questions, and (3) 100 essay-type questions, all derived from 100 Korean court precedents. PLAT is constructed to evaluate not only LLMs' understanding of tax law but also their performance in legal cases that require complex reasoning beyond straightforward application of statutes. Our systematic experiments with multiple LLMs reveal that (1) their baseline capabilities are limited, especially in cases involving conflicting issues that require a comprehensive understanding (not only of the statutes but also of the taxpayer's circumstances), and (2) LLMs struggle particularly with the "AC" stages of "IRAC" even for advanced reasoning models like o3, which actively employ inference-time scaling. The dataset is publicly available at:this https URL

View on arXiv
Comments on this paper