LegalBench.PT: A Benchmark for Portuguese Law

Abstract

The recent application of LLMs to the legal field has spurred the creation of benchmarks across various jurisdictions and languages. However, no benchmark has yet been specifically designed for the Portuguese legal system. In this work, we present LegalBench.PT, the first comprehensive legal benchmark covering key areas of Portuguese law. To develop LegalBench.PT, we first collect long-form questions and answers from real law exams, and then use GPT-4o to convert them into multiple-choice, true/false, and matching formats. Once generated, the questions are filtered and processed to improve the quality of the dataset. To ensure accuracy and relevance, we validate our approach by having a legal professional review a sample of the generated questions. Although the questions are synthetically generated, we show that their basis in human-created exams, together with our rigorous filtering and processing, yields a reliable benchmark for assessing LLMs' legal knowledge and reasoning abilities. Finally, we evaluate the performance of leading LLMs on LegalBench.PT and investigate potential biases in GPT-4o's responses. We also assess the performance of Portuguese lawyers on a sample of questions to establish a baseline for model comparison and validate the benchmark.

@article{canaverde2025_2502.16357,
  title={LegalBench.PT: A Benchmark for Portuguese Law},
  author={Beatriz Canaverde and Telmo Pessoa Pires and Leonor Melo Ribeiro and André F. T. Martins},
  journal={arXiv preprint arXiv:2502.16357},
  year={2025}
}