Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding

17 June 2025

Main: 9 Pages

5 Figures

38 Tables

Appendix: 36 Pages

Abstract

Negation is a fundamental linguistic phenomenon that poses ongoing challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Current benchmarks often treat negation as a minor detail within broader tasks, such as natural language inference. Consequently, there is a lack of benchmarks specifically designed to evaluate comprehension of negation. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly created to assess sentence-level understanding of negation in LLMs. Thunder-NUBench goes beyond merely identifying surface-level cues by contrasting standard negation with structurally diverse alternatives, such as local negation, contradiction, and paraphrase. This benchmark includes manually curated sentence-negation pairs and a multiple-choice dataset, allowing for a comprehensive evaluation of models' understanding of negation.
