Explicit Abstention Knobs for Predictable Reliability in Video Question Answering

31 December 2025

Jorge Ortiz


Main: 21 pages; 14 figures; Bibliography: 4 pages; 12 tables

Abstract

High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust under distribution shift. Using NExT-QA and Gemini 2.0 Flash, we establish two findings. First, confidence thresholding provides mechanistic control in-distribution. Sweeping the threshold epsilon produces smooth risk-coverage tradeoffs, reducing error rates f
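The threshold sweep described in the abstract can be illustrated with a minimal sketch. It assumes the common selective-prediction convention that the system answers only when its confidence is at least epsilon, that coverage is the fraction of questions answered, and that risk is the error rate among answered questions. The function name `risk_coverage`, the toy confidences, and the toy correctness labels are all hypothetical; they are not the paper's code or data.

```python
from typing import List, Tuple


def risk_coverage(
    confidences: List[float],
    correct: List[bool],
    thresholds: List[float],
) -> List[Tuple[float, float, float]]:
    """Return (epsilon, coverage, risk) for each threshold.

    Assumed rule: answer when confidence >= epsilon, abstain otherwise.
    """
    n = len(confidences)
    curve = []
    for eps in thresholds:
        # Keep the correctness flags of the questions the system answers.
        answered = [ok for c, ok in zip(confidences, correct) if c >= eps]
        coverage = len(answered) / n
        # Convention (an assumption): risk is 0.0 when nothing is answered.
        risk = 1 - sum(answered) / len(answered) if answered else 0.0
        curve.append((eps, coverage, risk))
    return curve


# Toy, made-up values for illustration only.
toy_conf = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30]
toy_correct = [True, True, False, True, False, False]

curve = risk_coverage(toy_conf, toy_correct, [0.0, 0.5, 0.7, 0.9])
for eps, cov, risk in curve:
    print(f"epsilon={eps:.1f}  coverage={cov:.2f}  risk={risk:.2f}")
```

On this toy input, raising epsilon lowers both coverage and risk, which is the tradeoff shape the abstract refers to; real model confidences need not behave this cleanly.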
