Quantization-Based Score Calibration for Few-Shot Keyword Spotting with Dynamic Time Warping in Noisy Environments

17 October 2025

Kevin Wilkinghoff

Alessia Cornaggia-Urrigshardt

Zheng-Hua Tan

Main:3 Pages

3 Figures

Bibliography:2 Pages

2 Tables

Abstract

Detecting occurrences of keywords with keyword spotting (KWS) systems requires thresholding continuous detection scores. Selecting appropriate thresholds is a non-trivial task, typically relying on optimizing performance on a validation dataset. However, such greedy threshold selection often leads to suboptimal performance on unseen data, particularly in varying or noisy acoustic environments and in few-shot settings. In this work, we investigate detection threshold estimation for template-based open-set few-shot KWS using dynamic time warping (DTW) on noisy speech data. To mitigate the performance degradation caused by suboptimal thresholds, we propose a score calibration approach that operates at the embedding level by quantizing learned representations and applying quantization-error-based normalization prior to DTW-based scoring and thresholding. Experiments on KWS-DailyTalk with simulated high-frequency radio channels show that the proposed calibration approach simplifies the selection of robust detection thresholds and significantly improves the resulting performance.
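To make the scoring-and-thresholding step concrete, the following is a minimal sketch of generic template-based detection with DTW. It is not the authors' implementation: the function names `dtw_cost` and `detect`, the cosine frame distance, the length normalization, and the threshold value are all assumptions chosen for illustration, and the proposed quantization-based calibration is not reproduced because the abstract does not specify its details.

```python
import numpy as np


def dtw_cost(template, query):
    """Length-normalized DTW alignment cost between two embedding sequences.

    Both inputs are arrays of shape (frames, dims) with non-zero rows.
    Cosine distance and division by (n + m) are illustrative assumptions.
    """
    # Unit-normalize each frame so the dot product is the cosine similarity.
    t = template / np.linalg.norm(template, axis=1, keepdims=True)
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    dist = 1.0 - t @ q.T  # pairwise cosine distances, shape (n, m)

    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard step pattern: insertion, deletion, or match.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


def detect(template, query, threshold):
    """Report a keyword detection when the DTW cost is at or below the threshold."""
    return dtw_cost(template, query) <= threshold


# Toy embeddings (hypothetical data): two frames, two dimensions each.
template = np.array([[1.0, 0.0], [0.0, 1.0]])
same = template.copy()
swapped = template[::-1].copy()

print(detect(template, same, threshold=0.25))
print(detect(template, swapped, threshold=0.25))
```

The threshold in `detect` is the quantity whose selection the abstract describes as difficult: a value tuned on clean validation data may not transfer once noise shifts the distribution of DTW costs, which is the motivation for calibrating scores before thresholding.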
