K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

Shuhe Li
Chenxu Guo
Jiachen Lian
Cheol Jun Cho
Wenshuo Zhao
Xuanru Zhou
Dingkun Zhou
Sam Wang
Grace Wang
Jingze Yang
Jingyi Xu
Ruohan Bao
Elise Brenner
Brandon In
Francesca Pei
Maria Luisa Gorno-Tempini
Gopala Anumanchipalli
Main: 4 pages
Figures: 1
Bibliography: 1 page
Tables: 3
Abstract

Early evaluation of children's language is hampered by the high pitch, long phones, and sparse data that derail automatic speech recognizers. We introduce K-Function, a unified framework that combines accurate sub-word transcription, objective scoring, and actionable feedback. Its core, Kids-WFST, merges a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to capture child-specific errors while remaining fully interpretable. Kids-WFST attains a phoneme error rate of 1.39% on MyST and 8.61% on Multitudes, absolute gains of 10.47 and 7.06 points, respectively, over a greedy-search decoder. These high-fidelity transcripts power an LLM that grades verbal skills, milestones, reading, and comprehension; its grades align with those of human proctors, and it supplies tongue-and-lip visualizations plus targeted advice. The results show that precise phoneme recognition closes a complete diagnostic-feedback loop, paving the way for scalable, clinician-ready language assessment.
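
To make the reported numbers concrete, the following is a minimal sketch of how a phoneme error rate is conventionally computed: the Levenshtein (edit) distance between reference and hypothesized phoneme sequences, divided by the reference length. The abstract does not describe the paper's exact scoring protocol, so the function names (`edit_distance`, `phoneme_error_rate`, `implied_baseline`) and the toy phoneme sequences are illustrative assumptions, not the authors' code. The last helper only restates the abstract's arithmetic: adding each absolute gain back to the Kids-WFST error gives the greedy-search error those gains imply.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference phoneme missing in hyp
            insertion = curr[j - 1] + 1  # extra phoneme in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Conventional PER in percent: edit distance over reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def implied_baseline(per: float, absolute_gain: float) -> float:
    """Baseline error implied by a reported error and absolute gain."""
    return round(per + absolute_gain, 2)


# Hypothetical example: one substituted phoneme out of three.
reference = ["k", "ae", "t"]
hypothesis = ["k", "ae", "d"]
print(phoneme_error_rate(reference, hypothesis))

# Greedy-search error rates implied by the figures in the abstract.
print(implied_baseline(1.39, 10.47))  # MyST
print(implied_baseline(8.61, 7.06))   # Multitudes
```

Under this reading, the greedy-search decoder would sit at roughly 11.86% on MyST and 15.67% on Multitudes, assuming the gains are plain differences in phoneme error rate.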