Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth

19 December 2018

Abstract

Automatic machine learning-based detectors of various psychological and social phenomena (e.g., emotion, stress, engagement) have great potential to advance basic science. However, when a detector $d$ is trained to approximate an existing measurement tool (e.g., a questionnaire, observation protocol), then care must be taken when interpreting measurements collected using $d$ since they are one step further removed from the underlying construct. We examine how the accuracy of $d$ , as quantified by the correlation $q$ of $d$ 's outputs with the ground-truth construct $U$ , impacts the estimated correlation between $U$ (e.g., stress) and some other phenomenon $V$ (e.g., academic performance). In particular: (1) We show that if the true correlation between $U$ and $V$ is $r$ , then the expected sample correlation, over all vectors $\mathcal{T}^n$ whose correlation with $U$ is $q$ , is $qr$ . (2) We derive a formula for the probability that the sample correlation (over $n$ subjects) using $d$ is positive given that the true correlation is negative (and vice-versa); this probability can be substantial (around $20-30\%$ ) for values of $n$ and $q$ that have been used in recent affective computing studies. %We also show that this probability decreases monotonically in $n$ and in $q$ . (3) With the goal to reduce the variance of correlations estimated by an automatic detector, we show that training multiple neural networks $d^{(1)},\ldots,d^{(m)}$ using different training architectures and hyperparameters for the same detection task provides only limited ``coverage'' of $\mathcal{T}^n$ .

View on arXiv

Comments on this paper