Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth

Automatic machine learning-based detectors of various psychological and social phenomena (e.g., emotion, stress, engagement) have great potential to advance basic science. However, when a detector $d$ is trained to approximate an existing measurement tool (e.g., a questionnaire or observation protocol), care must be taken when interpreting measurements collected using $d$, since they are one step further removed from the underlying construct. We examine how the accuracy of $d$, as quantified by the correlation $q$ of $d$'s outputs with the ground-truth construct $U$, impacts the estimated correlation between $U$ (e.g., stress) and some other phenomenon $V$ (e.g., academic performance). In particular: (1) We show that if the true correlation between $U$ and $V$ is $r$, then the expected sample correlation, over all vectors whose correlation with $U$ is $q$, is $qr$. (2) We derive a formula for the probability that the sample correlation (over $n$ subjects) using $d$ is positive given that the true correlation is negative (and vice versa); this probability can be substantial for values of $n$ and $q$ that have been used in recent affective computing studies. (3) With the goal of reducing the variance of correlations estimated by an automatic detector, we show that training multiple neural networks using different architectures and hyperparameters for the same detection task provides only limited "coverage" of the set of vectors whose correlation with $U$ is $q$.
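Claims (1) and (2) can be checked numerically with a small Monte Carlo sketch. This is an illustration under stated assumptions, not the paper's own code or its derived formula: the function names and the values of n, q, and r are hypothetical, and detector outputs are drawn uniformly from the set of standardized vectors whose sample correlation with the ground truth is exactly q.

```python
import numpy as np


def unit_centered(x):
    """Center a vector and scale it to unit norm."""
    x = x - x.mean()
    return x / np.linalg.norm(x)


def random_with_correlation(u, target, rng):
    """Draw a centered, unit-norm vector whose Pearson correlation with u is
    exactly `target`. u must already be centered and unit-norm."""
    w = rng.standard_normal(u.shape[0])
    w = w - w.mean()          # remove the constant component
    w = w - (w @ u) * u       # remove the component along u
    w = w / np.linalg.norm(w)
    return target * u + np.sqrt(1.0 - target**2) * w


def simulate(n=50, q=0.6, r=-0.3, trials=20000, seed=0):
    """Return (mean sample correlation of detector outputs with v,
    fraction of trials whose sign disagrees with the true correlation r).
    All default values are hypothetical and chosen only for illustration."""
    rng = np.random.default_rng(seed)
    u = unit_centered(rng.standard_normal(n))   # ground-truth construct
    v = random_with_correlation(u, r, rng)      # other phenomenon, corr(u, v) = r
    corrs = np.empty(trials)
    for t in range(trials):
        d = random_with_correlation(u, q, rng)  # detector output, corr(d, u) = q
        # Both vectors are centered with unit norm, so the dot product
        # equals their Pearson correlation.
        corrs[t] = d @ v
    return corrs.mean(), np.mean(np.sign(corrs) != np.sign(r))


if __name__ == "__main__":
    mean_corr, flip_rate = simulate()
    print("mean sample correlation:", mean_corr)
    print("product q * r:", 0.6 * -0.3)
    print("estimated sign-flip probability:", flip_rate)
```

The mean of the simulated correlations should sit close to the product of q and r, while the sign-flip rate is an empirical estimate only. Rerunning with smaller n or smaller q shows how quickly the sign of an estimated correlation becomes unreliable.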