
Recognizing Micro-Expression in Video Clip with Adaptive Key-Frame Mining

Abstract

As a spontaneous facial expression of emotion, micro-expression is receiving increasing attention from the affective computing community. While various deep learning (DL) techniques have achieved better recognition accuracy, one characteristic of micro-expression has not been fully exploited: such facial movement is transient and sparsely localized in time. Consequently, the representation learned from a full video clip is usually redundant. On the other hand, methods utilizing the single apex frame require manual annotations and sacrifice temporal dynamics. To simultaneously localize and recognize such fleeting facial movements, we propose a novel end-to-end deep learning architecture, referred to as the Adaptive Key-frame Mining Network (AKMNet). Operating on the raw video clip of a micro-expression, AKMNet learns a discriminative spatio-temporal representation by combining the spatial features of self-learned local key frames with their global-temporal dynamics. Empirical and theoretical evaluations demonstrate the advantages of the proposed approach, with improved recognition performance.
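
As a rough illustration of the key idea only (not the authors' actual AKMNet), the hypothetical PyTorch sketch below scores each frame's spatial features, forms a soft weighted aggregate over self-selected key frames, and fuses it with a global temporal summary of the whole clip. All module names, layer sizes, and the toy frame encoder are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class KeyFrameMiner(nn.Module):
    """Hypothetical sketch of adaptive key-frame mining: per-frame saliency
    scores yield a soft key-frame feature, fused with global GRU dynamics."""

    def __init__(self, feat_dim=256, hidden_dim=128, num_classes=5):
        super().__init__()
        # Tiny stand-in for a CNN backbone that embeds each frame.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Scores how "key" each frame is (learned, no apex annotation needed).
        self.scorer = nn.Linear(feat_dim, 1)
        # Captures global-temporal dynamics over the full clip.
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim + hidden_dim, num_classes)

    def forward(self, clip):
        # clip: (B, T, 3, H, W) raw video frames
        b, t = clip.shape[:2]
        feats = self.frame_encoder(clip.flatten(0, 1)).view(b, t, -1)  # (B, T, D)
        # Softmax over time makes frame selection differentiable.
        weights = torch.softmax(self.scorer(feats).squeeze(-1), dim=1)  # (B, T)
        key_feat = (weights.unsqueeze(-1) * feats).sum(dim=1)           # soft key-frame feature
        _, h = self.gru(feats)                                          # global temporal summary
        logits = self.classifier(torch.cat([key_feat, h[-1]], dim=-1))
        return logits, weights


model = KeyFrameMiner()
logits, frame_weights = model(torch.randn(2, 16, 3, 64, 64))  # toy clip batch
```

The soft-attention weighting here is just one plausible way to make key-frame selection differentiable so that localization and recognition can be trained end to end, as the abstract describes; the paper's actual mining mechanism may differ.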
