Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition
- CVBM
Recognizing facial action units (AUs) during spontaneous facial displays is a challenging problem. Recently, CNNs have shown promise for facial AU recognition, but they employ predefined, fixed convolution filter sizes. To achieve the best performance, the optimal filter size is usually found empirically through extensive experimental validation. This search is computationally expensive, especially as the network becomes deeper. In addition, AUs activated by different facial muscles produce facial appearance changes at different scales and thus prefer different filter sizes. This paper proposes a novel Optimized Filter Size CNN (OFS-CNN), in which the filter sizes and the filter weights of all convolutional layers are learned simultaneously from the training data. Specifically, the filter size is defined as a continuous variable, which is optimized by minimizing the training loss. Experimental results on four AU-coded databases show that the proposed OFS-CNN outperforms traditional CNNs with fixed filter sizes and achieves state-of-the-art performance for AU recognition. Furthermore, the OFS-CNN also outperforms traditional CNNs that use the best filter size obtained by exhaustive search, and it is capable of estimating the optimal filter size for varying image resolutions.
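The abstract does not spell out how a filter size can be treated as a continuous variable. One possible way to do it, sketched below purely as an illustrative assumption and not necessarily the formulation used in the paper, is to store weights for the largest allowed odd size and scale the outermost active ring of the filter by a fractional factor. The name `blended_filter` and its parameters are hypothetical.

```python
import numpy as np

def blended_filter(weights, k):
    """Return a filter whose effective size varies continuously with k.

    Assumed scheme (not taken from the paper): `weights` is a (K, K) array
    with K odd, holding the largest allowed filter. A continuous size k in
    [1, K] keeps the inner k_lo x k_lo block at full strength and scales the
    next ring by alpha in [0, 1), so the result is piecewise linear in k.
    """
    K = weights.shape[0]
    k = float(np.clip(k, 1.0, K))

    # Largest odd integer size not exceeding k, and the next odd size up.
    k_lo = int(np.floor((k - 1.0) / 2.0)) * 2 + 1
    k_hi = min(k_lo + 2, K)
    # Fraction of the way from k_lo to k_hi (0 when k is already at K).
    alpha = (k - k_lo) / 2.0 if k_hi > k_lo else 0.0

    # Chebyshev distance of every filter tap from the filter center.
    offsets = np.abs(np.arange(K) - K // 2)
    ring = np.maximum(offsets[:, None], offsets[None, :])

    # 1 inside the k_lo block, alpha on the next ring, 0 elsewhere.
    mask = np.where(ring <= k_lo // 2, 1.0,
                    np.where(ring <= k_hi // 2, alpha, 0.0))
    return weights * mask

# Example: a 5x5 filter of ones evaluated at an in-between size of 4.
example = blended_filter(np.ones((5, 5)), 4.0)
```

Because the masked filter changes linearly with `k` between consecutive odd sizes, a loss computed from the convolution output has a well-defined gradient with respect to `k` there, which is what would allow the size to be updated by gradient descent together with the weights under this assumed scheme.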