Cross-Domain Facial Expression Recognition via Contrastive Warm up and Complexity-aware Self-Training

Abstract

Unsupervised cross-domain Facial Expression Recognition (FER) aims to transfer the knowledge from a labeled source domain to an unlabeled target domain. Existing methods strive to reduce the discrepancy between source and target domain, but cannot effectively explore the abundant semantic information of the target domain due to the absence of target labels. To this end, we propose a novel framework via Contrastive Warm up and Complexity-aware Self-Training (namely CWCST), which facilitates source knowledge transfer and target semantic learning jointly. Specifically, we formulate a contrastive warm up strategy via features, momentum features, and learnable category centers to concurrently learn discriminative representations and narrow the domain gap, which benefits domain adaptation by generating more accurate target pseudo labels. Moreover, to deal with the inevitable noise in pseudo labels, we develop complexity-aware self-training with a label selection module based on prediction entropy, which iteratively generates pseudo labels and adaptively chooses the reliable ones for training, ultimately yielding effective target semantics exploration. Furthermore, by jointly using the two mentioned components, our framework enables to effectively utilize the source knowledge and target semantic information by source-target co- training. In addition, our framework can be easily incorporated into other baselines with consistent performance improvements. Extensive experimental results on seven databases show the superior performance of the proposed method against various baselines.

Publication
In IEEE Transactions on Image Processing (TIP), 2023.