Abstract:
In this paper, we propose a novel training method that improves the precision of facial landmark localization. When a facial landmark localization method is applied to video, the detected landmarks occasionally jitter even though the face itself does not move. We hypothesize two causes of this unstable detection: (1) small changes in the input images and (2) inconsistent annotations. Corresponding to these causes, we propose (1) two loss terms that make a model robust to small changes in the input images and (2) self-distillation training that reduces the effect of annotation noise. Using the public facial landmark datasets 300-W and 300-VW, we show that our method improves the precision of facial landmark localization by reducing its variance. We also show that our method reduces the jitter of predicted landmarks when applied to video.
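The two ideas in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the linear "model", the shift size, and the blending weight `alpha` are all hypothetical placeholders, and the consistency loss and distillation target stand in for the paper's actual loss terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a linear map from a flattened 8x8 image to one (x, y)
# landmark. A hypothetical stand-in for a CNN landmark detector.
W = rng.normal(scale=0.01, size=(2, 64))

def predict(img):
    return W @ img.ravel()

def consistency_loss(img, shift=1):
    """Penalize prediction changes under a small input translation,
    encouraging robustness to small changes in the input image."""
    shifted = np.roll(img, shift, axis=1)
    d = predict(img) - predict(shifted)
    return float(d @ d)

def distill_target(annotation, teacher_pred, alpha=0.5):
    """Self-distillation idea: blend a (possibly noisy) human annotation
    with a teacher model's prediction to form a smoother training target."""
    return alpha * annotation + (1 - alpha) * teacher_pred

img = rng.normal(size=(8, 8))
annotation = np.array([3.0, 5.0])       # noisy ground-truth landmark
teacher = np.array([3.2, 4.8])          # teacher model's prediction

print(consistency_loss(img))            # non-negative scalar
print(distill_target(annotation, teacher))  # blended target, here [3.1, 4.9]
```

In an actual training loop, both terms would be added to the usual landmark regression loss and minimized jointly.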