Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation

Simon Jenni (University of Bern)*, Paolo Favaro (University of Bern)

Keywords: Face, Pose, Action, and Gesture

Abstract: Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large, costly data sets of images and corresponding skeleton poses. In contrast, we propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets. To drive such models in the pre-training step towards supporting 3D pose estimation, we introduce a novel self-supervised feature learning task designed to focus on the 3D structure in an image. We exploit images extracted from videos captured with a multi-view camera system. The task is to classify whether two images depict two views of the same scene up to a rigid transformation. In a multi-view data set, where objects deform in a non-rigid manner, a rigid transformation occurs only between two views taken at the exact same time, i.e., when they are synchronized. We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
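
As an illustration of the synchronization task described in the abstract, the following is a minimal sketch of how labeled image pairs might be drawn from a multi-view recording. It is not the authors' implementation: the function name `sample_sync_pair`, the 50/50 class balance (`p_sync`), and the minimum temporal offset (`min_offset`) are assumptions introduced here for illustration only.

```python
import random


def sample_sync_pair(num_frames, num_views, p_sync=0.5, min_offset=1, rng=random):
    """Draw one training pair for a binary synchronization classifier.

    Returns ((view_a, t_a), (view_b, t_b), label), where label is 1 when
    both views are taken at the same time step (synchronized) and 0
    otherwise. All parameter names and defaults are hypothetical.
    """
    if num_views < 2:
        raise ValueError("need at least two camera views")
    # min_offset <= num_frames // 2 guarantees that every frame has at
    # least one other frame that is min_offset or more steps away.
    if not 1 <= min_offset <= num_frames // 2:
        raise ValueError("min_offset must be in [1, num_frames // 2]")

    # Two distinct cameras, so the pair always shows two different views.
    view_a, view_b = rng.sample(range(num_views), 2)
    t_a = rng.randrange(num_frames)

    if rng.random() < p_sync:
        # Positive pair: same time step, so the two views are related
        # by a rigid transformation of the scene.
        return (view_a, t_a), (view_b, t_a), 1

    # Negative pair: a different time step, so the (non-rigidly moving)
    # subject has changed pose between the two images.
    candidates = [t for t in range(num_frames) if abs(t - t_a) >= min_offset]
    t_b = rng.choice(candidates)
    return (view_a, t_a), (view_b, t_b), 0
```

A network pre-trained on such pairs would take the two indexed images as input and predict the label; the (view, time) tuples here are only indices into whatever frame storage is used.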

Similar Papers

Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses
Miao Liao (Baidu)*, Sibo Zhang (Baidu), Peng Wang (Baidu USA LLC.), Hao Zhu (Nanjing University), Xinxin Zuo (University of Kentucky), Ruigang Yang (University of Kentucky, USA)
Learning Global Pose Features in Graph Convolutional Networks for 3D Human Pose Estimation
Kenkun Liu (University of Illinois at Chicago), Zhiming Zou (University of Illinois at Chicago), Wei Tang (University of Illinois at Chicago)*
Anatomy and Geometry Constrained One-Stage Framework for 3D Human Pose Estimation
Xin Cao (Shanghai Jiao Tong University), Xu Zhao (Shanghai Jiao Tong University)*