Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection
Erli Ouyang (Fudan University)*, Li Zhang (University of Oxford), Mohan Chen (Fudan University), Anurag Arnab (University of Oxford), Yanwei Fu (Fudan University)
Keywords: 3D Computer Vision
Abstract:
Vision-based 3D detection has drawn increasing attention recently. Despite the best efforts of computer vision researchers, it remains a largely unsolved problem, primarily because monocular images lack the accurate depth cues that LiDAR sensors provide. Previous works struggle to fuse 3D spatial information with RGB image features effectively. In this paper, we propose a novel monocular 3D detection framework to address this problem. Specifically, we make two primary contributions: (i) we design an Adaptive Depth-guided Instance Normalization layer that leverages depth features to guide RGB features toward high-quality estimation of 3D properties; (ii) we introduce a Dynamic Depth Transformation module that recovers more accurate depth through semantic context learning, thereby helping resolve the depth ambiguities inherent in a single RGB image. Experiments show that our approach achieves state-of-the-art results on the KITTI 3D detection benchmark among current monocular 3D detection methods.
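The abstract does not spell out the formulation of the Adaptive Depth-guided Instance Normalization layer, so the following is only a minimal PyTorch sketch of one plausible reading: an AdaIN-style design in which pooled depth features regress per-channel affine parameters that modulate instance-normalized RGB features. The class name, parameter names, and pooling choice are all assumptions for illustration, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class DepthGuidedIN(nn.Module):
        """Hypothetical sketch: depth features predict per-channel scale and
        shift applied to instance-normalized RGB features (AdaIN-style).
        This is an assumed design, not the paper's exact layer."""

        def __init__(self, rgb_channels: int, depth_channels: int):
            super().__init__()
            # Instance norm without its own affine terms; scale/shift come from depth.
            self.norm = nn.InstanceNorm2d(rgb_channels, affine=False)
            # Regress per-channel gamma (scale) and beta (shift) from pooled depth features.
            self.to_gamma = nn.Linear(depth_channels, rgb_channels)
            self.to_beta = nn.Linear(depth_channels, rgb_channels)

        def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
            # Global-average-pool the depth feature map to one vector per sample.
            d = depth_feat.mean(dim=(2, 3))                       # (B, C_depth)
            gamma = self.to_gamma(d).unsqueeze(-1).unsqueeze(-1)  # (B, C_rgb, 1, 1)
            beta = self.to_beta(d).unsqueeze(-1).unsqueeze(-1)
            return (1 + gamma) * self.norm(rgb_feat) + beta

    # Usage: modulate a 256-channel RGB feature map with 64-channel depth features.
    layer = DepthGuidedIN(rgb_channels=256, depth_channels=64)
    rgb = torch.randn(2, 256, 32, 32)
    depth = torch.randn(2, 64, 32, 32)
    out = layer(rgb, depth)  # (2, 256, 32, 32)

The (1 + gamma) parameterization keeps the layer close to identity at initialization, a common stabilizing choice in conditional normalization; whether the paper uses it is an open assumption here.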