Abstract: Most deep learning object detectors are based on the anchor mechanism and resort to the Intersection over Union (IoU) between predefined anchor boxes and ground truth boxes to evaluate the matching quality between anchors and objects. In this paper, we question this use of IoU and propose a new anchor matching criterion guided, during the training phase, by the optimization of both the localization and the classification tasks: the predictions related to one task are used to dynamically assign sample anchors and improve the model on the other task, and vice versa. Despite the simplicity of the proposed method, our experiments with different state-of-the-art deep learning architectures on PASCAL VOC and MS COCO datasets demonstrate the effectiveness and generality of our Mutual Guidance strategy.

SlidesLive

Similar Papers

Unpaired Multimodal Facial Expression Recognition
Bin Xia (University of Science and Technology of China), Shangfei Wang (University of Science and Technology of China)*
Pre-training without Natural Images
Hirokatsu Kataoka (National Institute of Advanced Industrial Science and Technology (AIST))*, Kazushige Okayasu (National Institute of Advanced Industrial Science and Technology (AIST)), Asato Matsumoto (National Institute of Advanced Industrial Science and Technology (AIST)), Eisuke Yamagata (Tokyo Institute of Technology), Ryosuke Yamada (Tokyo Denki University), Nakamasa Inoue (Tokyo Institute of Technology), Akio Nakamura (Tokyo Denki University (TDU)), Yutaka Satoh (National Institute of Advanced Industrial Science and Technology (AIST))
Watch, read and lookup: learning to spot signs from multiple supervisors
Liliane Momeni (University of Oxford), Gul Varol (University of Oxford)*, Samuel Albanie (University of Oxford), Triantafyllos Afouras (University of Oxford), Andrew Zisserman (University of Oxford)