BSN++: Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Proposal Generation
Haisheng Su (SenseTime Group Limited)*
Keywords: Video Analysis and Event Recognition
Abstract:
Generating human action proposals in untrimmed videos is an important yet challenging task with wide applications. Current methods often suffer from noisy boundary locations and low-quality confidence scores used for proposal retrieval. In this paper, we present BSN++, a new framework which exploits a complementary boundary regressor and relation modeling for temporal proposal generation. First, we propose a novel boundary regressor based on the complementary characteristics of the starting and ending boundary classifiers. Specifically, we utilize a U-shaped architecture with nested skip connections to capture rich contexts and introduce a bi-directional boundary matching mechanism to improve boundary precision. Second, to account for the proposal-proposal relations ignored in previous methods, we devise a proposal relation block which includes two self-attention modules covering the position and channel aspects. Furthermore, we find that data imbalance inevitably exists in both the positive/negative proposals and the temporal durations, which harms model performance on tail distributions. To relieve this issue, we introduce a scale-balanced re-sampling strategy. Extensive experiments on two popular benchmarks, ActivityNet-1.3 and THUMOS14, demonstrate that BSN++ achieves state-of-the-art performance.