Bidirectional Pyramid Networks for Semantic Segmentation
Dong Nie (UNC)*, Jia Xue (Rutgers University), Xiaofeng Ren (Alibaba group)
Keywords: Segmentation and Grouping
Abstract:
Semantic segmentation is a fundamental problem in computer vision that has attracted a lot of attention. Recent efforts have been devoted to network architecture innovations for efficient semantic segmentation that can run in real time for autonomous driving and other applications. Information flow between scales is crucial because accurate segmentation needs both large context and fine detail. However, most existing approaches still rely on pretrained backbone models (e.g., ResNet on ImageNet). In this work, we propose to open up the backbone and design a simple yet effective multiscale network architecture, the Bidirectional Pyramid Network (BPNet). BPNet takes the shape of a pyramid: information flows from bottom (high resolution, small receptive field) to top (low resolution, large receptive field), and from top to bottom, in a systematic manner, at every step of the processing. More importantly, fusion needs to be efficient; this is done through an add-and-multiply module with learned weights. We also apply a unary-pairwise attention mechanism to balance position sensitivity and context aggregation. Auxiliary loss is applied at multiple steps of the pyramid bottom. The resulting network achieves high accuracy with efficiency, without the need for pretraining. On the standard Cityscapes dataset, we achieve a test mIoU of 76.3 with 5.1M parameters at 36 fps (on an Nvidia 2080 Ti), competitive with state-of-the-art real-time models. Meanwhile, our design is general and can be used to build heavier networks: a ResNet-101-equivalent version of BPNet achieves an mIoU of 81.9 on Cityscapes, competitive with the best published results. We further demonstrate the flexibility of BPNet on a prostate MRI segmentation task, achieving state-of-the-art results with a 45x speed-up.
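The abstract does not give the exact form of the add-and-multiply fusion or of the bidirectional pass. As a minimal sketch, assuming the fused output is a weighted combination of the elementwise sum and the elementwise product of two same-resolution inputs, one bottom-up then top-down step over a pyramid might look like the code below. Plain 1-D lists stand in for feature maps, pair averaging and nearest-neighbour repetition stand in for real down- and upsampling layers, and the names `add_multiply_fuse`, `bidirectional_step`, `w_add` and `w_mul` are hypothetical. This is an illustration under those assumptions, not BPNet's actual implementation.

```python
from typing import List

Vector = List[float]  # 1-D stand-in for a feature map


def add_multiply_fuse(a: Vector, b: Vector, w_add: float, w_mul: float) -> Vector:
    """Hypothetical fusion: w_add * (a + b) + w_mul * (a * b), elementwise.

    In a trained network w_add and w_mul would be learned parameters;
    here they are plain floats.
    """
    assert len(a) == len(b), "inputs must share a resolution"
    return [w_add * (x + y) + w_mul * (x * y) for x, y in zip(a, b)]


def downsample(x: Vector) -> Vector:
    """Halve resolution by averaging adjacent pairs (stand-in for strided conv)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Vector) -> Vector:
    """Double resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]


def bidirectional_step(levels: List[Vector], w_add: float, w_mul: float) -> List[Vector]:
    """One bottom-up pass followed by one top-down pass.

    levels[0] is the bottom (highest-resolution) level; each next level
    has half the length of the previous one.
    """
    # Bottom-up: fuse each level with the downsampled level below it.
    up = [levels[0]]
    for i in range(1, len(levels)):
        up.append(add_multiply_fuse(levels[i], downsample(up[i - 1]), w_add, w_mul))

    # Top-down: fuse each level with the upsampled level above it.
    out: List[Vector] = [[] for _ in levels]
    out[-1] = up[-1]
    for i in range(len(levels) - 2, -1, -1):
        out[i] = add_multiply_fuse(up[i], upsample(out[i + 1]), w_add, w_mul)
    return out


if __name__ == "__main__":
    pyramid = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0], [1.0]]
    print(bidirectional_step(pyramid, w_add=1.0, w_mul=0.0))
```

With `w_mul=0` the fusion reduces to a weighted sum, so the example shows how top-level context accumulates back into the high-resolution level after one step. The multiplicative term lets one input gate the other, which is one plausible motivation for combining both operations.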