Abstract: Semantic segmentation is a fundamental problem in computer vision that has attracted a lot of attention. Recent efforts have been devoted to network architecture innovations for efficient semantic segmentation that can run in real time for autonomous driving and other applications. Information flow between scales is crucial because accurate segmentation needs both large context and fine detail. However, most existing approaches still rely on pretrained backbone models (e.g. ResNet on ImageNet). In this work, we propose to open up the backbone and design a simple yet effective multiscale network architecture, Bidirectional Pyramid Network (BPNet). BPNet takes the shape of a pyramid: information flows from bottom (high-resolution, small receptive field) to top (low-resolution, large receptive field), and from top to bottom, in a systematic manner, at every step of the processing. More importantly, fusion needs to be efficient; this is done through an add-and-multiply module with learned weights. We also apply a unary-pairwise attention mechanism to balance position sensitivity and context aggregation. Auxiliary loss is applied at multiple steps of the pyramid bottom. The resulting network achieves high accuracy with efficiency, without the need for pretraining. On the standard Cityscapes dataset, we achieve test mIoU 76.3 with 5.1M parameters and 36 fps (on Nvidia 2080 Ti), competitive with state-of-the-art real-time models. Meanwhile, our design is general and can be used to build heavier networks: a ResNet-101 equivalent version of BPNet achieves mIoU 81.9 on Cityscapes, competitive with the best published results. We further demonstrate the flexibility of BPNet on a prostate MRI segmentation task, achieving the state of the art with a 45x speed-up.
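
The abstract only names the fusion step ("an add-and-multiply module with learned weights") without giving its formula, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the coarse (low-resolution) map is upsampled by nearest neighbour to the fine map's size, and that each channel is fused as `w_add * (f + g) + w_mul * (f * g)`. The names `upsample_nearest`, `add_multiply_fuse`, `w_add`, and `w_mul` are hypothetical, and the weights are fixed constants here rather than trained parameters.

```python
from typing import List

# A feature map as nested lists: channels x height x width.
FeatureMap = List[List[List[float]]]


def upsample_nearest(x: FeatureMap, factor: int = 2) -> FeatureMap:
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    out = []
    for channel in x:
        rows = []
        for row in channel:
            # Repeat each value `factor` times along the width...
            wide = [v for v in row for _ in range(factor)]
            # ...and repeat the widened row `factor` times along the height.
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def add_multiply_fuse(
    fine: FeatureMap,
    coarse: FeatureMap,
    w_add: List[float],
    w_mul: List[float],
) -> FeatureMap:
    """Fuse two scales per channel (assumed form, not taken from the paper):

        fused = w_add[c] * (f + g) + w_mul[c] * (f * g)

    where f is the fine feature and g the upsampled coarse feature.
    """
    # Assumes the fine height is an integer multiple of the coarse height.
    factor = len(fine[0]) // len(coarse[0])
    coarse_up = upsample_nearest(coarse, factor)
    fused = []
    for c, (f_ch, g_ch) in enumerate(zip(fine, coarse_up)):
        fused.append([
            [w_add[c] * (f + g) + w_mul[c] * (f * g)
             for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(f_ch, g_ch)
        ])
    return fused


# Toy example: one channel, a 2x2 fine map and a 1x1 coarse map.
fine = [[[1.0, 2.0], [3.0, 4.0]]]
coarse = [[[2.0]]]
fused = add_multiply_fuse(fine, coarse, w_add=[1.0], w_mul=[0.5])
print(fused)  # [[[4.0, 6.0], [8.0, 10.0]]]
```

The additive term passes both scales through unchanged, while the multiplicative term lets one scale gate the other. In a real network the per-channel weights would be learned together with the other parameters.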

Similar Papers

Backbone Based Feature Enhancement for Object Detection
Haoqin Ji (Shenzhen University), Weizeng Lu (Shenzhen University), Linlin Shen (Shenzhen University)*
Best Buddies Registration for Point Clouds
Amnon Drory (Tel Aviv University)*, Tal Shomer (Tel Aviv University), Shai Avidan (Tel Aviv University), Raja Giryes (Tel Aviv University)
EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion
Chia-Yuan Chang (National Taiwan University)*, Shuo-En Chang (National Taiwan University), Pei-Yung Hsiao (National University of Kaohsiung), Li-Chen Fu (National Taiwan University)