Abstract: Incorporating the depth (D) information for RGB images has proven the effectiveness and robustness in semantic segmentation. However, the fusion between them is still a challenge due to their meaning discrepancy, in which RGB represents the color but D depth information. In this paper, we propose a co-attention Network (CANet) to capture the fine-grained interplay between RGB�_and D�_ features. The key part in our CANet is co-attention fusion part. It includes three modules. At first, the position and channel co-attention fusion modules adaptively fuse color and depth features in spatial and channel dimension. Finally, a final fusion module integrates the outputs of the two co-attention fusion modules for forming a more representative feature. Our extensive experiments validate the effectiveness of CANet in fusing RGB and D features, achieving the state-of-the-art performance on two challenging RGB-D semantic segmentation datasets, i.e., NYUDv2, SUN-RGBD.

SlidesLive

Similar Papers

Jointly Discriminating and Frequent Visual Representation Mining
Qiannan Wang (Xidian university), Ying Zhou (Xidian University), ZhaoYan Zhu (Xidian university), Xuefeng Liang (Xidian University)*, Yu Gu (School of Artificial Intelligence, Xi'dian University)
Explaining an image classifier's decisions using generative models
Chirag Agarwal (UIC), Anh Nguyen (Auburn University)*
RF-GAN: A Light and Reconfigurable Network for Unpaired Image-to-Image Translation
Ali Koksal (Nanyang Technological University), Shijian Lu (Nanyang Technological University)*