Salient Object Detection by Fusing Local and Global Contexts

Abstract

Benefiting from the powerful discriminative feature learning capability of convolutional neural networks (CNNs), deep learning techniques have achieved remarkable performance improvements in salient object detection (SOD) in recent years. However, most existing deep SOD models do not fully exploit informative contextual features, which often leads to suboptimal detection performance against cluttered backgrounds. This paper presents a context-aware attention module that detects salient objects by simultaneously constructing connections between each image pixel and its local and global contextual pixels. Specifically, each pixel and its neighbors bidirectionally exchange semantic information by computing their correlation coefficients, and this process aggregates contextual attention features both locally and globally. In addition, an attention-guided hierarchical network architecture is designed to capture fine-grained spatial details by transmitting contextual information from deeper to shallower network layers in a top-down manner. Extensive experiments on six public SOD datasets show that the proposed model outperforms most current state-of-the-art models under different evaluation metrics.
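
As a rough illustration of the local and global context aggregation described above, the following NumPy sketch weights each pixel's contextual features by their similarity to that pixel. It is not the paper's implementation: the scaled dot product standing in for the correlation coefficient, the softmax normalization, the additive fusion, and all function names (softmax, global_context, local_context, fuse_contexts) and the radius parameter are assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf scores receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_context(feat):
    """Aggregate features from all pixels for every pixel.

    feat: array of shape (H, W, C). Similarity is a scaled dot product
    (an assumed stand-in for the abstract's correlation coefficient).
    """
    h, w, c = feat.shape
    flat = feat.reshape(h * w, c)
    scores = flat @ flat.T / np.sqrt(c)      # (HW, HW) pairwise similarity
    weights = softmax(scores, axis=1)        # each row sums to 1
    return (weights @ flat).reshape(h, w, c)


def local_context(feat, radius=1):
    """Aggregate features from a (2*radius+1)^2 window around every pixel."""
    h, w, c = feat.shape
    k = 2 * radius + 1
    padded = np.pad(feat, ((radius, radius), (radius, radius), (0, 0)))
    valid = np.pad(np.ones((h, w), dtype=bool), radius)  # False on the padding
    offsets = [(dy, dx) for dy in range(k) for dx in range(k)]
    # Neighbor features and validity mask: (H, W, k*k, C) and (H, W, k*k).
    neigh = np.stack([padded[dy:dy + h, dx:dx + w] for dy, dx in offsets], axis=2)
    mask = np.stack([valid[dy:dy + h, dx:dx + w] for dy, dx in offsets], axis=2)
    scores = np.einsum("hwc,hwnc->hwn", feat, neigh) / np.sqrt(c)
    scores = np.where(mask, scores, -np.inf)  # ignore out-of-image neighbors
    weights = softmax(scores, axis=2)
    return np.einsum("hwn,hwnc->hwc", weights, neigh)


def fuse_contexts(feat, radius=1):
    # Assumed fusion: residual sum of the input with both context features.
    return feat + local_context(feat, radius) + global_context(feat)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))
fused = fuse_contexts(features)
print(fused.shape)
```

In this sketch the global branch costs O((HW)^2) memory for the similarity matrix, so it is only practical on small feature maps; the local branch scales linearly with the number of pixels.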

Publication
In IEEE Transactions on Multimedia (TMM), 2020