Abstract:
Existing camouflaged object detection methods still struggle in highly complex scenes, in particular with inaccurate localization of camouflaged objects at varying scales and blurred boundaries in the predicted segmentation maps. To address these shortcomings, this paper proposes an encoder-decoder detection network that integrates wavelet decomposition and attention guidance for accurate recognition and high-precision segmentation of camouflaged objects. In the encoding stage, a Swin Transformer backbone extracts multi-level features, which are then refined by a receptive field module. A frequency decomposition and aggregation module subsequently enhances the multi-level features in the frequency domain, helping the model locate targets and recover spatial details. Next, a cascaded feature fusion module progressively fuses local and global information. Finally, an attention-guided edge-enhanced decoder fuses semantic and edge features in a complementary manner to generate more accurate prediction maps of camouflaged objects. Extensive experiments on three public camouflaged object detection datasets show that the proposed approach has clear advantages over 21 existing deep learning-based algorithms.
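
As a rough illustration of the wavelet decomposition idea, the sketch below applies a single-level 2D Haar transform to a small feature map, splitting it into one low-frequency sub-band (coarse structure, useful for localization) and three high-frequency sub-bands (edges and fine detail). The choice of the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the sub-band naming are assumptions made for this example; the abstract does not specify which wavelet or implementation the proposed module uses.

```python
# Minimal sketch (assumption: orthonormal Haar wavelet, one decomposition level).
# Function names are hypothetical and not taken from the paper.

def haar_dwt2(x):
    """Split a 2D list (even height and width) into LL, LH, HL, HH sub-bands."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            # One non-overlapping 2x2 block: [[a, b], [c, d]]
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # low-pass in both directions
            r_lh.append((a + b - c - d) / 2)  # difference between the two rows
            r_hl.append((a - b + c - d) / 2)  # difference between the two columns
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, reconstructing the original 2D list."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, v, u, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + v + u + t) / 2, (s + v - u - t) / 2]
            bottom += [(s - v + u - t) / 2, (s - v - u + t) / 2]
        out.append(top)
        out.append(bottom)
    return out


# Toy 2x4 "feature map": a vertical edge in the left block, a flat right block.
feat = [[1.0, 5.0, 2.0, 2.0],
        [1.0, 5.0, 2.0, 2.0]]
ll, lh, hl, hh = haar_dwt2(feat)
restored = haar_idwt2(ll, lh, hl, hh)
```

In this toy input, only the block containing the vertical edge produces a nonzero high-frequency response, while the flat block is carried entirely by the low-frequency sub-band; the transform is invertible, so no information is lost by working in the frequency domain.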