Acta Geodaetica et Cartographica Sinica ›› 2023, Vol. 52 ›› Issue (4): 624-637. doi: 10.11947/j.AGCS.2023.20210659

• Photogrammetry and Remote Sensing •

Attention-guided feature fusion and joint learning for remote sensing image scene classification

YU Donghang1,2, XU Qing2, ZHAO Chuan3, GUO Haitao2, LU Jun2, LIN Yuzhun2, LIU Xiangyun2   

    1. Naval Research Institute, Beijing 100070, China;
    2. Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China;
    3. Rocket Force Command College, Wuhan 430012, China
  • Received: 2021-12-06 Revised: 2022-11-07 Published: 2023-05-05
  • Supported by:
    The National Natural Science Foundation of China (No. 42001338)

Abstract: High-precision remote sensing image scene classification is hindered by scale variation, inter-class similarity and intra-class difference. To make full use of the multi-scale features extracted from the images, a scene classification method based on attention-guided feature fusion and joint learning is proposed. First, a deep convolutional neural network extracts three levels of feature maps from each image. Second, a residual attention mechanism is designed to enhance the semantic information and suppress the noise in the feature maps. Third, global average pooling aggregates the global information of each feature map into a feature vector, and the three levels of feature vectors are fused by concatenation. The three feature vectors and the fused vector are then classified by independent fully connected layers. During training, a joint loss is calculated to optimize the model's parameters, and multi-classifier decision-level fusion is adopted to improve the robustness of prediction. Experimental results on the UC Merced, AID and NWPU-RESISC45 datasets show that the proposed method significantly improves discrimination between similar scenes and among scenes with intra-class difference. Compared with a similar method using multi-scale features, the overall accuracies are improved by 0.84%, 4.04% and 4.43%, respectively.
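
The pooling, fusion, joint-loss and decision-fusion steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the backbone network and the residual attention module are omitted, all function names are hypothetical, and two details the abstract does not specify are assumed here, namely that the joint loss is an unweighted sum of the four cross-entropy losses and that decision-level fusion averages the four softmax outputs.

```python
import math

def global_average_pool(feature_map):
    """Reduce a C x H x W feature map (nested lists) to a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def linear(vector, weights, bias):
    """Fully connected layer: one weight row and one bias per class."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def forward(feature_maps, heads, fused_head):
    """Return four logit vectors: one per feature level plus one for the fused vector."""
    vectors = [global_average_pool(fm) for fm in feature_maps]
    fused = [x for vec in vectors for x in vec]  # fusion by concatenation
    logits = [linear(vec, w, b) for vec, (w, b) in zip(vectors, heads)]
    logits.append(linear(fused, *fused_head))
    return logits

def joint_loss(all_logits, label):
    # Assumption: unweighted sum of the per-classifier cross-entropy losses.
    return sum(cross_entropy(lg, label) for lg in all_logits)

def fuse_decisions(all_logits):
    # Assumption: average the softmax probabilities of all classifiers.
    probs = [softmax(lg) for lg in all_logits]
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]
    return avg.index(max(avg)), avg
```

In this sketch, `joint_loss` would be minimized during training so that every classifier receives a gradient signal, while `fuse_decisions` is used only at prediction time to combine the four classifiers into a single label.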

Key words: remote sensing image, scene classification, convolutional neural network, feature fusion, joint learning, attention mechanism
