Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (7): 1371-1383. doi: 10.11947/j.AGCS.2024.20230074

• Photogrammetry and Remote Sensing •

LAG-MANet model for remote sensing image scene classification

Wei WANG, Wei ZHENG, Xin WANG

  1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
  • Received:2023-03-16 Published:2024-08-12
  • Contact: Xin WANG E-mail:wangwei@csust.edu.cn;wangxin@csust.edu.cn
  • About author:WANG Wei (1974—), male, PhD, professor, PhD supervisor, majors in computer vision and pattern recognition. E-mail: wangwei@csust.edu.cn
  • Supported by:
    The National Key Basic Research Program Project(6240XXX0206);The National Science Innovation Special Zone Project(2019XXX00701);Key Research and Development Project of Hunan Province(2020SK2134);Natural Science Foundation of Hunan Province(2019JJ80105);Science and Technology Plan Project of Changsha(kq2004071)

Abstract:

In remote sensing image scene classification, both local and global information are crucial. Current methods are mainly based on convolutional neural networks (CNNs) and Transformers. CNNs excel at extracting local information but have limitations in capturing global information; Transformers, by contrast, capture global information well but at high computational cost. To improve scene classification performance for remote sensing images while reducing complexity, a pure convolutional network called LAG-MANet is designed, which attends to both local and global features at multiple scales. First, multi-scale features are extracted from the pre-processed remote sensing image by a multi-branch dilated convolution block (MBDConv). The features then pass through the four stages of the network in turn; in each stage, local and global features are extracted and fused by the different branches of a parallel dual-domain feature fusion block (P2DF). Finally, global average pooling is applied and the classification labels are output by a fully connected layer. LAG-MANet achieves a classification accuracy of 97.76% on the WHU-RS19 dataset, 97.04% on the SIRI-WHU dataset, and 97.18% on the RSSCN7 dataset. Experimental results on these three challenging public remote sensing datasets demonstrate the superiority of the proposed LAG-MANet.
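The pipeline described in the abstract (MBDConv stem → four stages of P2DF blocks → global average pooling → fully connected layer) can be sketched in PyTorch. Note this is a minimal illustrative sketch, not the authors' implementation: the internal structure of MBDConv and P2DF is assumed here (parallel dilated convolutions fused by a 1×1 convolution for the former; a depthwise-convolution local branch plus a global-context gating branch for the latter), and all channel widths and the downsampling scheme are hypothetical.

```python
import torch
import torch.nn as nn

class MBDConv(nn.Module):
    """Hypothetical multi-branch dilated convolution block: parallel 3x3
    convolutions with different dilation rates capture multiple scales;
    their outputs are concatenated and fused by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class P2DF(nn.Module):
    """Hypothetical parallel dual-domain feature fusion block: a local
    branch (depthwise 3x3 conv) and a global branch (global-context
    channel gating), fused by summation."""
    def __init__(self, ch):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid()
        )

    def forward(self, x):
        return self.local(x) + x * self.global_gate(x)

class LAGMANetSketch(nn.Module):
    """Overall pipeline: MBDConv stem, four P2DF stages with 2x
    downsampling, global average pooling, fully connected classifier."""
    def __init__(self, num_classes=19, ch=32):
        super().__init__()
        self.stem = MBDConv(3, ch)
        self.stages = nn.Sequential(*[
            nn.Sequential(P2DF(ch), nn.MaxPool2d(2)) for _ in range(4)
        ])
        self.head = nn.Linear(ch, num_classes)

    def forward(self, x):
        x = self.stages(self.stem(x))
        x = x.mean(dim=(2, 3))  # global average pooling over H, W
        return self.head(x)

model = LAGMANetSketch(num_classes=19)  # 19 classes as in WHU-RS19
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 19])
```

The parallel-branch design inside P2DF mirrors the abstract's point that local and global features are extracted by different branches and then fused within each stage.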

Key words: remote sensing image, scene classification, CNN, LAG-MANet
