Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (7): 1371-1383. doi: 10.11947/j.AGCS.2024.20230074

• Photogrammetry and Remote Sensing •

LAG-MANet model for remote sensing image scene classification

Wei WANG, Wei ZHENG, Xin WANG

  1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
  • Received: 2023-03-16 Published: 2024-08-12
  • Corresponding author: Xin WANG. E-mail: wangwei@csust.edu.cn; wangxin@csust.edu.cn
  • About the author: WANG Wei (b. 1974), male, PhD, professor, PhD supervisor; research interests: computer vision and pattern recognition. E-mail: wangwei@csust.edu.cn
  • Supported by:
    The National Key Basic Research Program Project (6240XXX0206); The National Science Innovation Special Zone Project (2019XXX00701); Key Research and Development Project of Hunan Province (2020SK2134); Natural Science Foundation of Hunan Province (2019JJ80105); Science and Technology Plan Project of Changsha (kq2004071)

Abstract:

Both local and global information are crucial to remote sensing image classification. Current methods for remote sensing image scene classification are mainly based on convolutional neural networks (CNNs) and Transformers. CNNs are strong at extracting local information but limited in extracting global information. Transformers, by contrast, extract global information well but have high computational complexity. To improve scene classification performance for remote sensing images while reducing complexity, a pure convolutional network called LAG-MANet is designed. The network attends to both local and global features and also takes multi-scale features into account. After the input image is preprocessed, a multi-branch dilated convolution block (MBDConv) first extracts multi-scale features. The features then pass through the four stages of the network in turn; in each stage, separate branches of the parallel dual-domain feature fusion block (P2DF) extract local and global features, which are then fused. Finally, the features pass through global average pooling and then a fully connected layer, which outputs the classification label. LAG-MANet achieves classification accuracies of 97.76% on the WHU-RS19 dataset, 97.04% on the SIRI-WHU dataset, and 97.18% on the RSSCN7 dataset. The experimental results on these three challenging public remote sensing datasets demonstrate the superiority of the proposed LAG-MANet.

Key words: remote sensing image, scene classification, CNN, LAG-MANet
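
The abstract names a multi-branch dilated convolution block (MBDConv) and a global average pooling step but gives no implementation details. The sketch below is not the authors' MBDConv. It is a minimal pure-Python illustration of the general idea, under loudly stated assumptions: a 1D signal stands in for an image, the dilation rates (1, 2, 3) and the shared 3-tap kernel are hypothetical, and branches are fused by element-wise summation (the paper may fuse differently). It shows how dilation widens the receptive field without adding kernel weights.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    k = len(kernel)
    span = (k - 1) * dilation  # distance between the first and last tap
    left = span // 2
    padded = [0.0] * left + list(signal) + [0.0] * (span - left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def multi_branch_dilated(signal: Sequence[float], kernel: Sequence[float],
                         dilations: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Apply one kernel at several dilation rates and sum the branches.

    Summation is an assumption made for this sketch only.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


def global_average_pool(features: Sequence[float]) -> float:
    """Collapse a feature sequence to a single mean value."""
    return sum(features) / len(features)


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    kernel = [1.0, 1.0, 1.0]
    for d in (1, 2, 3):
        print(d, effective_kernel_size(len(kernel), d),
              dilated_conv1d(impulse, kernel, d))
    fused = multi_branch_dilated(impulse, kernel)
    print(fused, global_average_pool(fused))
```

Feeding a unit impulse through each branch makes the multi-scale effect visible: the same three weights respond over a span of 3, 5, and 7 samples as the dilation grows. In a real network the branches would be 2D convolutions with learned weights, followed by the network stages and a fully connected classifier.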

CLC number: