LAG-MANet model for remote sensing image scene classification

doi:10.11947/j.AGCS.2024.20230074

Abstract

Both local and global information are crucial in remote sensing image scene classification. Current methods are mainly based on convolutional neural networks (CNNs) and Transformers. CNNs are good at extracting local information but limited in capturing global information, whereas Transformers capture global information well but have high computational complexity. To improve scene classification performance on remote sensing images while reducing complexity, a pure convolutional network called LAG-MANet is designed. The network attends to both local and global features and takes multi-scale features into account. First, multi-scale features are extracted from the pre-processed input image by a multi-branch dilated convolution block (MBDConv). The features then pass through the four stages of the network in turn; in each stage, local and global features are extracted by different branches of the parallel dual-domain feature fusion block (P2DF) and then fused. Finally, the features are reduced by global average pooling and a fully connected layer outputs the classification label. LAG-MANet achieves a classification accuracy of 97.76% on the WHU-RS19 dataset, 97.04% on the SIRI-WHU dataset and 97.18% on the RSSCN7 dataset. Experimental results on these three challenging public remote sensing datasets demonstrate the superiority of the proposed LAG-MANet.
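The multi-branch dilated convolution idea mentioned in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' MBDConv implementation: the 3x3 kernel of ones, the zero padding, the fusion of branches by summation, and the function names `dilated_conv2d` and `multi_branch_dilated` are all assumptions made for illustration. The sketch only shows how branches with different dilation rates sample the same input at different spacings.

```python
from typing import List

Grid = List[List[float]]


def dilated_conv2d(image: Grid, kernel: Grid, dilation: int) -> Grid:
    """Same-size 2D cross-correlation with zero padding and a given dilation."""
    h, w = len(image), len(image[0])
    k = len(kernel)              # square kernel with odd side length
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    yy = y + (i - half) * dilation
                    xx = x + (j - half) * dilation
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def multi_branch_dilated(image: Grid, kernel: Grid, rates: List[int]) -> Grid:
    """One branch per dilation rate; branches are fused by summation (assumed)."""
    h, w = len(image), len(image[0])
    fused = [[0.0] * w for _ in range(h)]
    for rate in rates:
        branch = dilated_conv2d(image, kernel, rate)
        for y in range(h):
            for x in range(w):
                fused[y][x] += branch[y][x]
    return fused


# Toy input: a single bright pixel in the centre of a 9x9 image.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
ones = [[1.0] * 3 for _ in range(3)]  # hypothetical 3x3 kernel of ones
fused = multi_branch_dilated(image, ones, rates=[1, 2, 3, 4])
```

With the impulse input, each branch responds on a 3x3 lattice whose spacing equals its dilation rate, so the fused map covers progressively wider neighbourhoods of the centre pixel without any branch using a larger kernel.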

Key words: remote sensing image, scene classification, CNN, LAG-MANet

CLC Number: P237

Wei WANG, Wei ZHENG, Xin WANG. LAG-MANet model for remote sensing image scene classification[J]. Acta Geodaetica et Cartographica Sinica, 2024, 53(7): 1371-1383.

References 34

[1]	GONG Jianya, ZHANG Mi, HU Xiangyun, et al. The design of deep learning framework and model for intelligent remote sensing[J]. Acta Geodaetica et Cartographica Sinica, 2022, 51(4):475-487. DOI: 10.11947/j.AGCS.2022.20220027.
[2]	WU Qiong, GE Daqing, YU Junchuan, et al. Deep learning identification technology of InSAR significant deformation zone of potential landslide hazard at large scale[J]. Acta Geodaetica et Cartographica Sinica, 2022, 51(10):2046-2055. DOI: 10.11947/j.AGCS.2022.20220303.
[3]	CHENG Gong, XIE Xingxing, HAN Junwei, et al. Remote sensing image scene classification meets deep learning: challenges, methods, benchmarks, and opportunities[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13:3735-3756.
[4]	CHEN Leiyu, LI Shaobo, BAI Qiang, et al. Review of image classification algorithms based on convolutional neural networks[J]. Remote Sensing, 2021, 13(22):4712.
[5]	YUAN Yuan, FANG Jie, LU Xiaoqiang, et al. Remote sensing image scene classification using rearranged local features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(3):1779-1792.
[6]	AKODAD S, BOMBRUN L, XIA Junshi, et al. Ensemble learning approaches based on covariance pooling of CNN features for high resolution remote sensing scene classification[J]. Remote Sensing, 2020, 12(20):3292.
[7]	TONG Wei, CHEN Weitao, HAN Wei, et al. Channel-attention-based DenseNet network for remote sensing image scene classification[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13:4121-4132.
[8]	HU Jie, SHEN Li, SUN Gang. Squeeze-and-excitation networks[C]//Proceedings of 2018 IEEE conference on computer vision and pattern recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[9]	ZHANG Guokai, XU Weizhe, ZHAO Wei, et al. A multiscale attention network for remote sensing scene images classification[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14:9530-9545.
[10]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30:6000-6010.
[11]	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[C]//Proceedings of the 9th International Conference on Learning Representations. Virtual Event: IEEE, 2020: 1-12.
[12]	BAZI Y, BASHMAL L, AL RAHHAL M M, et al. Vision transformers for remote sensing image classification[J]. Remote Sensing, 2021, 13(3):516.
[13]	WANG Wei, DENG Jiwei, WANG Xin, et al. GLFFNet model for remote sensing image scene classification[J]. Acta Geodaetica et Cartographica Sinica, 2023, 52(10):1693-1702. DOI: 10.11947/j.AGCS.2023.20220286.
[14]	WANG Wei, HU Ting, WANG Xin, et al. BFRNet: bidimensional feature representation network for remote sensing images classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61:3313800.
[15]	DENG Peifang, XU Kejie, HUANG Hong. When CNNs meet vision transformer: a joint framework for remote sensing scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19:3109061.
[16]	RAO Yongming, ZHAO Wenliang, TANG Yansong, et al. Hornet: efficient high-order spatial interactions with recursive gated convolutions[C]//Proceedings of 2022 Advances in Neural Information Processing Systems. [S.l.]: IEEE, 2022.
[17]	DING Xiaohan, ZHANG Xiangyu, HAN Jungong, et al. Scaling up your kernels to 31x31: revisiting large kernel design in cnns[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11963-11975.
[18]	GUO Menghao, LU Chengze, LIU Zhengning, et al. Visual attention network[J]. Computational Visual Media, 2023, 9(4):733-752.
[19]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of 2018 European conference on computer vision. Cham: Springer, 2018: 3-19.
[20]	LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11966-11976.
[21]	XIA G S, YANG Wen, DELON J, et al. Structural high-resolution satellite image indexing[C]//Proceedings of 2010 ISPRS TC VII Symposium. [S.l.]: ISPRS, 2010: 298-303.
[22]	ZHU Qiqi, ZHONG Yanfei, ZHAO Bei, et al. Bag-of-visual-words scene classifier with local and global features for high spatial resolution remote sensing imagery[J]. IEEE Geoscience and Remote Sensing Letters, 2016, 13(6):747-751.
[23]	ZOU Qin, NI Lihao, ZHANG Tong, et al. Deep learning based feature selection for remote sensing scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(11):2321-2325.
[24]	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille: ACM Press, 2015: 448-456.
[25]	KINGMA D P, BA J. Adam: a method for stochastic optimization[C]//Proceedings of the 3rd International Conference for Learning Representations. [S.l.]: IEEE, 2015: 1-13.
[26]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 3rd International Conference for Learning Representations. [S.l.]: IEEE, 2015: 463-476.
[27]	LIU Ze, LIN Yutong, CAO Yue, et al. SwinTransformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
[28]	HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[29]	TAN M, LE Q. EfficientNetV2: smaller models and faster training[C]//Proceedings of 2021 International Conference on Machine Learning. Baltimore: IEEE, 2021: 10096-10106.
[30]	YU Weihao, LUO Mi, ZHOU Pan, et al. MetaFormer is actually what you need for vision[C]//Proceedings of 2022 IEEE/CVF conference on computer vision and pattern recognition. New Orleans: IEEE, 2022: 10819-10829.
[31]	LI Siyuan, WANG Zedong, LIU Zicheng, et al. Efficient multi-order gated aggregation network[EB/OL]. [2022-11-07]. https://arxiv.org/abs/2211.03295.
[32]	TANG Xu, LI Mingteng, MA Jingjing, et al. EMTCAL: efficient multiscale transformer and cross-level attention learning for remote sensing scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60:3194505.
[33]	CHEN Weitao, OUYANG Shubing, TONG Wei, et al. GCSANet: a global context spatial attention deep learning network for remote sensing scene classification[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15:1150-1162.
[34]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2):336-359.

Model	Dilation rates	Accuracy/(%)
LAG-MANet-1	×	94.43±0.31
LAG-MANet-2	[2]	96.04±0.57
LAG-MANet-3	[1, 2]	96.57±0.35
LAG-MANet-4	[1, 2, 3]	96.32±0.22
LAG-MANet	[1, 2, 3, 4]	97.18±0.41
LAG-MANet-5	[1, 2, 3, 4, 5]	96.18±0.24
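To make the dilation rates in the table above more concrete, the following sketch computes the effective kernel size covered by a dilated convolution, using the standard relation k_eff = k + (k - 1)(d - 1). The base kernel size of 3 is an assumption for illustration only; the source does not state the kernel size used in MBDConv, and `effective_kernel_size` is a hypothetical helper name.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel whose taps are `dilation` apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


BASE_KERNEL = 3  # assumed base kernel size, not taken from the source

# Dilation rates of the full LAG-MANet configuration in the table.
rates = [1, 2, 3, 4]
sizes = {d: effective_kernel_size(BASE_KERNEL, d) for d in rates}

for d, size in sizes.items():
    print(f"dilation {d}: covers {size}x{size} pixels")
```

Under this assumption the four branches cover 3x3, 5x5, 7x7 and 9x9 neighbourhoods while each keeps the parameter count of a single 3x3 kernel, which is one common motivation for multi-branch dilated designs.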

Model	Spatial interaction dimension (n) per stage				Accuracy/(%)
	Stage 1	Stage 2	Stage 3	Stage 4	
LAG-MANet-6	2	3	3	3	95.89±0.41
LAG-MANet-7	2	3	4	4	96.25±0.30
LAG-MANet-8	2	3	4	5	96.21±0.24
LAG-MANet-9	3	3	4	4	96.43±0.67
LAG-MANet-10	3	3	4	5	96.47±0.58
LAG-MANet	3	3	3	3	97.18±0.41

Model	RAC	SAM	MSI	CAM	RPFF	Accuracy/(%)
LAG-MANet-11	×	√	√	√	√	95.03±0.38
LAG-MANet-12	√	×	√	√	√	96.61±0.39
LAG-MANet-13	√	√	×	√	√	95.96±0.57
LAG-MANet-14	√	√	√	×	√	96.18±0.29
LAG-MANet-15	√	√	√	√	×	92.43±0.53
LAG-MANet-16	×	×	√	√	√	93.86±0.55
LAG-MANet-17	√	√	×	×	√	95.93±0.31
LAG-MANet-18	√	×	√	×	√	96.32±0.18
LAG-MANet-19	×	√	×	√	√	95.43±0.18
LAG-MANet-20	×	√	√	×	√	94.79±0.48
LAG-MANet-21	√	×	×	√	√	96.14±0.40
LAG-MANet	√	√	√	√	√	97.18±0.41

Model	Accuracy/(%)	Precision/(%)	Recall/(%)	Specificity/(%)	F₁ score/(%)
ResNet50^[28]	97.45±0.32	97.71±0.37	97.42±0.34	99.87±0.02	97.43±0.35
VGG16^[26]	92.25±1.59	93.05±1.48	92.31±1.59	99.57±0.08	92.27±1.60
EfficientNetV2^[29]	97.04±0.20	97.21±0.18	97.06±0.21	99.85±0.01	97.03±0.20
ConvNext^[20]	90.82±1.07	91.70±1.00	90.92±0.95	99.50±0.06	90.79±1.03
ViT^[11]	81.73±1.04	83.46±1.57	81.82±1.00	98.99±0.06	81.65±1.06
SwinTransformer^[27]	92.96±1.09	93.63±0.99	93.08±1.07	99.62±0.06	93.02±1.09
PoolFormer^[30]	93.17±0.89	93.73±0.81	93.20±0.86	99.63±0.05	93.18±0.85
Hornet^[16]	88.88±0.59	89.84±1.10	89.03±0.59	99.39±0.04	88.85±0.80
MogaNet^[31]	96.74±0.25	97.03±0.27	96.70±0.26	99.83±0.01	96.71±0.25
VAN^[18]	96.53±0.82	96.79±0.76	96.53±0.84	99.82±0.05	96.50±0.83
EMTCAL^[32]	88.06±0.79	88.93±0.76	87.97±0.83	99.35±0.04	87.89±0.83
GCSANet^[33]	96.10±0.41	96.34±0.41	96.07±0.39	99.79±0.02	96.04±0.42
LAG-MANet-Split	97.86±0.50	98.04±0.45	97.87±0.51	99.89±0.03	97.86±0.50
LAG-MANet	97.76±0.25	97.93±0.20	97.72±0.27	99.88±0.01	97.72±0.27
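The five metrics reported in the table above can all be derived from a confusion matrix. The sketch below shows one common way to do this for a multi-class problem, assuming macro-averaging over classes; the source does not state which averaging scheme was used, and the 3-class confusion matrix and the function name `macro_metrics` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, specificities, f1s = [], [], [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        s = tn / (tn + fp) if tn + fp else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        specificities.append(s)
        f1s.append(f)
    return {
        "accuracy": sum(cm[c][c] for c in range(n)) / total,
        "precision": sum(precisions) / n,      # macro average (assumed)
        "recall": sum(recalls) / n,
        "specificity": sum(specificities) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical 3-class confusion matrix with 10 test samples per class.
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics = macro_metrics(toy_cm)
```

Because true negatives dominate when there are many classes, specificity tends to sit close to 100% even for weaker models, which is consistent with the narrow spread of the specificity column compared with the other metrics.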

Model	Parameters (×10⁶)	Computational cost (×10⁶)
ResNet50^[28]	23.52	4 131.71
VGG16^[26]	134.29	15 466.20
EfficientNetV2^[29]	20.19	2 897.32
ConvNext^[20]	27.80	4 454.77
ViT^[11]	85.65	16 862.87
SwinTransformer^[27]	27.50	4 371.13
PoolFormer^[30]	20.84	3 393.76
Hornet^[16]	21.86	3 967.88
MogaNet^[31]	24.79	4 947.57
VAN^[18]	13.35	2 505.09
EMTCAL^[32]	27.8	4 233.93
GCSANet^[33]	14.16	5 677.51
LAG-MANet-Split	5.01	1 603.94
LAG-MANet	12.51	3 648.77
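As a reading aid for the complexity table above, the sketch below expresses a model's parameter count and computational cost as a fraction of a chosen baseline. The numbers are copied from a few rows of the table; the helper name `relative_cost` and the choice of ResNet50 as the baseline are illustrative assumptions, not part of the source.

```python
from typing import Dict, Tuple

# (parameters x10^6, computational cost x10^6), copied from the table above.
models: Dict[str, Tuple[float, float]] = {
    "ResNet50": (23.52, 4131.71),
    "VAN": (13.35, 2505.09),
    "LAG-MANet-Split": (5.01, 1603.94),
    "LAG-MANet": (12.51, 3648.77),
}


def relative_cost(name: str, baseline: str) -> Tuple[float, float]:
    """Return (parameter ratio, computation ratio) of `name` versus `baseline`."""
    params, cost = models[name]
    base_params, base_cost = models[baseline]
    return params / base_params, cost / base_cost


for name in ("LAG-MANet", "LAG-MANet-Split"):
    p_ratio, c_ratio = relative_cost(name, "ResNet50")
    print(f"{name}: {p_ratio:.1%} of the parameters, "
          f"{c_ratio:.1%} of the computation of ResNet50")
```

Reading the table this way shows that LAG-MANet uses roughly half the parameters of ResNet50, while the LAG-MANet-Split variant is smaller still on both measures.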