A remote sensing image classification procedure based on multilevel attention fusion U-Net

doi:10.11947/j.AGCS.2020.20190407

Abstract

Traditional convolutional neural networks rarely produce satisfactory classification results on remote sensing images, because the objects differ greatly in size and spectral characteristics and because complex backgrounds interfere with classification. To address this problem, the multilevel attention fusion U-Net (MAFU-Net) is presented. To strengthen the correlations between different pixels and between different channels, an attention module is applied to extract and process semantic information at different levels, which further improves the classification performance of the network under complex backgrounds. To verify the effectiveness of the proposed network for remote sensing image classification, experiments were carried out on the ISPRS Vaihingen dataset and on the GF-2 Beijing and Henan datasets, and several other semantic segmentation networks were used for comparison. The experimental results show that the proposed network has fewer parameters and lower computational complexity, yet achieves higher classification accuracy in the shortest time, which makes it highly practical. In addition, feature visualization was used to analyze the classification performance of MAFU-Net and the other networks. The results also show that the behavior of most deep learning models is difficult to derive from rigorous mathematical principles, and that it is difficult to explain why a particular network fails on a particular dataset. Further study, or more advanced visualization and quantification criteria, are therefore required to analyze and evaluate specific deep learning models and their performance, so that more advanced model structures can be designed.

Key words: object classification, remote sensing image, attention mechanism, U-shape convolutional neural network, semantic segmentation

CLC Number: P237

LI Daoji, GUO Haitao, LU Jun, ZHAO Chuan, LIN Yuzhun, YU Donghang. A remote sensing image classification procedure based on multilevel attention fusion U-Net[J]. Acta Geodaetica et Cartographica Sinica, 2020, 49(8): 1051-1064.
