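Zero-shot remote sensing image scene classification based on robust cross-domain mapping and gradual refinement of semantic space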

doi:10.11947/j.AGCS.2020.20200139

Abstract

Abstract: Zero-shot image classification aims to recognize categories that never appear during training (unseen classes) by learning from only a subset of the categories in a data set (seen classes). This capability is of great practical significance in the era of remote sensing big data. Existing zero-shot scene classification methods in the remote sensing field pay little attention to optimizing the semantic space after mapping, which leads to poor overall classification performance. To address this, this paper proposes a zero-shot remote sensing image scene classification method based on robust cross-domain mapping and progressive refinement of the semantic references. In the supervised learning stage of training, the class semantic vectors and scene image samples of the seen classes are used to learn a deep feature extractor and a robust mapping from the visual space to the semantic space. In the unsupervised learning stage of training, the class semantic vectors of all classes and the remote sensing image samples of the unseen classes are used to progressively refine the semantic vectors of the unseen classes, applying collaborative representation learning and the k-nearest neighbor algorithm in turn. This refinement alleviates two problems: the shift between the semantic space of the seen classes and that of the unseen classes, and the offset between the unseen-class semantic space produced by the auto-encoder cross-domain mapping model and the unseen-class semantic space obtained after collaborative representation. In the testing stage, unseen-class remote sensing image scenes are classified using the learned deep feature extractor, the auto-encoder cross-domain mapping model, and the refined unseen-class semantic vectors. We integrate several publicly available remote sensing image scene data sets into a new data set and conduct experiments on it. The experimental results show that the proposed algorithm clearly outperforms existing publicly available zero-shot classification methods under a variety of seen/unseen class splits.

Key words: zero-shot learning, remote sensing image scene classification, cross-domain mapping with auto-encoder, collaborative representation learning, natural language processing
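
The stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract gives no formulas, so the sketch makes several assumptions: a plain ridge-regression mapping stands in for the auto-encoder cross-domain mapping model, the collaborative representation step is omitted, and the k-nearest neighbor refinement is taken to mean replacing each unseen-class semantic vector with the mean of its k nearest projected samples. All function names (`fit_ridge_mapping`, `refine_prototypes_knn`, `classify`) and the toy data are hypothetical.

```python
import numpy as np


def fit_ridge_mapping(X, S, lam=1.0):
    """Closed-form ridge regression: find W so that X @ W approximates S.

    X: (n_samples, d_visual) visual features of seen-class images.
    S: (n_samples, d_semantic) semantic vector of each sample's class.
    Assumption: a stand-in for the paper's auto-encoder mapping model.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)


def refine_prototypes_knn(Z, P, k=3):
    """Move each class semantic vector toward the projected samples near it.

    Z: (n_samples, d_semantic) unlabeled unseen-class samples, already
       projected into the semantic space.
    P: (n_classes, d_semantic) current unseen-class semantic vectors.
    Returns the mean of the k nearest projected samples for each class.
    """
    # Pairwise Euclidean distances, shape (n_classes, n_samples).
    dist = np.linalg.norm(P[:, None, :] - Z[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return Z[nearest].mean(axis=1)


def classify(Z, P):
    """Assign each projected sample to its nearest class semantic vector."""
    dist = np.linalg.norm(Z[:, None, :] - P[None, :, :], axis=2)
    return dist.argmin(axis=1)


if __name__ == "__main__":
    # Toy projected samples from two unseen classes (hypothetical values).
    Z = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                  [4.0, 4.0], [4.2, 4.0], [4.0, 4.2]])
    # Initial class semantic vectors, deliberately offset from the samples.
    P = np.array([[1.0, 1.0], [3.0, 3.0]])
    P_refined = refine_prototypes_knn(Z, P, k=3)
    print(P_refined)
    print(classify(Z, P_refined))
```

In this toy setting the refined vectors move from their offset starting points onto the two sample clusters, which is the kind of offset correction the abstract attributes to the unsupervised stage. The actual method may differ in every detail not stated in the abstract.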

CLC number:

P237

LI Yansheng, KONG Deyu, ZHANG Yongjun, JI Zheng, XIAO Rui. Zero-shot remote sensing image scene classification based on robust cross-domain mapping and gradual refinement of semantic space[J]. Acta Geodaetica et Cartographica Sinica, 2020, 49(12): 1564-1574.
