Zero-shot remote sensing image scene classification based on robust cross-domain mapping and gradual refinement of semantic space

doi:10.11947/j.AGCS.2020.20200139

Abstract

Zero-shot classification aims to recognize categories that do not appear during training (unseen classes) by learning from other categories in the data set (seen classes), which is of practical importance in the era of remote sensing big data. Existing zero-shot classification methods in the remote sensing field pay little attention to optimizing the semantic space after mapping, which results in poor classification performance. To address this, this paper proposes a zero-shot remote sensing image scene classification method based on cross-domain mapping with an auto-encoder and collaborative representation learning. In the supervised learning stage, the class semantic vectors of the seen classes and the corresponding scene image samples are used to learn a deep feature extractor and a robust mapping from the visual space to the semantic space. In the unsupervised learning stage, the class semantic vectors of all classes and the unseen-class remote sensing image samples are used, together with collaborative representation learning and the k-nearest neighbor algorithm, to refine the semantic vectors of the unseen classes. This refinement successively alleviates two shifts: the shift between the seen-class and unseen-class semantic spaces, and the shift between the unseen-class semantic space produced by the auto-encoder cross-domain mapping model and the one produced by collaborative representation. In the testing stage, unseen-class remote sensing image scenes are classified using the deep feature extractor, the auto-encoder cross-domain mapping model, and the refined unseen-class semantic vectors.
We integrate several public remote sensing image scene data sets into a new remote sensing image scene data set and conduct experiments on it. The experimental results show that the proposed algorithm significantly outperforms existing zero-shot classification methods under a variety of seen/unseen class splits.
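
As an illustration of the mapping and testing stages described above, the following is a minimal sketch and not the paper's implementation. It assumes a linear auto-encoder-style mapping W from visual space to semantic space, fitted by minimizing ||X - W^T S||^2 + lam * ||W X - S||^2, which leads to a Sylvester equation A W + W B = C. Test samples are then assigned to the nearest unseen-class semantic vector by cosine similarity. The function names, the toy data generator, and the value of `lam` are all hypothetical, and the collaborative-representation refinement of the unseen-class vectors is not shown.

```python
import numpy as np


def fit_autoencoder_mapping(X, S, lam=1.0):
    """Fit W (k x d) mapping visual features to semantic space.

    X: (d, n) visual features of seen-class samples.
    S: (k, n) class semantic vector of each sample.
    Solves A W + W B = C with A = S S^T, B = lam X X^T, C = (1 + lam) S X^T.
    """
    A = S @ S.T
    B = lam * (X @ X.T)
    C = (1.0 + lam) * (S @ X.T)
    k, d = C.shape
    # Column-major vectorization: (I_d kron A + B^T kron I_k) vec(W) = vec(C).
    # Only practical for small k * d; a dedicated Sylvester solver scales better.
    K = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(K, C.reshape(-1, order="F"))
    return w.reshape((k, d), order="F")


def classify_by_nearest_prototype(X_test, W, prototypes):
    """Project test features with W and pick the most similar class vector.

    X_test: (d, n) visual features; prototypes: (k, c) unseen-class vectors.
    Returns an array of n predicted class indices in [0, c).
    """
    P = W @ X_test
    P = P / (np.linalg.norm(P, axis=0, keepdims=True) + 1e-12)
    Q = prototypes / (np.linalg.norm(prototypes, axis=0, keepdims=True) + 1e-12)
    return np.argmax(Q.T @ P, axis=0)


def make_toy_data(seed=0, k=4, d=8, n_per_class=20, n_seen=6, n_unseen=2,
                  noise=0.01):
    """Synthetic data (hypothetical): features are a noisy linear image of
    the class semantic vectors, so a linear mapping can recover them."""
    rng = np.random.default_rng(seed)
    M, _ = np.linalg.qr(rng.normal(size=(d, k)))  # (d, k), orthonormal columns
    protos = rng.normal(size=(k, n_seen + n_unseen))
    labels = np.repeat(np.arange(n_seen + n_unseen), n_per_class)
    S = protos[:, labels]
    X = M @ S + noise * rng.normal(size=(d, labels.size))
    seen = labels < n_seen
    return (X[:, seen], S[:, seen], X[:, ~seen],
            labels[~seen] - n_seen, protos[:, n_seen:])


# Train on seen classes only, then classify samples of unseen classes.
X_tr, S_tr, X_te, y_te, unseen_protos = make_toy_data()
W = fit_autoencoder_mapping(X_tr, S_tr)
pred = classify_by_nearest_prototype(X_te, W, unseen_protos)
accuracy = float(np.mean(pred == y_te))
print("toy unseen-class accuracy:", accuracy)
```

In this sketch the unseen-class vectors are used as given; the method described above would first refine them before the nearest-neighbor step.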

Key words: zero-shot learning, remote sensing image scene classification, cross-domain mapping with auto-encoder, collaborative representation learning, natural language processing

CLC Number:

P237

LI Yansheng, KONG Deyu, ZHANG Yongjun, JI Zheng, XIAO Rui. Zero-shot remote sensing image scene classification based on robust cross-domain mapping and gradual refinement of semantic space[J]. Acta Geodaetica et Cartographica Sinica, 2020, 49(12): 1564-1574.
