The scene information in high-resolution remote sensing images is important for interpreting images and understanding the real world. Traditional scene classification methods usually rely on low- and mid-level hand-crafted features, but high-resolution images carry rich information and complex scene configurations that require high-level features to represent. This paper proposes a method that combines saliency with a multi-convolutional neural network. First, saliency sampling extracts meaningful patches that contain the dominant image information. Second, these patches serve as training samples for a convolutional neural network (CNN), which learns feature representations at different levels. Finally, the multi-layer features are fed into a support vector machine (SVM) for scene classification. Experiments on two high-resolution scene datasets show that saliency sampling effectively captures the main targets, weakens the influence of unrelated targets, and reduces data redundancy, and that the CNN learns high-level features automatically. Compared with existing methods, the proposed method effectively improves classification accuracy.
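Saliency sampling, the first step of the method, can be illustrated with a short sketch. The abstract does not state which saliency measure or selection rule the authors use, so everything here is an assumption: the image is a grayscale list of pixel rows, candidate patches lie on a regular grid, a patch's saliency is the absolute difference between its mean intensity and the mean over all patches (a crude global-contrast stand-in, not the paper's detector), and sampling keeps the k highest-scoring patches. The helper names `patch_mean`, `patch_saliency` and `saliency_sample` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = List[List[float]]


def patch_mean(img: Image, r: int, c: int, size: int) -> float:
    """Mean intensity of the size x size patch whose top-left corner is (r, c)."""
    total = 0.0
    for i in range(r, r + size):
        for j in range(c, c + size):
            total += img[i][j]
    return total / (size * size)


def patch_saliency(img: Image, size: int, stride: int) -> List[Tuple[float, int, int]]:
    """Score every patch on a regular grid as (saliency, row, col).

    Assumed stand-in measure, not the paper's detector: a patch is salient
    when its mean intensity differs strongly from the mean over all patches.
    """
    h, w = len(img), len(img[0])
    coords = [(r, c)
              for r in range(0, h - size + 1, stride)
              for c in range(0, w - size + 1, stride)]
    means = [patch_mean(img, r, c, size) for r, c in coords]
    global_mean = sum(means) / len(means)
    return [(abs(m - global_mean), r, c) for m, (r, c) in zip(means, coords)]


def saliency_sample(img: Image, size: int, stride: int, k: int) -> List[Tuple[int, int]]:
    """Return the top-left corners of the k most salient patches.

    Keeping the top k is an assumed selection rule; ties are broken by position.
    """
    scored = patch_saliency(img, size, stride)
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(r, c) for _, r, c in scored[:k]]


if __name__ == "__main__":
    # Dark 8x8 scene with one bright 2x2 object at rows 4-5, columns 4-5.
    img = [[0.0] * 8 for _ in range(8)]
    for i in (4, 5):
        for j in (4, 5):
            img[i][j] = 1.0
    print(saliency_sample(img, size=2, stride=2, k=1))  # prints [(4, 4)]
```

On this toy scene the bright block stands out from the uniform background, so the single sampled patch is the one covering it. In the full pipeline the selected patches would then become CNN training samples; a stochastic variant could instead draw patches with probability proportional to their scores, and a real saliency detector would replace the contrast stand-in.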
HE Xiaofei, ZOU Zhengrong, TAO Chao, ZHANG Jiaxing. Combined Saliency with Multi-Convolutional Neural Network for High Resolution Remote Sensing Scene Classification[J]. Acta Geodaetica et Cartographica Sinica, 2016, 45(9): 1073-1080.
DOI: 10.11947/j.AGCS.2016.20150612