地图学与地理信息

M-Quadtree索引:一种基于改进四叉树编码方法的云存储环境下空间索引方法

  • 付仲良 ,
  • 胡玉龙 ,
  • 翁宝凤 ,
  • 彭瑞
展开
  • 1. 武汉大学遥感信息工程学院, 湖北 武汉 430079;
    2. 浙江省地理信息中心, 浙江 杭州 310012
付仲良(1965-),男,博士,教授,博士生导师,主要研究方向为GIS、矢量数据匹配。E-mail:fuzhl@263.net

收稿日期: 2015-07-21

  修回日期: 2016-09-10

  网络出版日期: 2016-12-03

M-Quadtree Index: A Spatial Index Method for Cloud Storage Environment Based on Modified Quadtree Coding Approach

  • FU Zhongliang ,
  • HU Yulong ,
  • WENG Baofeng ,
  • PENG Rui
Expand
  • 1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China;
    2. Geomatics Center of Zhejiang, Hangzhou 310012, China

Received date: 2015-07-21

  Revised date: 2016-09-10

  Online published: 2016-12-03

摘要

为了解决基于“键-值”模型的云存储环境仅支持简单的关键字查询,不支持多维空间查询的问题,提出了一种新的分布式空间索引方法——M-Quadtree索引。在索引构建过程中,设计了一种基于改进四叉树的空间数据划分方法,该方法规定了叶节点区域的最小数据量,通过四叉树叶节点的再合并,解决了划分后各子区域间存储量不平衡的问题,并且满足了MapReduce并行化要求。给出了MapReduce框架下M-Quadtree索引的快速构建、查询与更新算法,并在搭建的Hadoop平台进行了关键参数对索引效率的影响以及不同规模数据下索引的创建、查询和更新试验。与现有分布式空间索引的对比试验及分析结果表明,M-Quadtree索引在数据存储量负载均衡、算法并行化和空间查询效率等方面表现得更好。

本文引用格式

付仲良 , 胡玉龙 , 翁宝凤 , 彭瑞 . M-Quadtree索引:一种基于改进四叉树编码方法的云存储环境下空间索引方法[J]. 测绘学报, 2016 , 45(11) : 1342 -1351 . DOI: 10.11947/j.AGCS.2016.20150408

Abstract

Currently, the cloud storage platform based on key-value model can only support simple keyword queries but cannot support multidimensional spatial queries. To solve the problem, this paper puts forward a new method of distributed spatial index-M-Quadtree index. In the process of index building, a space partitioning method based on improved quadtree was proposed. This partitioning method specifies the minimum amount of data in the leaf area. By recombining the quad leaves, it solves the problem of storage imbalance among sub regions, and meets the parallel requirements of the MapReduce. This paper describes some algorithms about M-Quadtree index building,querying and updating under the MapReduce framework. In the experiments, we implement the M-Quadtree index on Hadoop platform to test the effect of key parameter on the efficiency of index, and also test the efficiency of index building, querying and updating under different scale of data. Comparing with existing distributed spatial index, experiments show that the M-Quadtree index performs better on data load balancing, algorithm parallelism and the efficiency of spatial querying.

参考文献

[1] YANG Chaowei, GOODCHILD M, HUANG Qunying, et al. Spatial Cloud Computing: How Can the Geospatial Sciences Use and Help Shape Cloud Computing?[J]. International Journal of Digital Earth, 2011, 4(4): 305-329.
[2] 李德仁. 展望大数据时代的地球空间信息学[J]. 测绘学报, 2016, 45(4): 379-384. DOI: 10.11947/j.AGCS.2016.20160057. LI Deren. Towards Geo-Spatial Information Science in Big Data Era[J]. Acta Geodaetica et Cartographica Sinica, 2016, 45(4): 379-384. DOI: 10.11947/j.AGCS.2016.20160057.
[3] VAQUERO L M, RODERO-MERINO L, CACERES J, et al. A Break in the Clouds: Towards a Cloud Definition[J]. ACM SIGCOMM Computer Communication Review, 2009, 39(1): 50-55.
[4] 李德仁, 张良培, 夏桂松. 遥感大数据自动分析与数据挖掘[J]. 测绘学报, 2014, 43(12): 1211-1216. DOI: 10.13485/j.cnki.11-2089.2014.0187. LI Deren, ZHANG Liangpei, XIA Guisong. Automatic Analysis and Mining of Remote Sensing Big Data[J]. Acta Geodaetica et Cartographica Sinica, 2014, 43(12): 1211-1216. DOI: 10.13485/j.cnki.11-2089.2014.0187.
[5] CARY A, YESHA Y, ADJOUADI M, et al. Leveraging Cloud Computing in Geodatabase Management[C]//Proceedings of the 2010 IEEE International Conference on Granular Computing. Washington, DC: IEEE Computer Society, 2010: 73-78.
[6] 郭仁忠, 刘江涛, 彭子凤, 等. 开放式空间基础信息平台的发展特征与技术内涵[J]. 测绘学报, 2012, 41(3): 323-326. GUO Renzhong, LIU Jiangtao, PENG Zifeng, et al. Technologies Connotation and Developing Characteristics of Open Geospatial Information Platform[J]. Acta Geodaetica et Cartographica Sinica, 2012, 41(3): 323-326.
[7] ZENG Wenying, ZHAO Yuelong, OU Kairi, et al. Research on Cloud Storage Architecture and Key Technologies[C]//Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human. New York: ACM, 2009: 1044-1048.
[8] MOUZA C D, LITWIN W, RIGAUX P. Large-Scale Indexing of Spatial Data in Distributed Repositories: the SD-Rtree[J]. The VLDB Journal, 2009, 18(4): 933-958.
[9] MONDAL A, YI Lifu, KITSUREGAWA M. P2PR-Tree: an R-Tree-Based Spatial Index for Peer-to-Peer Environments[M]//LINDNER W, MESITI M, TüRKER C, et al. Current Trends in Database Technology-EDBT 2004 Workshops. Berlin Heidelberg: Springer, 2004: 516-525.
[10] WEI Lingyin, HSU Y T, PENG W C, et al. Indexing Spatial Data in Cloud Data Managements[J]. Pervasive and Mobile Computing, 2014, 15: 48-61.
[11] FENG Jun, TANG Zhixian, WEI Mi'an, et al. HQ-Tree: a Distributed Spatial Index Based on Hadoop[J]. China Communications, 2014, 11(7): 128-141.
[12] TANIN E, HARWOOD A, SAMET H. Using a Distributed Quadtree Index in Peer-To-Peer Networks[J]. The VLDB Journal, 2007, 16(2): 165-178.
[13] NISHIMURA S, DAS S, AGRAWAL D, et al. MD-HBase: Design and Implementation of an Elastic Data Infrastructure for Cloud-Scale Location Services[J]. Distributed and Parallel Databases, 2013, 31(2): 289-319.
[14] LAWDER J K, KING P J H. Querying Multi-Dimensional Data Indexed Using the Hilbert Space-Filling Curve[J]. ACM SIGMOD Record, 2001, 30(1): 19-24.
[15] CARY A, SUN Zhengguo, HRISTIDIS V, et al. Experiences on Processing Spatial Data with MapReduce[C]//Proceedings of the 21st International Conference on Scientific and Statistical Database Management. Berlin, Heidelberg: Springer-Verlag, 2009.
[16] CARY A, YESHA Y, ADJOUADI M, et al. Leveraging Cloud Computing in Geodatabase Management[C]//Proceedings of the 2010 IEEE International Conference on Granular Computing. Washington, DC: IEEE, 2010: 73-78.
[17] 吴明光. 一种空间分布模式驱动的空间索引[J]. 测绘学报, 2015, 44(1): 108-115. DOI: 10.11947/j.AGCS.2015.20130245. WU Mingguang. A Spatial Distribution Pattern-driven Spatial Index[J]. Acta Geodaetica et Cartographica Sinica, 2015, 44(1): 108-115. DOI: 10.11947/j.AGCS.2015.20130245.
[18] 王永杰, 孟令奎, 赵春宇. 基于Hilbert空间排列码的海量空间数据划分算法研究[J]. 武汉大学学报(信息科学版), 2007, 32(7): 650-653. WANG Yongjie, MENG Lingkui, ZHAO Chunyu. Spatial Partitioning of Massive Data Based on Hilbert Spatial Ordering Code[J]. Geomatics and Information Science of Wuhan University, 2007, 32(7): 650-653.
[19] LIU Yi, JING Ning, CHEN Luo, et al. Parallel Bulk-Loading of Spatial Data with MapReduce: an R-tree Case[J]. Wuhan University Journal of Natural Sciences, 2011, 16(6): 513-519. DOI: 10.1007/sl1859-011-0790-3.
[20] 马友忠, 孟小峰. 云数据管理索引技术研究[J]. 软件学报, 2015, 26(1): 145-166. MA Youzhong, MENG Xiaofeng. Research on Indexing for Cloud Data Management[J]. Journal of Software, 2015, 26(1): 145-166.
[21] MOON B, JAGADISH H V, FALOUTSOS C, et al. Analysis of the Clustering Properties of the Hilbert Space-Filling Curve[J]. IEEE Transactions on Knowledge and Data Engineering, 2001, 13(1): 124-141.
[22] ABEL D J, MARK D M. A Comparative Analysis of Some Two-Dimensional Orderings[J]. International Journal of Geographical Information Systems, 1990, 4(1): 21-31.
[23] SHVACHKO K, KUANG H, RADIA S, et al. The Hadoop Distributed File System[C]//Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies. Washington, DC: IEEE Computer Society, 2010: 1-10.
文章导航

/