Towards a Scale-driven Theory for Spatial Clustering

  • LI Zhilin ,
  • LIU Qiliang ,
  • TANG Jianbo
Expand
  • 1. Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, China;
    2. Department of Geo-Informatics, Central South University, Changsha 410083, China;
    3. State-Provincial Joint Engineering Laboratory of Spatial Information Technology for High-speed Railway Safety, Southwest Jiaotong University, Chengdu 611756, China

Received date: 2017-05-26

  Revised date: 2017-09-04

  Online published: 2017-10-26

Supported by

The National Natural Science Foundation of China (Nos. 41601410;41471383);The Natural Science Foundation of Hunan Province (No. 2017JJ3379)

Abstract

Spatial clustering plays a key role in exploratory geographical data analysis. It is important for investigating the distribution of geographical phenomena. Spatial clustering sometimes also serves as an important pre-processing for other geographical data analysis techniques. Although lots of attentions have been paid to spatial clustering, two serious obstacles remain to be tackled:①clusters will always be discovered in any geographical dataset by spatial clustering algorithms, even if the input dataset is a random dataset; ②users feel difficult to interpret the various clustering results obtained by using different parameters. It is hypothesized that scale is not handled well in clustering process. As a result, a scale-driven theory for spatial clustering is introduced in this study, based on the human recognition theory and the natural principle of multi-scale representation. Scale is modeled as parameter of a clustering model, and the scale dependency in spatial clustering is handled by constructing a hypothesis testing, and multi-scale significant clusters can be easily discovered by controlling the scale parameters in an objective manner.

Cite this article

LI Zhilin , LIU Qiliang , TANG Jianbo . Towards a Scale-driven Theory for Spatial Clustering[J]. Acta Geodaetica et Cartographica Sinica, 2017 , 46(10) : 1534 -1548 . DOI: 10.11947/j.AGCS.2017.20170275

References

[1] GETIS A. Spatial Autocorrelation[M]//FISCHER M M, GETIS A. Handbook of Applied Spatial Analysis:Software Tools, Methods and Applications. Berlin:Springer-Verlag, 2010.
[2] 王劲峰, 葛咏, 李连发, 等. 地理学时空数据分析方法[J]. 地理学报, 2014, 69(9):1326-1345. WANG Jinfeng, GE Yong, LI Lianfa, et al. Spatiotemporal Data Analysis in Geography[J]. Acta Geographica Sinica, 2014, 69(9):1326-1345.
[3] 李德仁, 王树良, 李德毅. 空间数据挖掘理论与应用[M]. 2版. 北京:科学出版社, 2013. LI Deren, WANG Shuliang, LI Deyi. Spatial Data Mining Theories and Applications[M]. 2nd ed. Beijing:Science Press, 2013.
[4] SHEKHAR S, JIANG Zhe, ALI R, et al. Spatio-temporal Data Mining:A Computational Perspective[J]. ISPRS International Journal of Geo-Information, 2015, 4(4):2306-2338.
[5] KULLDORFF M, NAGARWALLA N. Spatial Disease Clusters:Detection and Inference[J]. Statistics in Medicine, 1995, 14(8):799-810.
[6] WANG Xin, ROSTOKER C, HAMILTON H J. A Density-based Spatial Clustering for Physical Constraints[J]. Journal of Intelligent Information Systems, 2012, 38(1):269-297.
[7] FOVELL R G, FOVELL M Y C. Climate Zones of the Conterminous United States Defined Using Cluster Analysis[J]. Journal of Climate, 1993, 6(11):2103-2135.
[8] ZALIAPIN I, BEN-ZION Y. Earthquake Clusters in Southern California I:Identification and Stability[J]. Journal of Geophysical Research:Solid Earth, 2013, 118(6):2847-2864.
[9] KUPFER J A, GAO Peng, GUO Diansheng. Regionalization of Forest Pattern Metrics for the Continental United States Using Contiguity Constrained Clustering and Partitioning[J]. Ecological Informatics, 2012(9):11-18.
[10] DRǍGUŢ L, TIEDE D, LEVICK S R. ESP:A Tool to Estimate Scale Parameter for Multiresolution Image Segmentation of Remotely Sensed Data[J]. International Journal of Geographical Information Science, 2010, 24(6):859-871.
[11] WANG Fahui, GUO Diansheng, MCLAFFERTY S. Constructing Geographic Areas for Cancer Data Analysis:A Case Study on Late-stage Breast Cancer Risk in Illinois[J]. Applied Geography, 2012, 35(1-2):1-11.
[12] DENG Min, TANG Jianbo, LIU Qiliang, et al. Recognizing Building Groups for Generalization:A Comparative Study[J]. Cartography and Geographic Information Science, 2017:1-18. DOI:10.1080/15230406.2017.1302821.
[13] GUO Diansheng, GAHEGAN M, MACEACHREN A M, et al. Multivariate Analysis and Geovisualization with an Integrated Geographic Knowledge Discovery Approach[J]. Cartography and Geographic Information Science, 2005, 32(2):113-132.
[14] DENG Min, FAN Zide, LIU Qiliang, et al. A Hybrid Method for Interpolating Missing Data in Heterogeneous Spatio-Temporal Datasets[J]. ISPRS International Journal of Geo-Information, 2016, 5(2):13. DOI:10.3390/ijgi5020013.
[15] 李德毅. 大数据认知——"2015大数据价值实现之路高峰论坛"主题报告[J]. 重庆理工大学学报(自然科学), 2015, 29(9):1-6. LI Deyi. Big Data Cognition:Keynote Lecture of "2015 Forum of Big Data Value Realization Road"[J]. Journal of Chongqing University of Technology (Natural Science), 2015, 29(9):1-6.
[16] TRYON R C. Cluster Analysis:Correlation Profile and Orthometric (Factor) Analysis for the Isolation of Unities in Mind and Personality[M]. Ann Arbor:Edwards Brothers, Inc., Lithoprinters and Publishers, 1939.
[17] WU Xindong, KUMAR V, QUINLAN J R, et al. Top 10 Algorithms in Data Mining[J]. Knowledge and Information Systems, 2008, 14(1):1-37.
[18] XU R, WUNSCH Ⅱ D C. Clustering[M]. New Jersey:John Wiley & Sons, 2009.
[19] OPENSHAW S. A Geographical Solution to Scale and Aggregation Problems in Region-Building, Partitioning and Spatial Modelling[J]. Transactions of the Institute of British Geographers, 1977, 2(4):459-472.
[20] OPENSHAW S, CHARLTON M, WYMER C, et al. A Mark 1 Geographical Analysis Machine for the Automated Analysis of Point Data Sets[J]. International Journal of Geographical Information Systems, 1987, 1(4):335-358.
[21] 邓敏, 刘启亮, 李光强, 等. 空间聚类分析及应用[M]. 北京:科学出版社, 2011. DENG Min, LIU Qiliang, LI Guangqiang, et al. Spatial Clustering and Its Applications[M]. Beijing:Science Press, 2011.
[22] [JP+7] ESTIVILL-CASTRO V. Why So Many Clustering Algorithms:A Position Paper[J]. ACM SIGKDD Explorations Newsletter, 2002, 4(1):65-75.
[23] JAIN A K. Data Clustering:50 Years beyond k-means[J]. Pattern Recognition Letters, 2010, 31(8):651-666.
[24] TAN Pangning, STEINBACH M, KUMAR V. Introduction to Data Mining[M]. Boston:Addison Wesley Press, 2006.
[25] AGGARWAL C C, REDDY C K. Data Clustering:Algorithms and Applications[M]. London:CRC Press, 2014.
[26] KISILEVICH S, MANSMANN F, NANNI M, et al. Spatio-temporal Clustering[M]//MAIMON O, ROKACH L. 2nd ed. Data Mining and Knowledge Discovery Handbook. Berlin:Springer, 2010:855-874.
[27] GUO D S. Exploratory Spatial Data Analysis[M]//RICHARDSON D, CASTREE N, GOODCHILD M F, et al. The International Encyclopedia of Geography. London:John Wiley & Sons, 2017:1-25.
[28] 王劲峰. 空间分析[M]. 北京:科学出版社, 2006:185-217. WANG Jinfeng. Spatial Analysis[M]. Beijing:Science Press, 2006:185-217.
[29] MACQUEEN J B. Some Methods for Classification and Analysis of Multivariate Observations[C]//Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley:University of California Press, 1967:281-297.
[30] BALL G H, HALL D J. A Clustering Technique for Summarizing Multivariate Data[J]. Behavioral Science, 1967, 12(2):153-155.
[31] KAUFMAN L, ROUSSEEUW P J. Finding Groups in Data:An Introduction to Cluster Analysis[M]. New York:John Wiley & Sons, 1990.
[32] BEZDEK J. Pattern Recognition with Fuzzy Objective Function Algorithms[M]. New York:Plenum Press, 1981.
[33] NG R T, HAN Jiawei. CLARANS:A Method for Clustering Objects for Spatial Data Mining[J]. IEEE Transactions on Knowledge and Data Engineering, 2002, 14(5):1003-1016.
[34] PHAM D L. Spatial Models for Fuzzy Clustering[J]. Computer Vision and Image Understanding, 2001, 84(2):285-297.
[35] IZAKIAN H, PEDRYCZ W, JAMAL I. Clustering Spatiotemporal Data:An Augmented Fuzzy C-Means[J]. IEEE Transactions on Fuzzy Systems, 2013, 21(5):855-868.
[36] OPENSHAW S, RAO L. Algorithms for Reengineering 1991 Census Geography[J]. Environment and Planning A, 1995, 27(3):425-446.
[37] DUQUE J C, ANSELIN L, REY S J. The Max-P-Regions Problem[J]. Journal of Regional Science, 2012, 52(3):397-419.
[38] KOHONEN T. Self-organizing Maps[M]. Berlin:Springer Press, 1995.
[39] 程博艳, 刘强, 李小文. 一种建筑物群智能聚类法[J]. 测绘学报, 2013, 42(2):290-294, 303. CHENG Boyan, LIU Qiang, LI Xiaowen. Intelligent Building Grouping Using a Self-organizing Map[J]. Acta Geodaetica et Cartographica Sinica, 2013, 42(2):290-294, 303.
[40] BAÇÃO F, LOBO V, PAINHO M. The Self-organizing Map, the Geo-SOM, and Relevant Variants for Geosciences[J]. Computers & Geosciences, 2005, 31(2):155-163.
[41] HAGENAUER J, HELBICH M. Hierarchical Self-organizing Maps for Clustering Spatiotemporal Data[J]. International Journal of Geographical Information Science, 2013, 27(10):2026-2042.
[42] GUHA S, RASTOGI R, SHIM K. CURE:An Efficient Clustering Algorithm for Large Databases[C]//Proceedings of 1998 ACM-SIGMOD International Conference on Management of Data. Washington:ACM Press, 1998:73-84.
[43] 郭庆胜, 郑春燕, 胡华科. 基于邻近图的点群层次聚类方法的研究[J]. 测绘学报, 2008, 37(2):256-261. GUO Qingsheng, ZHENG Chunyan, HU Huake. Hierarchical Clustering Method of Group of Points Based on the Neighborhood Graph[J]. Acta Geodaetica et Cartographica Sinica, 2008, 37(2):256-261.
[44] ZAHN C T. Graph-theoretical Methods for Detecting and Describing Gestalt Clusters[J]. IEEE Transactions on Computer, 1971, C-20(1):68-86.
[45] 艾廷华, 郭仁忠. 基于格式塔识别原则挖掘空间分布模式[J]. 测绘学报, 2007, 36(3):302-308. AI Tinghua, GUO Renzhong. Polygon Cluster Pattern Mining Based on Gestalt Principles[J]. Acta Geodaetica et Cartographica Sinica, 2007, 36(3):302-308.
[46] KARYPIS G, HAN E H, KUMAR V. CHAMELEON:A Hierarchical Clustering Algorithm Using Dynamic Modeling[J]. IEEE Computer, 1999, 32(8):68-75.
[47] ESTIVILL-CASTRO V, LEE I. Multi-level Clustering and Its Visualization for Exploratory Spatial Analysis[J]. GeoInformatica, 2002, 6(2):123-152.
[48] DENG Min, LIU Qiliang, CHENG Tao, et al. An Adaptive Spatial Clustering Algorithm Based on Delaunay Triangulation[J]. Computers, Environment and Urban Systems, 2011, 35(4):320-332.
[49] GORDON A D. A Survey of Constrained Classification[J]. Computational Statistics & Data Analysis, 1996, 21(1):17-29.
[50] FENG C C, WANG Yichen, CHEN C Y. Combining Geo-SOM and Hierarchical Clustering to Explore Geospatial Data[J]. Transaction in GIS, 2014, 18(1):125-146.
[51] ASSUNÇÃO R M, NEVES M C, CÂMARA G, et al. Efficient Regionalization Techniques for Socio-economic Geographical Units Using Minimum Spanning Trees[J]. International Journal of Geographical Information Science, 2006, 20(7):797-811.
[52] GUO D. Regionalization with Dynamically Constrained Agglomerative Clustering and Partitioning (REDCAP)[J]. International Journal of Geographical Information Science, 2008, 22(7):801-823.
[53] KULLDORFF M,RAND K,GHERMAN G,et al. SaTScan V2.1:Software for the Spatial and Space-time Scan Statistics[R]. Bethesda, MD:National Cancer Institute, 1998.
[54] ALDSTADT J, GETIS A. Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters[J]. Geographical Analysis, 2006, 38(4):327-343.
[55] TAKAHASHI K, KULLDORFF M, TANGO T, et al. A Flexibly Shaped Space-time Scan Statistic for Disease Outbreak Detection and Monitoring[J]. International Journal of Health Geographics, 2008(7):14. DOI:10.1186/1476-072X-7-14.
[56] PEI Tao, WAN You, JIANG Yong, et al. Detecting Arbitrarily Shaped Clusters Using Ant Colony Optimization[J]. International Journal of Geographical Information Science, 2011, 25(10):1575-1595.
[57] ESTER M, KRIEGEL H, SANDER J, et al. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, OR:AAAI, 1996:45-50.
[58] ANKERST M, BREUNIG M M, KRIEGEL H P, et al. OPTICS:Ordering Points to Identify the Clustering Structure[C]//Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. Philadelphia:ACM, 1999:49-60.
[59] ERTÖZ L, STEINBACH M, KUMAR V. Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data[C]//Proceedings of the 2003 SIAM International Conference on Data Mining. San Francisco:Society for Industrial and Applied Mathematics, 2003:47-58.
[60] 李光强, 邓敏, 刘启亮. 一种适应局部密度变化的空间聚类方法[J]. 测绘学报, 2009, 38(3):255-263. LI Guangqiang, DENG Min, LIU Qiliang. A Spatial Clustering Method Adaptive to Local Density Change[J]. Acta Geodaetica et Cartographica Sinica, 2009, 38(3):255-263.
[61] PEI Tao, JASRA A, HAND D J, et al. DECODE:A New Method for Discovering Clusters of Different Densities in Spatial Data[J]. Data Mining and Knowledge Discovery, 2009, 18(3):337-369.
[62] PEI Tao, ZHOU Chenghu, ZHU Axing, et al. Windowed Nearest Neighbour Method for Mining Spatio-temporal Clusters in the Presence of Noise[J]. International Journal of Geographical Information Science, 2010, 24(6):925-948.
[63] LIU Qiliang, DENG Min, BI Jiantao, et al. A Novel Method for Discovering Spatio-temporal Clusters of Different Sizes, Shapes, and Densities in the Presence of Noise[J]. International Journal of Digital Earth, 2014, 7(2):138-157.
[64] SANDER J, ESTER M, KRIEGEL H P, et al. Density-based Clustering in Spatial Databases:the Algorithm GDBSCAN and Its Applications[J]. Data Mining and Knowledge Discovery, 1998, 2(2):169-194.
[65] BIRANT D, KUT A. ST-DBSCAN:An Algorithm for Clustering Spatial-temporal Data[J]. Data & Knowledge Engineering, 2007, 60(1):208-221.
[66] LIU Qiliang, DENG Min, SHI Yan, et al. A Density-based Spatial Clustering Algorithm Considering Both Spatial Proximity and Attribute Similarity[J]. Computers & Geosciences, 2012, 46:296-309.
[67] JEUNG H, YIU M L, ZHOU Xiaofang, et al. Discovery of Convoys in Trajectory Databases[J]. Proceedings of the VLDB Endowment, 2008, 1(1):1068-1080.
[68] LI Zhenhui, DING Bolin, HAN Jiawei, et al. Swarm:Mining Relaxed Temporal Moving Object Clusters[J]. Proceedings of the VLDB Endowment, 2010, 3(1-2):723-734.
[69] LI Yuxuan, BAILEY J, KULIK L. Efficient Mining of Platoon Patterns in Trajectory Databases[J]. Data & Knowledge Engineering, 2015, 100:167-187.
[70] 宋长青. 地理学研究范式的思考[J]. 地理科学进展, 2016, 35(1):1-3. SONG Changqing. On Paradigms of Geographical Research[J]. Progress in Geography, 2016, 35(1):1-3.
[71] ATKINSON P M, TATE N J. Spatial Scale Problems and Geostatistical Solutions:A Review[J]. The Professional Geographer, 2000, 52(4):607-623.
[72] DUNGAN J L, PERRY J N, DALE M R T, et al. A Balanced View of Scale in Spatial Statistical Analysis[J]. Ecography, 2002, 25(5):626-640.
[73] MARCEAU D J, HOWARTH P J, GRATTON D J. Remote Sensing and the Measurement of Geographical Entities in a Forested Environment:The Scale and Spatial Aggregation Problem[J]. Remote Sensing of Environment, 1994, 49(2):93-104.
[74] HAY G, MARCEAU D J, DUBé P, et al. A Multiscale Framework for Landscape Analysis:Object-specific Analysis and Upscaling[J]. Landscape Ecology, 2001, 16(6):471-490.
[75] LEUNG Y, ZHANG Jiangshe, XU Zongben. Clustering by Scale-space Filtering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(12):1396-1410.
[76] 汪闽, 周成虎, 裴韬, 等. MSCMO:基于数学形态学算子的尺度空间聚类方法[J]. 遥感学报, 2004, 8(1):45-50. WANG Min, ZHOU Chenghu, PEI Tao, et al. MSCMO:A Scale Space Clustering Algorithm Based on Mathematical Morphology Operators[J]. Journal of Remote Sensing, 2004, 8(1):45-50.
[77] MU Lan,WANG Fahui.A Scale-space Clustering Method:Mitigating the Effect of Scale in the Analysis of Zone-based Data[J]. Annals of the Association of American Geographers, 2008, 98(1):85-101.
[78] SHEIKHOLESLAMI G,CHATTERJEE S, ZHANG Aidong. WaveCluster:A Multi-Resolution Clustering Approach for Very Large Spatial Databases[C]//Proceedings of the 24th International Conference on Very Large Data Bases. New York:Morgan Kaufmann Publishers Inc., 1998:428-439.
[79] WANG Wei, YANG Jiong, MUNTZ R R. STING:A Statistical Information Grid Approach to Spatial Data Mining[C]//Proceedings of the the 23rd International Conference on Very Large Data Bases. Athens, Greece:Morgan Kaufmann Publishers Inc., 1997:186-195.
[80] COMANICIU D, MEER P. Mean Shift:A Robust Approach Toward Feature Space Analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(5):603-619.
[81] GOODCHILD M F,QUATTOCHI D A. Scale, Multiscaling, Remote Sensing, and GIS[M]//QUATTROCHI D A, GOODCHILD M F. Scale in Remote Sensing and GIS. London:CRC Press, 1997:1-11.
[82] 李志林. 地理空间数据处理的尺度理论[J]. 地理信息世界, 2005, 3(2):1-5. LI Zhilin. A Theoretical Discussion on the Scale Issue in Geospatial Data Handling[J]. Geomatics World, 2005, 3(2):1-5.
[83] BAATZ M, SCHÄPE A. Multiresolution Segmentation:An Optimization Approach for High Quality Multi-scale Image Segmentation[M]//STROBL J, BLASCHKE T, GRIESEBNER G. Angewandte Geographische Informationsverarbeitung. Karlsruhe:Wichmann-Verlag, 2000:12-23.
[84] DRÃGŢ L, EISANK C, STRASSER T. Local Variance for Multi-scale Analysis in Geomorphometry[J]. Geomorphology, 2011, 130(3-4):162-172.
[85] WONG Y F. Clustering Data by Melting[J]. Neural Computation, 1993, 5(1):89-104.
[86] HUBEL D H. Eye, Brain, and Vision[M]. New York:Scientific American Library Series, 1995.
[87] DICARLO J J, COX D D. Untangling Invariant Object Recognition[J]. Trends in Cognitive Sciences, 2007, 11(8):333-341.
[88] SCHREUDER D A. Vision and Visual Perception:The Conscious Base of Seeing[M]. United States:Archway Publishing, 2014.
[89] KOENDERINK J J. The Structure of Images[J]. Biological Cybernetics, 1984, 50(5):363-370.
[90] TER HAAR ROMENY B M. Front-end Vision and Multi-scale Image Analysis:Multi-scale Computer Vision Theory and Applications, Written in Mathematica[M]. Dordrecht:Kluwer Academic Press, 2003.
[91] LINDEBERG T, TER HAAR ROMENY B M. Linear Scale-space I:Basic Theory[M]//TERHAAR R B M. Geometry-driven Diffusion in Computer Vision. Dordrecht:Springer, 1994.
[92] LI Zhilin, OPENSHAW S. A Natural Principle for the Objective Generalization of Digital Maps[J]. Cartography and Geographic Information Systems, 1993, 20(1):19-29.
[93] 张讲社, 徐宗本. 基于视觉系统的聚类:原理与算法[J]. 工程数学学报, 2000, 17(S1):14-20. ZHANG Jiangshe, XU Zongben. Clustering Method by Visual System Method[J]. Journal of Engineering Mathematics, 2000, 17(S1):14-20.
[94] WITKIN A P. Scale-space Filtering[C]//Proceedings of the Eighth International Joint Conference on Artificial Intelligence. Karlsruhe, West Germany:Morgan Kaufmann Publishers Inc., 1983:1019-1022.
[95] LIU Qiliang, LI Zhilin, DENG Min, et al. Modeling the Effect of Scale on Clustering of Spatial Points[J]. Computers, Environment and Urban Systems, 2015, 52:81-92.
[96] LI Z L. Algorithmic Foundation of Multi-scale Spatial Representation[M]. London:CRC Press, 2007.
[97] BENJAMINI Y, YEKUTIELI D. The Control of the False Discovery Rate in Multiple Testing Under Dependency[J]. The Annals of Statistics, 2001, 29(4):1165-1188.
[98] LIU Qiliang, TANG Jianbo, DENG Min, et al. An Iterative Detection and Removal Method for Detecting Spatial Clusters of Different Densities[J]. Transactions in GIS, 2015, 19(1):82-106.
[99] LIU Qiliang, DENG Min, SHI Yan, et al. A Density-based Spatial Clustering Algorithm Considering both Spatial Proximity and Attribute Similarity[J]. Computers & Geosciences, 2012, 46:296-309.
[100] 唐建波, 刘启亮, 邓敏, 等. 空间层次聚类显著性判别的重排检验方法[J]. 测绘学报, 2016, 45(2):233-240, 249. DOI:10.11947/j.AGCS.2016.20140605. TANG Jianbo, LIU Qiliang, DENG Min, et al. A Permutation Test for Identifying Significant Clusters in Spatial Dataset[J]. Acta Geodaetica et Cartographica Sinica, 2016, 45(2):233-240, 249. DOI:10.11947/j.AGCS.2016.20140605.
Outlines

/