Cartography and Geographic Information

Fast Search the Density Peaks and Clustering Method for Check-in Data

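  • LIU Meng,
  • WU Qunyong,
  • QIU Duansheng,
  • SUN Mei,
  • ZHANG Qiang

  • Spatial Information Research Center of Fujian, Fuzhou University, Key Laboratory of Spatial Data Mining & Information Sharing of MOE, Fuzhou 350002, China

LIU Meng (1990-), male, master's student. His main research interests are spatial data mining and visualization.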

Received: 2016-07-25

Revised: 2017-03-20

Published online: 2017-05-05

Funding

National Natural Science Foundation of China (41471333)


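Abstract

Check-in location data capture changes in the activities of urban residents. Because clients offer a list of candidate locations, different check-in actions that select the same candidate location produce duplicate positions. To address the problems that existing density-based clustering methods have with check-in data, this paper proposes a fast density-peak search and clustering method for check-in location data, based on the clustering by fast search and find of density peaks (CFSFDP) algorithm. First, a position repetition frequency is introduced to express the duplication of check-in positions. Next, the repetition frequency is computed for the original check-in points and the data structure is redesigned, so that density peaks are sought among the new spatial point features. Finally, a clustering algorithm built on the density peak points is constructed, which takes density connectivity into account while clustering the point feature set so that each peak density cluster stays continuous and complete. Experiments show that the proposed method effectively prevents outlier locations with high repetition from being selected as peaks and clustered, and that it has good spatial adaptability. The extracted density peak points not only mark the centers of hot zones but also reflect their concentration trend, which helps in exploring the dynamic changes of hot zones.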
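The first two steps (counting how often each position repeats and rebuilding the points as new features) can be illustrated with a minimal sketch. It assumes that duplicates are exact coordinate matches; the function name `collapse_duplicates` and the `(x, y, frequency)` record layout are hypothetical and are not taken from the paper.

```python
from collections import Counter


def collapse_duplicates(points):
    """Collapse repeated coordinates into (x, y, frequency) records.

    Assumption: two check-ins are duplicates only when their
    coordinates are exactly equal.
    """
    counts = Counter(points)  # keeps first-seen order of distinct positions
    return [(x, y, n) for (x, y), n in counts.items()]


# Four check-ins, three of which share one candidate location.
checkins = [(119.30, 26.08), (119.30, 26.08), (119.31, 26.07), (119.30, 26.08)]
features = collapse_duplicates(checkins)
```

Each distinct position then appears once, carrying its repetition frequency as an attribute that a later density computation can use.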

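Cite this article

LIU Meng, WU Qunyong, QIU Duansheng, SUN Mei, ZHANG Qiang. Fast Search the Density Peaks and Clustering Method for Check-in Data[J]. Acta Geodaetica et Cartographica Sinica, 2017, 46(4): 516-525. DOI: 10.11947/j.AGCS.2017.20160377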

Abstract

Check-in data obtained from Location-based Social Networks (LBSN) are a kind of crowdsourced geographic data that reveal the daily activities of urban residents. Because LBSN systems offer a list of candidate locations, different check-in actions that select the same location produce duplicate positions. Current density-based spatial clustering algorithms have two problems: ① density peak points are difficult to find; ② check-in point objects with duplicate positions cause clustering errors. To solve these problems, we propose a fast density-peak search and clustering method for check-in data, based on clustering by fast search and find of density peaks (CFSFDP). First, a position repetition frequency is introduced and calculated to express the number of duplicated check-in positions. Second, a new type of point feature is constructed by attaching the position repetition frequency to the original check-in position data, and these features are used as the study objects in the search for density peaks. Finally, a clustering algorithm based on the density peak points is constructed, in which density connectivity is taken into account to ensure the continuity and integrity of the density clusters. An experiment was designed and implemented using check-in data obtained from Sina Microblog. The results demonstrate that: ① the clustering method effectively avoids selecting outlier locations with high repetition as peaks and clustering around them, and it also shows excellent spatial adaptability when compared against check-in data from other areas; ② the extracted density peak points not only represent the centers of hot zones but also reflect their concentration trend, which helps in exploring the dynamic changes of hot zones.
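As a rough illustration of the density-peak search, the sketch below computes the two CFSFDP quantities for each point: a local density rho and the distance delta to the nearest point of higher density. The frequency weighting shown here (each neighbour within the cutoff `d_c` contributes its repetition frequency, self included) is an assumption made for illustration and not necessarily the paper's formulation. The function name, the toy data and the cutoff value are also hypothetical, and the density-connectivity step is not covered.

```python
import numpy as np


def density_peaks(xy, freq, d_c):
    """Return (rho, delta) for points xy weighted by repetition frequency."""
    xy = np.asarray(xy, dtype=float)
    freq = np.asarray(freq, dtype=float)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Assumed weighting: sum the frequencies of all points closer than d_c.
    rho = ((dist < d_c) * freq[None, :]).sum(axis=1)
    delta = np.empty(len(xy))
    for i in range(len(xy)):
        higher = rho > rho[i]
        # Nearest denser point; the densest point gets its largest distance.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta


# Two toy groups; the first point repeats three times.
xy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
freq = [3, 1, 1, 1, 1, 1]
rho, delta = density_peaks(xy, freq, d_c=1.2)

# Points with both high rho and high delta are peak candidates;
# the product rho * delta is one simple way to rank them.
candidates = [int(i) for i in np.argsort(-(rho * delta))[:2]]
```

In this toy data the two top-ranked candidates are the densest point of each group, while the remaining points have a small delta because a denser neighbour lies close by.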
