Acta Geodaetica et Cartographica Sinica ›› 2017, Vol. 46 ›› Issue (4): 516-525. doi: 10.11947/j.AGCS.2017.20160377

• Cartography and Geographic Information •

Fast Search the Density Peaks and Clustering Method for Check-in Data

LIU Meng, WU Qunyong, QIU Duansheng, SUN Mei, ZHANG Qiang

  1. Spatial Information Research Center of Fujian, Fuzhou University, Key Laboratory of Spatial Data Mining & Information Sharing of MOE, Fuzhou 350002, China
  • Received:2016-07-25 Revised:2017-03-20 Online:2017-04-20 Published:2017-05-05
  • Corresponding author: WU Qunyong E-mail:qywu.wu@fzu.edu.cn
  • About the author: LIU Meng (1990-), male, master's student; main research interests are spatial data mining and visualization.
  • Supported by:
    National Natural Science Foundation of China (41471333)

Abstract: Check-in data obtained from location-based social networks (LBSN) are a kind of crowdsourced geographic data that reveal the daily activities of urban residents. Because LBSN clients offer users a list of candidate locations, different check-in behaviors that select the same candidate location produce duplicated positions. Existing density-based spatial clustering algorithms have two problems on such data: ① density peak points are difficult to find; ② check-in point objects with duplicate positions cause clustering errors. To solve these problems, we propose a fast density peak search and clustering method for check-in data, based on clustering by fast search and find of density peaks (CFSFDP). First, position repetition frequency is introduced to express how often a check-in position is duplicated. Second, the position repetition frequency is computed for the original check-in points and the data structure is redesigned so that each new point feature carries this frequency; these point features are the objects in which density peaks are searched. Finally, a clustering algorithm based on density peak points is constructed, in which density connectivity is taken into account to ensure the continuity and integrity of the density clusters. An experiment was designed and carried out on check-in data obtained from Sina Microblog. The results demonstrate that: ① the method effectively avoids selecting outlier locations with high repetition as peaks and clustering around them, and it shows good spatial adaptability when applied to check-in data from other areas; ② the extracted density peak points not only represent the centers of hot zones but also reflect their concentration trend, which helps to explore the dynamic change of hot zones.

Key words: check-in data, hot zone, spatial clustering, density peaks clustering, position repetition frequency
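
The first two steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy coordinates, and the cutoff distance `d_c` are hypothetical, and the abstract does not state how position repetition frequency enters the density. The sketch assumes that the density of a point sums the repetition frequencies of the other positions within `d_c`, and it uses the usual CFSFDP quantity delta (distance to the nearest point of higher density).

```python
from collections import Counter
from math import hypot


def repetition_frequency(checkins):
    """Collapse raw check-ins into unique positions plus a repetition count."""
    counts = Counter(checkins)
    points = sorted(counts)
    return points, [counts[p] for p in points]


def density_and_delta(points, freq, d_c):
    """Compute rho and delta in the spirit of CFSFDP.

    ASSUMPTION (not from the paper): rho_i sums the repetition frequency of
    the OTHER positions within d_c, so a point's own repetitions do not
    raise its density.
    """
    n = len(points)
    dist = [[hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
    rho = [sum(freq[j] for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no denser neighbour; use its largest distance.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical toy data: a small hot zone around (0.5, 0.5) and one isolated
# position at (10, 10) that was checked in five times.
checkins = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (1.0, 1.0), (0.5, 0.5)] + [(10.0, 10.0)] * 5

points, freq = repetition_frequency(checkins)
rho, delta = density_and_delta(points, freq, d_c=0.8)

# Rank candidate peaks by gamma = rho * delta, a common CFSFDP heuristic.
gamma = [r * d for r, d in zip(rho, delta)]
peak = points[gamma.index(max(gamma))]
print(peak)
```

On this toy input the isolated position has the highest repetition frequency but a density of zero under the stated assumption, so the selected peak is the center of the hot zone rather than the repeated outlier. The density-connectivity step of the clustering stage is not sketched here because the abstract gives no detail on it.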

CLC Number: