Acta Geodaetica et Cartographica Sinica, 2017, Vol. 46, Issue (4): 516-525. doi: 10.11947/j.AGCS.2017.20160377


Fast Search the Density Peaks and Clustering Method for Check-in Data

LIU Meng, WU Qunyong, QIU Duansheng, SUN Mei, ZHANG Qiang   

  1. Spatial Information Research Center of Fujian, Fuzhou University, Key Laboratory of Spatial Data Mining & Information Sharing of MOE, Fuzhou 350002, China
  Received: 2016-07-25; Revised: 2017-03-20; Online: 2017-04-20; Published: 2017-05-05

Abstract: Check-in data obtained from Location-based Social Networks (LBSNs) are a kind of crowdsourced geographic data that reveal the daily activities of urban residents. Because LBSN systems offer a location-candidate function, different check-in behaviors often share the same check-in location, which produces location duplication. Current density-based spatial clustering algorithms have two problems: ① density peak points are difficult to find; ② check-in point objects with duplicate positions cause clustering errors. To solve these problems, we propose a fast density-peak search and clustering method for check-in data based on clustering by fast search and find of density peaks (CFSFDP). First, the position repetition frequency is introduced and calculated to represent the number of check-ins duplicated at each position. Second, a new type of point feature is constructed by attaching the position repetition frequency to the original check-in position data, and this feature is used as the study object when searching for density peaks. Finally, a clustering algorithm based on the density peak points is constructed that takes density connectivity into account to ensure the continuity and integrity of the density clusters. An experiment was designed and implemented using check-in data obtained from Sina Microblog. The results demonstrate that: ① the clustering method effectively prevents outlier locations with high repetition from being chosen as peaks and clustered, and it shows excellent spatial adaptability when applied to check-in data from other areas; ② the extracted density peak points not only represent the centers of hot zones but also reflect their concentration trends, which helps in exploring how hot zones change dynamically.
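The abstract does not give the paper's density formula, its outlier rule, or its density-connectivity step, so the following Python is only a minimal sketch under our own assumptions. Duplicate check-ins are collapsed into (x, y, f) features, where f is the position repetition frequency; local density ρ is taken as the number of check-ins within a cutoff `d_c`; δ is the distance to the nearest denser position, as in standard CFSFDP. The `min_support` filter (a position needs at least that many other distinct positions within `d_c`) is a hypothetical stand-in for the paper's handling of highly repeated outliers, and the function names `repetition_frequency` and `density_peak_clusters` are illustrative, not from the paper. Density-connectivity post-processing is not reproduced.

```python
import math
from collections import Counter


def repetition_frequency(checkins):
    """Collapse duplicate (x, y) check-ins into (x, y, f) features, where f is
    the number of check-ins recorded at exactly that position."""
    return [(x, y, f) for (x, y), f in Counter(checkins).items()]


def density_peak_clusters(features, d_c, n_centers, min_support=1):
    """Hypothetical CFSFDP sketch on (x, y, f) features (assumes n_centers >= 1).
    Returns (labels, centers); label -1 marks positions treated as isolated."""
    n = len(features)
    dist = [[math.dist(features[i][:2], features[j][:2]) for j in range(n)]
            for i in range(n)]
    # Assumed local density: number of check-ins (sum of f) within d_c,
    # including the duplicates stored at the position itself.
    rho = [sum(features[j][2] for j in range(n) if dist[i][j] < d_c)
           for i in range(n)]
    # Assumed filter (not from the paper): require at least `min_support` other
    # distinct positions within d_c, so an isolated but heavily repeated
    # location cannot become a peak on the strength of its own duplicates.
    support = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
               for i in range(n)]
    kept = [i for i in range(n) if support[i] >= min_support]
    labels = [-1] * n
    if not kept:
        return labels, []
    # Visit kept positions from highest to lowest density (ties broken by index).
    order = sorted(kept, key=lambda i: (-rho[i], i))
    delta = {order[0]: max(dist[order[0]][k] for k in kept)}
    nearest_higher = {}
    for rank in range(1, len(order)):
        i = order[rank]
        # delta: distance to the closest position earlier in the density order.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    # Centers: the densest position, then the largest gamma = rho * delta.
    by_gamma = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + by_gamma[:n_centers - 1]
    for c, i in enumerate(centers):
        labels[i] = c
    # Every other kept position inherits the label of its nearest denser
    # position, which was already labeled earlier in the density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers


# Toy data: two small hot zones plus one isolated position repeated 20 times.
checkins = ([(0, 0)] * 3 + [(0, 1), (1, 0), (1, 1), (0.5, 0.5)]
            + [(10, 10)] * 2 + [(10, 11), (11, 10), (11, 11)]
            + [(50, 50)] * 20)
features = repetition_frequency(checkins)
labels, centers = density_peak_clusters(features, d_c=1.5, n_centers=2)
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
print(centers)  # [0, 5]
```

Here the position (50, 50) has the most duplicates of any single location but no neighbors, so the assumed filter leaves it unclustered instead of letting it become a peak. Picking the densest position first and then the largest ρ·δ values follows the usual CFSFDP decision-graph heuristic; in practice the number of centers is often read off the decision graph rather than fixed in advance.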

Key words: check-in data, hot zone, spatial clustering, density peaks clustering, position repetition frequency
