Address information for points of interest (POIs) collected from the internet is often non-standard: address elements are incomplete, and the same address is written in inconsistent ways. To deal with this, a standardization method for web POI address information that takes positional relations into account is proposed. First, based on an extensible address expression model, the POI information is segmented, its address elements are extracted, and these elements are matched layer by layer against an address tree model. Then, four types of positional relations are defined and used to select corresponding sets from a standard POI library as candidates for enriching and correcting the address elements of the non-standard POI. Finally, backtracking from the address element of minimum granularity yields fast standardization of the POI address information. Experiments show that the method achieves relatively high accuracy, is particularly suited to standardizing POI address information in an internet data environment, and can support building an address database.
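The idea of completing an address by backtracking from its finest matched element can be illustrated with a minimal sketch. This is not the authors' implementation: the tree contents, the function names `index_paths` and `standardize`, and the substring matching rule are all assumptions made for illustration, and the sketch assumes every element name is unique in the tree. It does not model the four positional relations or the standard POI library, which the abstract does not specify.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy address tree: each key is an address element,
# each value holds its child elements at the next finer level.
ADDRESS_TREE: Dict[str, dict] = {
    "Example Province": {
        "Example City": {
            "North District": {"Elm Road": {}, "Oak Road": {}},
            "South District": {"Pine Road": {}},
        }
    }
}


def index_paths(tree: Dict[str, dict],
                prefix: Tuple[str, ...] = ()) -> Dict[str, Tuple[str, ...]]:
    """Map every element name to its full root-to-element path."""
    paths: Dict[str, Tuple[str, ...]] = {}
    for name, children in tree.items():
        path = prefix + (name,)
        paths[name] = path
        paths.update(index_paths(children, path))
    return paths


def standardize(raw: str, tree: Dict[str, dict]) -> Optional[List[str]]:
    """Return the full element path implied by the finest element found in raw.

    Elements are detected by plain substring search (an assumption); the
    deepest hit is taken as the minimum-granularity element, and its path
    back to the root supplies any upper-level elements missing from raw.
    """
    paths = index_paths(tree)
    hits = [path for name, path in paths.items() if name in raw]
    if not hits:
        return None  # nothing in the tree matches this address text
    return list(max(hits, key=len))


# An address that omits the province and the district.
result = standardize("Elm Road 12, Example City", ADDRESS_TREE)
print(result)
```

In this sketch the missing province and district are recovered purely from the tree structure. A real system would also have to handle duplicate element names, aliases, and misspellings, which is where candidate selection from a standard POI library would come in.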