Cartography and Geographic Information

A Bootstrapping Based Approach for Open Geo-entity Relation Extraction

  • YU Li,
  • LU Feng,
  • LIU Xiliang
  • 1. State Key Lab of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China;
    2. University of Chinese Academy of Sciences, Beijing 100101, China;
    3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
YU Li (1986-), female, PhD candidate, research interest: Internet spatial information search. E-mail: yul@lreis.ac.cn

Received date: 2015-04-07

  Revised date: 2016-02-02

  Online published: 2016-05-30

Supported by

The National Natural Science Foundation of China (No.41271408); The National High-Tech Research and Development Program of China (863 Program) (No.2013AA120305)

Cite this article

YU Li, LU Feng, LIU Xiliang. A Bootstrapping Based Approach for Open Geo-entity Relation Extraction[J]. Acta Geodaetica et Cartographica Sinica, 2016, 45(5): 616-622. DOI: 10.11947/j.AGCS.2016.20150181

Abstract

Extracting spatial and semantic relations between geo-entities from Web text requires methods that are both timely and robust. This paper proposes an automatic approach to open geo-entity relation extraction. First, bootstrapping is used to collect statistics on the part-of-speech, position and distance features of terms. Second, these statistics are used to compute a weight for each term in the context, and the keyword that describes the geo-entity relation is selected accordingly. Third, the geo-entity pairs and their keywords are organized into structured instances. Finally, an experiment is conducted with Baidu Baike and Stanford CoreNLP. The results show that the method automatically mines part of the lexical features of natural language, requires neither domain expert knowledge nor a large annotated corpus, and suits extraction tasks in which the relation types are not known in advance. Compared with three classical frequency-based methods (Frequency, TF-IDF and PPMI), precision and recall improve by about 5% and 23%, respectively.
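The pipeline outlined in the abstract can be pictured with a small sketch. The abstract does not state the weighting formula, so everything concrete below is an assumption made for illustration and not the authors' method: the toy English corpus, the Penn Chinese Treebank style tags, the seed terms, the multiplicative smoothed score, and the helper names `features`, `candidates`, `term_weights`, `bootstrap` and `extract`.

```python
from collections import Counter

# Toy contexts (hypothetical): POS-tagged tokens plus the indices of the two geo-entities.
CONTEXTS = [
    ([("Beijing", "NR"), ("borders", "VV"), ("Hebei", "NR")], 0, 2),
    ([("Nanjing", "NR"), ("is", "VC"), ("capital", "NN"), ("of", "P"), ("Jiangsu", "NR")], 0, 4),
    ([("Suzhou", "NR"), ("lies", "VV"), ("in", "P"), ("Jiangsu", "NR")], 0, 3),
]


def features(i, e1, e2, tag):
    """Part-of-speech, position and distance features of the token at index i."""
    position = "between" if e1 < i < e2 else "outside"
    distance = min(abs(i - e1), abs(i - e2))
    return {"pos": tag, "position": position, "distance": distance}


def candidates(contexts):
    """Yield (term, features) for every non-entity token in every context."""
    for tokens, e1, e2 in contexts:
        for i, (word, tag) in enumerate(tokens):
            if i not in (e1, e2):
                yield word, features(i, e1, e2, tag)


def term_weights(contexts, seeds):
    """Weight each term by how closely its features match those of the seed terms."""
    stats = {"pos": Counter(), "position": Counter(), "distance": Counter()}
    for word, feats in candidates(contexts):
        if word in seeds:
            for name, value in feats.items():
                stats[name][value] += 1
    weights = {}
    for word, feats in candidates(contexts):
        score = 1.0
        for name, value in feats.items():
            total = sum(stats[name].values())
            # Smoothed match ratio: unseen feature values keep a small nonzero weight.
            score *= (stats[name][value] + 1) / (total + 1)
        # A term seen in several places keeps its best score.
        weights[word] = max(score, weights.get(word, 0.0))
    return weights


def bootstrap(contexts, seeds, rounds=1):
    """Each round promotes the best-scoring non-seed term to the seed set."""
    seeds = set(seeds)
    for _ in range(rounds):
        weights = term_weights(contexts, seeds)
        new_terms = {w: s for w, s in weights.items() if w not in seeds}
        if not new_terms:
            break
        seeds.add(max(new_terms, key=new_terms.get))
    return seeds, term_weights(contexts, seeds)


def extract(contexts, weights):
    """Pick the highest-weight term in each context and emit (entity, keyword, entity)."""
    triples = []
    for tokens, e1, e2 in contexts:
        words = [w for i, (w, _) in enumerate(tokens) if i not in (e1, e2)]
        keyword = max(words, key=lambda w: weights[w])
        triples.append((tokens[e1][0], keyword, tokens[e2][0]))
    return triples


if __name__ == "__main__":
    seeds, weights = bootstrap(CONTEXTS, {"borders", "capital"})
    print(sorted(seeds))
    for triple in extract(CONTEXTS, weights):
        print(triple)
```

Starting from the seeds `borders` and `capital`, one round promotes `lies`, because it shares the verb tag, the between-entities position and the distance of the seed occurrences. On a corpus this small, further rounds can promote function words such as `is`, so the sketch defaults to a single round.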
