Acta Geodaetica et Cartographica Sinica, 2023, Vol. 52, Issue (8): 1387-1397. doi: 10.11947/j.AGCS.2023.20210722

• Cartography and Geographic Information •

Improved CasRel model for joint extraction of geographic entity and overlapping space relation

JIANG Meng1,2, YANG Chuncheng1,3,4, SHANG Haibin1, QIN Zhilong1, WANG Zefan3

  1. National Engineering Research Center for Geographic Information System, China University of Geosciences, Wuhan 430074, China;
    2. Shandong Inspur New Infrastructure Technology Co., Ltd., Jinan 250101, China;
    3. Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan 430074, China;
    4. School of Computer Science, China University of Geosciences, Wuhan 430074, China
  • Received: 2021-12-30 Revised: 2023-02-27 Published: 2023-09-07
  • Corresponding author: YANG Chuncheng E-mail: yangcc@cug.edu.cn
  • About the author: JIANG Meng (1996-), male, master's student; research interests: map generalization and spatial data mining. E-mail: jiangmgiser@cug.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (No. 42171438); The Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB2022ZR01); The Fundamental Research Funds for the Central Universities

Abstract: Geospatial text contains rich location information that supports the positioning of geographic entities, and the extraction of geographic entities and spatial relations is the key to obtaining that information. To build a geospatial relation corpus, we take sentences containing spatial relations from the Geography volume of the Encyclopedia of China as the basic unit and annotate the spatial relations in each sentence. Because pipeline relation extraction models ignore the correlation between geographic entities and spatial relations, we improve the CasRel model by integrating the ERNIE (enhanced representation through knowledge integration) pre-trained model and a BAB (BiLSTM + self-attention mechanism + BiLSTM) module, which achieves joint extraction of geographic entities and spatial relations. Cascade tagging is used to extract the overlapping spatial relations that occur in geospatial text. Experiments show that, compared with the CasRel joint extraction model, the F1 value of our model increases by 4.81% on the DuIE dataset and by 1.97% on our geospatial relation corpus, and that the extraction of overlapping spatial relations is effectively improved.

Key words: spatial relation extraction, ERNIE, geographic entities, geospatial relation corpus, overlapping relations
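
The cascade tagging idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows the generic CasRel-style decoding step, in which subjects are decoded first and relation-specific objects are then decoded for each subject, so that one entity can take part in several triples. The binary tag sequences are hand-written stand-ins for model output, and the sentence, the function names and the relation labels `within` and `adjacent_to` are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]            # inclusive (start, end) token indices
Tags = Tuple[List[int], List[int]]  # binary start tags, binary end tags


def one_hot(n: int, positions: List[int]) -> List[int]:
    """Binary tag sequence of length n with 1 at the given positions."""
    return [1 if i in positions else 0 for i in range(n)]


def decode_spans(starts: List[int], ends: List[int]) -> List[Span]:
    """Pair each start tag with the nearest end tag at or after it."""
    spans = []
    for i, flag in enumerate(starts):
        if flag != 1:
            continue
        for j in range(i, len(ends)):
            if ends[j] == 1:
                spans.append((i, j))
                break
    return spans


def decode_triples(
    tokens: List[str],
    subj_tags: Tags,
    obj_tags: Dict[Span, Dict[str, Tags]],
) -> List[Tuple[str, str, str]]:
    """Cascade decoding: subjects first, then objects per subject and relation."""
    def text(span: Span) -> str:
        return " ".join(tokens[span[0]:span[1] + 1])

    triples = []
    for subj in decode_spans(*subj_tags):
        # Each relation has its own object tagger, conditioned on the subject.
        for relation, (starts, ends) in obj_tags.get(subj, {}).items():
            for obj in decode_spans(starts, ends):
                triples.append((text(subj), relation, text(obj)))
    return triples


# Hypothetical example: one subject shared by two spatial relations.
tokens = ["Wuhan", "lies", "in", "Hubei", "on", "the", "Yangtze"]
n = len(tokens)
subj_tags = (one_hot(n, [0]), one_hot(n, [0]))
obj_tags = {
    (0, 0): {
        "within": (one_hot(n, [3]), one_hot(n, [3])),
        "adjacent_to": (one_hot(n, [6]), one_hot(n, [6])),
    }
}
triples = decode_triples(tokens, subj_tags, obj_tags)
print(triples)
```

Because objects are tagged separately for every subject and every relation, the shared subject "Wuhan" yields two triples without any conflict between labels, which is the property that lets this tagging scheme represent overlapping relations.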

CLC number: