Acta Geodaetica et Cartographica Sinica

• Academic Papers

Annotation of Geographical Named Entities in Chinese Text

  

  • Received: 2011-04-13 Revised: 2011-06-03 Online: 2012-02-25 Published: 2012-02-25

Abstract: Semantic interpretation of geographic information in natural language text can deepen our understanding of the mechanisms of geospatial cognition and spatial language, and can enhance the intelligence of spatial query in geographic information systems (GIS), spatial reasoning, geographical information retrieval, and related tasks. Corpus annotation is the task of analyzing the specific linguistic information and linguistic structure of domain information found in a text and establishing metadata that describes them. First, this paper analyzes the differences between how geographical entities are represented in Chinese text and in GIS. Second, based on a description of the linguistic characteristics of geographical named entities in Chinese text, an annotation scheme is presented and its annotation specification is given in detail. Finally, GATE (General Architecture for Text Engineering) is introduced as an annotation platform, and a large-scale annotated corpus (GeoCorpus) based on the "Encyclopedia of China Geography" (2,130,000 bytes of Chinese text) is established and evaluated. This study effectively addresses the current lack of related standards and standardized data. Future work will focus on: 1) establishing a general annotated corpus based on web pages to resolve the imbalance of GeoCorpus; 2) developing a visual annotation tool that integrates a GIS database with GATE to further improve annotation performance; 3) annotating spatial relations in Chinese text based on the theory of spatial semantic rules.