Acta Geodaetica et Cartographica Sinica, 2017, Vol. 46, Issue (1): 123-129. doi: 10.11947/j.AGCS.2017.20150470

• Cartography and Geographic Information •

A Geographically Weighted Regression Method Based on Semi-supervised Learning

ZHAO Yangyang1,2, LIU Jiping1,2, XU Shenghua2, ZHANG Fuhao2, YANG Yi2

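  1. School of Mapping and Geographical Science, Liaoning Technical University, Fuxin, Liaoning 123000, China;
    2. Research Center of Government Geographic Information System, Chinese Academy of Surveying and Mapping, Beijing 100830, China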
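  • Received: 2015-09-14; Revised: 2016-07-11; Online: 2017-01-20; Published: 2017-02-06
  • Corresponding author: LIU Jiping, E-mail: liujp@casm.ac.cn
  • About the author: ZHAO Yangyang (1987-), female, PhD candidate, working on government geographic information services and spatial analysis. E-mail: 402862381@qq.com
  • Supported by:
    Special Scientific Research Fund of Public Welfare Profession of Surveying, Mapping and Geoinformation (No. 201512032); National Key Research and Development Program of China (No. 2016YFC0803101)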

A Geographically Weighted Regression Method Based on Semi-supervised Learning

ZHAO Yangyang1,2, LIU Jiping1,2, XU Shenghua2, ZHANG Fuhao2, YANG Yi2   

  1. School of Mapping and Geographical Science, Liaoning Technical University, Fuxin 123000, China;
    2. Chinese Academy of Surveying and Mapping, Beijing 100830, China
  • Received:2015-09-14 Revised:2016-07-11 Online:2017-01-20 Published:2017-02-06
  • Supported by:
    The Special Scientific Research Fund of Public Welfare Profession of China (No. 201512032); The National Key Research and Development Program of China (No. 2016YFC0803101)

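Abstract: Geographically weighted regression (GWR) often achieves low accuracy when only a small sample is available. Semi-supervised learning is a machine learning approach that also uses unlabeled samples in training, and it can effectively improve learning performance when only a few labeled samples exist. Building on this, this paper proposes a geographically weighted regression method based on semi-supervised learning (SSLGWR). Its core idea is to build a regression model from the labeled samples, use it to predict the unlabeled samples, select the high-confidence predictions to expand the labeled set, and repeat the training to improve regression performance. Experiments were carried out on simulated and real data, using the percentage improvement in mean squared error (MSE) as the performance metric, and SSLGWR was compared with GWR and COREG. In the simulated-data experiments, SSLGWR improved performance by 39.66%, 11.92% and 0.94% under three configurations (10%, 30% and 50% labeled data). In the real-data experiments, it improved performance by 8.94%, 3.36% and 5.87% under three configurations. In all cases SSLGWR significantly outperformed GWR and COGWR. The experiments show that semi-supervised learning can exploit unlabeled data to improve GWR models, particularly when labeled samples are scarce.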
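The results are reported as a percentage improvement in MSE, but the formula is not given here. A minimal sketch, assuming the improvement is the relative MSE reduction of a model with respect to the plain GWR baseline, that is (MSE_GWR - MSE_model) / MSE_GWR × 100%. The function names `mse` and `mse_improvement_pct` are illustrative, not from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_improvement_pct(mse_baseline, mse_model):
    """Relative MSE reduction versus a baseline, in percent (ASSUMED definition)."""
    return (mse_baseline - mse_model) / mse_baseline * 100.0


# Hypothetical numbers for illustration only, not results from the paper.
print(mse_improvement_pct(2.0, 1.5))  # 25.0
```

A positive value means the model's MSE is lower than the baseline's; a negative value would mean it performs worse.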

Keywords: geographically weighted regression, semi-supervised learning, SSLGWR, population distribution

Abstract: The performance of geographically weighted regression (GWR) depends on the quantity of labeled data. In applications, however, labeled data are difficult to obtain, whereas unlabeled data are easy to obtain. It is therefore important to find a way to use unlabeled data to improve regression results. Semi-supervised learning is a class of machine learning techniques that use unlabeled data for training, typically combining a small amount of labeled data with a large amount of unlabeled data. This article therefore develops a semi-supervised-learning geographically weighted regression method (SSLGWR). First, a GWR model is built from the labeled data. Second, the model predicts values for the unlabeled data, and the high-confidence predictions are marked as new labeled data. Third, both the original and the new labeled data are used to rebuild the GWR model and improve its precision. The experiments use both simulated and real data to compare GWR, COGWR and SSLGWR, with mean squared error chosen as the evaluation criterion. Experiments on simulated data show that the proposed model improves performance by 39.66%, 11.92% and 0.94% with 10%, 30% and 50% labeled data, respectively. Experiments on real data show improvements of 8.94%, 3.36% and 5.87%. The results demonstrate that SSLGWR substantially improves on GWR.
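The three steps above amount to self-training applied to GWR. What follows is a minimal sketch under stated assumptions, not the authors' implementation: a fixed Gaussian-kernel bandwidth (rather than one selected by cross-validation or AICc), a fixed number of pseudo-labeled points added per round, and a hypothetical confidence score equal to the total kernel weight that labeled samples place on each unlabeled location. The abstract says only that high-confidence predictions are kept, without defining confidence. The names `gwr_predict` and `self_training_gwr` are illustrative.

```python
import numpy as np


def gwr_predict(coords_train, X_train, y_train, coords_pred, X_pred, bandwidth):
    """Fixed-bandwidth GWR with a Gaussian kernel.

    Fits one weighted least-squares regression per prediction location
    and returns the local predictions.
    """
    Xt = np.column_stack([np.ones(len(X_train)), X_train])  # prepend intercept
    Xp = np.column_stack([np.ones(len(X_pred)), X_pred])
    preds = np.empty(len(Xp))
    for i, c in enumerate(coords_pred):
        d = np.linalg.norm(coords_train - c, axis=1)  # distances to training points
        w = np.exp(-0.5 * (d / bandwidth) ** 2)        # Gaussian kernel weights
        sw = np.sqrt(w)
        # Weighted least squares: minimise ||sqrt(W) (y - X beta)||^2
        beta, *_ = np.linalg.lstsq(Xt * sw[:, None], y_train * sw, rcond=None)
        preds[i] = Xp[i] @ beta
    return preds


def self_training_gwr(coords_l, X_l, y_l, coords_u, X_u, bandwidth,
                      per_round=5, rounds=10):
    """Hypothetical self-training loop in the spirit of SSLGWR.

    Each round: fit GWR on the current labeled set, predict the remaining
    unlabeled points, and move the `per_round` most "confident" ones, with
    their predicted values as pseudo-labels, into the labeled set.
    """
    coords_l, X_l, y_l = coords_l.copy(), X_l.copy(), y_l.copy()
    remaining = np.arange(len(X_u))  # indices of still-unlabeled points
    for _ in range(rounds):
        if remaining.size == 0:
            break
        cu, xu = coords_u[remaining], X_u[remaining]
        y_hat = gwr_predict(coords_l, X_l, y_l, cu, xu, bandwidth)
        # ASSUMED confidence score: total kernel weight that labeled points
        # place on each candidate (denser labeled neighbourhood = more trust).
        d = np.linalg.norm(cu[:, None, :] - coords_l[None, :, :], axis=2)
        conf = np.exp(-0.5 * (d / bandwidth) ** 2).sum(axis=1)
        top = np.argsort(-conf)[:per_round]  # positions within `remaining`
        coords_l = np.vstack([coords_l, cu[top]])
        X_l = np.concatenate([X_l, xu[top]])
        y_l = np.concatenate([y_l, y_hat[top]])
        remaining = np.delete(remaining, top)
    return coords_l, X_l, y_l


if __name__ == "__main__":
    # Toy data for illustration only, not the paper's simulated or real data.
    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 10, size=(40, 2))
    x = rng.uniform(0, 1, size=40)
    y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=40)
    cl, xl, yl = self_training_gwr(coords[:10], x[:10], y[:10],
                                   coords[10:], x[10:], bandwidth=2.0,
                                   per_round=5, rounds=3)
    final = gwr_predict(cl, xl, yl, coords[10:], x[10:], bandwidth=2.0)
    print(len(yl), final.shape)
```

In this sketch pseudo-labels are frozen once added. A COREG-style variant would instead score each candidate by how much its pseudo-label reduces the error on neighbouring labeled points, which is costlier but ties confidence directly to prediction error.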

Key words: geographically weighted regression, semi-supervised learning, SSLGWR, population distribution
