Acta Geodaetica et Cartographica Sinica, 2022, Vol. 51, Issue (11): 2355-2364. doi: 10.11947/j.AGCS.2022.20200522

• Photogrammetry and Remote Sensing •

Semantic segmentation of aerial imagery with a semi-supervised network incorporating multi-scale shared encoding

李佳田1, 杨汝春1, 姚彦吉1, 贺日兴2,3, 阿晓荟1, 吕少云1   

  1. 昆明理工大学国土资源工程学院, 云南 昆明 650093;
    2. 首都师范大学资源环境与旅游学院, 北京 100048;
    3. 首都师范大学三维数据获取与应用教育部重点实验室, 北京 100048
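  • Received: 2020-10-23 Revised: 2022-05-05 Published: 2022-11-30
  • Corresponding author: YANG Ruchun E-mail: 1572112413@qq.com
  • About the author: LI Jiatian (born 1975), male, professor and doctoral supervisor. His research interests are numerical optimization methods and machine scene understanding. E-mail: ljtwcx@163.com
  • Supported by:
    National Natural Science Foundation of China (No. 41561082)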

Semantic segmentation of aerial image based on semi-supervised network with multi-scale shared coding

LI Jiatian1, YANG Ruchun1, YAO Yanji1, HE Rixing2,3, A Xiaohui1, LÜ Shaoyun1   

  1. Faculty of Land and Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China;
    2. College of Resources Environment and Tourism, Capital Normal University, Beijing 100048, China;
    3. Key Laboratory of 3D Information Acquisition and Application, MOE, Capital Normal University, Beijing 100048, China
  • Received:2020-10-23 Revised:2022-05-05 Published:2022-11-30
  • Supported by:
    The National Natural Science Foundation of China (No. 41561082)

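Abstract (translated from the Chinese): Semi-supervised semantic segmentation mainly relies on an encoder with master-auxiliary decoders so that unlabeled samples take part in the computation and improve segmentation accuracy. However, the encoder's successive downsampling tends to discard shallow detail features, which leaves ground-object boundaries incompletely segmented. To address this, this paper proposes a semi-supervised network architecture with multi-scale shared encoding for semantic segmentation of aerial imagery. The encoder uses ResNet-50 to extract shallow image features, and a multi-scale shared encoding module embedded at the end of ResNet-50 links those shallow features to build a dense feature pyramid and enlarge the receptive field, capturing multi-scale detail of the target ground objects. The proposed network is compared with the supervised networks UNet, DeepLabv3+, and FCN and with the semi-supervised networks CCT, XModalNet, and VLCNet on the LandCover.ai and DroneDeploy datasets, with accuracy evaluated on each. The results show that the proposed network has a clear advantage in both the number of labels and accuracy: on LandCover.ai, with 6000 labeled and 6500 unlabeled samples, overall mIoU improves by 1.15%; on DroneDeploy, with 30 labeled and 5 unlabeled samples, overall mIoU improves by 0.94%. Segmentation accuracy for ground objects also improves markedly, yielding clearer and more complete object boundaries.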
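To make the role of the unlabeled samples concrete, the following is a minimal sketch of a consistency term between a master decoder and several auxiliary decoders, in the style of CCT (one of the baselines named in the abstract). The abstract does not state the loss this paper actually uses, so the mean-squared-error form, the function names `softmax` and `consistency_loss`, and the nested-list data layout are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert one pixel's class logits into probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(main_logits, aux_logits_list):
    """Mean squared error between master and auxiliary predictions.

    main_logits:     list of pixels, each a list of class logits
                     (output of the master decoder).
    aux_logits_list: one entry per auxiliary decoder, each with the
                     same layout as main_logits.

    Assumption: the master prediction is treated as a fixed target,
    so no ground-truth label is needed for these pixels.
    """
    targets = [softmax(pixel) for pixel in main_logits]
    squared_error = 0.0
    count = 0
    for aux_logits in aux_logits_list:
        for target, pixel in zip(targets, aux_logits):
            probs = softmax(pixel)
            for t, q in zip(target, probs):
                squared_error += (t - q) ** 2
                count += 1
    return squared_error / count


if __name__ == "__main__":
    main = [[2.0, 0.5], [0.1, 1.2]]            # 2 pixels, 2 classes
    aux = [[[1.8, 0.7], [0.0, 1.0]],           # auxiliary decoder 1
           [[2.2, 0.4], [0.3, 1.1]]]           # auxiliary decoder 2
    print(consistency_loss(main, aux))
```

Because the term compares predictions with each other rather than with labels, it can be evaluated on unlabeled images, which is how such samples can contribute to training in this family of methods.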

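Keywords (translated from the Chinese): semi-supervised, semantic segmentation, multi-scale shared encoder, master-auxiliary decoder, aerial imagery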

Abstract: In semi-supervised semantic segmentation, accuracy is mainly improved by an encoder with master-auxiliary decoders, a structure that brings unlabeled samples into the computation. However, continuous downsampling during encoding loses shallow detail features, which leaves the boundaries of ground objects incomplete. Therefore, a semi-supervised network combining multi-scale shared encoding is proposed for semantic segmentation of aerial images. The encoder uses ResNet-50 to obtain the shallow features of the image and links them by embedding a multi-scale shared coding module at the end of ResNet-50, which builds a dense feature pyramid and expands the receptive field, thereby obtaining multi-scale detail of the target ground objects. The proposed method is evaluated by comparison with the supervised networks UNet, DeepLabv3+, and FCN and the semi-supervised networks CCT, XModalNet, and VLCNet on the LandCover.ai and DroneDeploy datasets. The results show that the proposed network has clear advantages in both the number of labels and accuracy. On the LandCover.ai dataset, with 6000 labeled samples and 6500 unlabeled samples, the overall mIoU increased by 1.15%. On the DroneDeploy dataset, with 30 labeled samples and 5 unlabeled samples, the overall mIoU increased by 0.94%. The network also significantly improves the segmentation accuracy of ground objects, producing clearer and more complete object boundaries.
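The reported gains are stated in mIoU (mean intersection over union). As a reminder of how that metric is usually computed, here is a minimal sketch over flattened label maps. The function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both maps are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection over union across classes.

    pred, truth: equal-length sequences of integer class ids,
                 one per pixel (flattened label maps).
    Classes that appear in neither map are skipped (an assumption;
    some evaluation protocols count them differently).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, truth):
        if p == t:
            # Pixel counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    ious = [intersection[c] / union[c]
            for c in range(num_classes) if union[c] > 0]
    if not ious:
        raise ValueError("no class present in pred or truth")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    truth = [0, 1, 1, 1]
    # Class 0: 1 / 2, class 1: 2 / 3, so the mean is 7 / 12.
    print(mean_iou(pred, truth, num_classes=2))
```

Averaging per-class IoU gives small classes the same weight as large ones, which is why the metric is sensitive to thin or fragmented ground objects and their boundaries.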

Key words: semi-supervised, semantic segmentation, multi-scale shared encoder, master-auxiliary decoder, aerial images

CLC number: