Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (6): 1195-1211. doi: 10.11947/j.AGCS.2024.20230415

• Intelligent Surveying and Mapping •

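Change detection in high-resolution optical remote sensing images based on global information enhancement with pyramid semantic tokens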

Daifeng PENG1,2,3,4, Chenchen ZHAI1, Dingwei ZHOU1, Yongjun ZHANG5, Haiyan GUAN1, Yufu ZANG1

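  1. School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
  2. Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources, Nanjing, Jiangsu 210044
  3. Key Laboratory of National Geographic Census and Monitoring, Ministry of Natural Resources, Wuhan, Hubei 430079
  4. Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources, Nanjing, Jiangsu 210013
  5. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei 430079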
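  • Received: 2023-09-28 Published: 2024-07-22
  • About the author: PENG Daifeng (born 1988), male, PhD, associate professor; his research focuses on the intelligent interpretation of remote sensing images. E-mail: daifeng@nuist.edu.cn
  • Supported by:
    National Natural Science Foundation of China (42371449); Open Fund of the Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources (TICIARSN-2023-07); Open Fund of the Key Laboratory of National Geographic Census and Monitoring, Ministry of Natural Resources (2023NGCM02); Open Fund of the Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources (KLSMNR-G202308)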

High-resolution optical images change detection based on global information enhancement by pyramid semantic token

Daifeng PENG1,2,3,4, Chenchen ZHAI1, Dingwei ZHOU1, Yongjun ZHANG5, Haiyan GUAN1, Yufu ZANG1

  1. School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2. Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources, Nanjing 210044, China
  3. Key Laboratory of National Geographic Census and Monitoring, Ministry of Natural Resources, Wuhan 430079, China
  4. Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources, Nanjing 210013, China
  5. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
  • Received: 2023-09-28 Published: 2024-07-22
  • About the author: PENG Daifeng (born 1988), male, PhD, associate professor; his research focuses on the intelligent interpretation of remote sensing images. E-mail: daifeng@nuist.edu.cn
  • Supported by:
    National Natural Science Foundation of China (42371449); Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources (TICIARSN-2023-07); Key Laboratory of National Geographic Census and Monitoring, Ministry of Natural Resources (2023NGCM02); Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources (KLSMNR-G202308)

Abstract:

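To address the missed detection of small ground objects and the incomplete detection of geometric structures in high-resolution remote sensing images caused by complex backgrounds, spectral changes, and other factors, this paper combines the strengths of convolutional networks and Transformer networks and proposes a change detection network based on global information enhancement with pyramid semantic tokens (PST-GIENet). First, a ResNet18 network without the max-pooling layer extracts deep features from the multi-temporal images to build fused features, and a joint attention mechanism and a deep supervision strategy improve the representational capacity of the fused features. Then, spatial pyramid pooling represents the image features as multi-scale semantic tokens, and a Transformer encoder and decoder model the global context of the fused feature space. Finally, a layer-by-layer upsampling decoder generates the final change map. To verify the effectiveness of the proposed method, comparative experiments and analysis were carried out on three public change detection datasets: LEVIR-CD, CDD, and WHU-CD. The quantitative results show that PST-GIENet achieves the best accuracy metrics on all three datasets, with F1 scores of 91.71%, 96.16%, and 94.08%, respectively. The visual results show that PST-GIENet effectively suppresses interference from complex backgrounds, spectral changes, and other factors, markedly strengthens the network's ability to capture the edge structures and multi-scale changes of ground objects, and achieves the best visual quality.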
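The F1 scores quoted above summarize how well a predicted change map agrees with the reference map. As a minimal sketch, assuming the standard definition of F1 for the changed class (the harmonic mean of precision and recall) and binary maps stored as nested lists, the metric could be computed as follows. The function name `f1_score` and the toy maps are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List

BinaryMap = List[List[int]]  # 1 = changed pixel, 0 = unchanged pixel


def f1_score(pred: BinaryMap, truth: BinaryMap) -> float:
    """F1 of the changed class, assuming the standard precision/recall definition."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # change detected correctly
            elif p and not t:
                fp += 1  # false alarm
            elif t and not p:
                fn += 1  # missed change
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): 2 hits, 1 false alarm, 1 miss.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
score = f1_score(pred, truth)
```

In the toy example precision and recall are both 2/3, so the F1 score is also 2/3.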

Key words: high-resolution remote sensing images, change detection, pyramid semantic tokens, global dependency, attention mechanism

Abstract:

Complex backgrounds and spectral changes often cause small objects to be missed and geometric structures and details to be detected incompletely in remote sensing change detection (CD). To address these issues, this paper proposes a pyramid semantic token guided global information enhancement change detection network (PST-GIENet) that combines the advantages of convolutional neural networks (CNNs) and Transformer networks. First, a ResNet18 network without the max-pooling layer generates bi-temporal deep features, which are fused and then refined by a joint attention mechanism and a deep supervision strategy. Second, the image features are represented as multi-scale semantic tokens through spatial pyramid pooling, and a Transformer encoder-decoder then models the global context of the fused features. Finally, the change map is produced by a layer-wise up-sampling decoder. To verify the effectiveness of the proposed method, extensive experiments and analysis were conducted on three publicly available CD datasets: LEVIR-CD, CDD, and WHU-CD. The quantitative results show that PST-GIENet achieves the highest metric scores on all three datasets, with F1 scores of 91.71%, 96.16%, and 94.08%, respectively. The visual results indicate that PST-GIENet effectively suppresses interference from complex backgrounds and spectral changes and significantly enhances the network's ability to capture the edge structures and multi-scale changes of ground objects, achieving the best visual performance.
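The step of turning a feature map into multi-scale semantic tokens through spatial pyramid pooling can be illustrated with a small sketch. It assumes adaptive average pooling over a C x H x W feature map and pyramid levels of 1, 2, and 4 bins per side; the abstract states neither the pooling operator nor the bin sizes, so both are assumptions, and the function names are hypothetical. This is a plain-Python illustration of the general idea, not the PST-GIENet implementation.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def adaptive_avg_pool(fmap: FeatureMap, bins: int) -> List[List[float]]:
    """Average-pool a C x H x W map into bins * bins tokens, each of length C."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    tokens = []
    for i in range(bins):
        # Bin borders: floor for the start, ceiling for the end.
        r0, r1 = (i * height) // bins, ceil_div((i + 1) * height, bins)
        for j in range(bins):
            c0, c1 = (j * width) // bins, ceil_div((j + 1) * width, bins)
            area = (r1 - r0) * (c1 - c0)
            token = [
                sum(fmap[ch][r][c] for r in range(r0, r1) for c in range(c0, c1)) / area
                for ch in range(channels)
            ]
            tokens.append(token)
    return tokens


def pyramid_tokens(fmap: FeatureMap, bin_sizes: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """Concatenate the tokens of every pyramid level, coarse to fine."""
    tokens: List[List[float]] = []
    for bins in bin_sizes:  # bin sizes are an assumed configuration
        tokens.extend(adaptive_avg_pool(fmap, bins))
    return tokens


# Toy 2-channel 4 x 4 feature map (hypothetical data).
fmap = [
    [[1.0] * 4 for _ in range(4)],                           # constant channel
    [[float(r * 4 + c) for c in range(4)] for r in range(4)],  # values 0..15
]
tokens = pyramid_tokens(fmap)
```

With these assumed levels the toy map yields 1 + 4 + 16 = 21 tokens of length 2. A token sequence of this kind is short compared with one token per pixel, which is what makes global attention over it inexpensive.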

Key words: high-resolution remote sensing images, change detection, pyramid semantic tokens, global dependency, attention mechanism

CLC number: