Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (6): 1195-1211. doi: 10.11947/j.AGCS.2024.20230415

• Smart Surveying and Mapping •

High-resolution optical images change detection based on global information enhancement by pyramid semantic token

Daifeng PENG1,2,3,4, Chenchen ZHAI1, Dingwei ZHOU1, Yongjun ZHANG5, Haiyan GUAN1, Yufu ZANG1

  1. School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2. Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources, Nanjing 210044, China
  3. Key Laboratory of National Geographic Census and Monitoring, Ministry of Natural Resources, Wuhan 430079, China
  4. Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources, Nanjing 210013, China
  5. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
  • Received: 2023-09-28 Published: 2024-07-22
  • About the author: PENG Daifeng (born 1988), male, PhD, associate professor. His research focuses on intelligent interpretation of remote sensing images. E-mail: daifeng@nuist.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (42371449); Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources (TICIARSN-2023-07); Key Laboratory of National Geographic Census and Monitoring, Ministry of Natural Resources (2023NGCM02); Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources (KLSMNR-G202308)

Abstract:

Owing to complex backgrounds and spectral variation, remote sensing change detection (CD) is prone to missed detections of small objects and incomplete detection of geometric structures and details. To address these issues, this paper proposes a pyramid semantic token guided global information enhancement change detection network (PST-GIENet) that combines the advantages of convolutional neural networks (CNNs) and Transformers. First, a ResNet18 network without the max-pooling layer is adopted to generate bi-temporal deep features, which are fused and refined by a joint attention mechanism and a deep supervision strategy. Second, image features are represented as multi-scale semantic tokens through spatial pyramid pooling, and a Transformer encoder-decoder is then employed to model the global context of the fused features. Finally, the change map is produced by a layer-wise up-sampling decoder. To verify the effectiveness of the proposed method, extensive experiments and analyses were conducted on three publicly available CD datasets: LEVIR-CD, CDD, and WHU-CD. The quantitative results show that PST-GIENet achieved the highest metric scores on all three datasets, with F1 scores of 91.71%, 96.16%, and 94.08%, respectively. In addition, the visual results indicate that PST-GIENet effectively suppresses interference from complex backgrounds and spectral distortions, which significantly enhances the network's ability to capture edge structures and multi-scale changes of ground objects, achieving the best visual performance.
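
The step of turning image features into multi-scale semantic tokens through spatial pyramid pooling can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `adaptive_avg_pool` and `pyramid_tokens`, the bin sizes (1, 2, 4), and the 64-channel 32×32 feature map are assumptions chosen only for illustration, since the abstract does not specify them.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map onto a bins x bins grid."""
    c, h, w = feat.shape
    out = np.empty((c, bins, bins), dtype=feat.dtype)
    for i in range(bins):
        # Bin edges: floor for the start index, ceiling for the end index.
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def pyramid_tokens(feat, bin_sizes=(1, 2, 4)):
    """Flatten pooled grids from several scales into one (N, C) token sequence.

    bin_sizes is a hypothetical choice, not a value taken from the paper.
    """
    channels = feat.shape[0]
    tokens = [adaptive_avg_pool(feat, b).reshape(channels, -1).T for b in bin_sizes]
    return np.concatenate(tokens, axis=0)

# Stand-in for a fused bi-temporal feature map (assumed shape: 64 x 32 x 32).
feat = np.random.default_rng(0).standard_normal((64, 32, 32))
tokens = pyramid_tokens(feat)
print(tokens.shape)  # (21, 64): 1 + 4 + 16 tokens, each of dimension 64
```

Under these assumptions, a 32×32 map with 1024 spatial positions is summarized by 21 tokens, so a Transformer operating on the tokens attends over a much shorter sequence than one operating on every pixel, while coarse and fine bins still cover different object scales.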

Key words: high-resolution remote sensing images, change detection, pyramid semantic tokens, global dependency, attention mechanism
