Acta Geodaetica et Cartographica Sinica, 2026, Vol. 55, Issue (5): 927-940. doi: 10.11947/j.AGCS.2026.20250521

• Photogrammetry and Remote Sensing •

Heterogeneous remote sensing change detection based on vision-language collaborative representation for flood disasters

Rui YU1, Jie LI1,2, Huihui LIU1, Meiru WU1, Liupeng LIN3, Qiangqiang YUAN1, Li ZHENG1

  1. School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
  2. Hubei Luojia Laboratory, Wuhan 430079, China
  3. School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China
  • Received: 2025-12-15; Revised: 2026-04-21; Online: 2026-06-23; Published: 2026-06-23
  • Contact: Huihui LIU; E-mail: rui.yu@whu.edu.cn; hhliu@sgg.whu.edu.cn
  • About the author: YU Rui (born 2002), male, postgraduate student whose research focuses on remote sensing change detection and deep learning-based multi-task learning. E-mail: rui.yu@whu.edu.cn
  • Supported by: The National Natural Science Foundation of China (42471504; 42301417)

Abstract:

Heterogeneous change detection using optical and SAR imagery is important for disaster emergency response and all-weather monitoring. However, the two modalities have fundamentally different imaging mechanisms, which produce inconsistent feature distributions; combined with the scarcity of annotated samples and textual descriptions, this limits the performance of both traditional methods and existing deep learning models. To address these challenges in flood disaster scenarios, this paper proposes a multi-dimensional change enhancement CLIP change detection network (MCE-CLIP). The network builds a cross-modal semantic guidance mechanism that couples SAR image transfer with text generation, narrowing the semantic gap between the heterogeneous images. In addition, a pseudo-Siamese visual feature extraction branch and a multi-dimensional change feature enhancement module (MCFEM) are designed. Modality adapters are embedded to reduce the domain distribution discrepancy between the heterogeneous remote sensing images. The MCFEM combines temporal cross-attention, multi-granularity differencing, and hybrid similarity projection to integrate spatiotemporal contextual information efficiently. Experimental results on two representative heterogeneous datasets show that MCE-CLIP outperforms existing mainstream heterogeneous change detection methods on core evaluation metrics, including the F1 score and intersection over union (IoU).
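The abstract reports F1 and IoU as its core metrics. In binary change detection these are commonly computed for the "changed" class from pixel-level true positive (TP), false positive (FP), and false negative (FN) counts: F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The sketch below assumes that convention; the paper's exact averaging scheme is not stated here, and `change_metrics` is a hypothetical helper name, not code from the authors.

```python
import numpy as np


def change_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Precision, recall, F1 and IoU for the 'changed' class of a binary change map.

    Assumption: pred and gt are same-shaped arrays in which nonzero means 'changed'.
    Each metric returns 0.0 when its denominator is zero.
    """
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")

    tp = int(np.sum(pred & gt))    # changed pixels correctly detected
    fp = int(np.sum(pred & ~gt))   # false alarms
    fn = int(np.sum(~pred & gt))   # missed changes

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    # IoU of the changed class = TP / (TP + FP + FN)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    # Toy 3x3 example: 2 correct detections, 1 false alarm, 1 missed change
    gt = np.array([[0, 1, 1],
                   [0, 1, 0],
                   [0, 0, 0]])
    pred = np.array([[0, 1, 0],
                     [1, 1, 0],
                     [0, 0, 0]])
    print(change_metrics(pred, gt))  # f1 = 2/3, iou = 0.5
```

When both metrics are computed from the same pooled counts they are tied by F1 = 2·IoU / (1 + IoU), so they rank methods identically; they can diverge only when averaged per image.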

Key words: change detection, heterogeneous remote sensing imagery, vision-language model, multi-modal fusion, SAR
