Acta Geodaetica et Cartographica Sinica (测绘学报) ›› 2026, Vol. 55 ›› Issue (2): 328-343. doi: 10.11947/j.AGCS.2026.20250331

• Photogrammetry and Remote Sensing •

Heterogeneous remote sensing image flood change detection based on multi-scale cross-modal feature fusion

PENG Daifeng1,2, LIU Xuelian1, LU Mengfei1, GUAN Haiyan1

  1. School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
    2. Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources, Nanjing 210044, China
  • Received: 2025-09-04  Revised: 2026-01-16  Published: 2026-03-13
  • About author: PENG Daifeng (1988—), male, PhD, associate professor; research interests: intelligent interpretation of remote sensing imagery. E-mail: daifeng@nuist.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (42371449; 41801386)

Heterogeneous remote sensing image flood change detection based on multi-scale cross-modal feature fusion

Daifeng PENG1,2, Xuelian LIU1, Mengfei LU1, Haiyan GUAN1

  1. School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
    2. Technology Innovation Center for Integrated Applications in Remote Sensing and Navigation, Ministry of Natural Resources, Nanjing 210044, China
  • Received: 2025-09-04  Revised: 2026-01-16  Published: 2026-03-13
  • About author: PENG Daifeng (1988—), male, PhD, associate professor; research interests: intelligent interpretation of remote sensing imagery. E-mail: daifeng@nuist.edu.cn
  • Supported by:
The National Natural Science Foundation of China (42371449; 41801386)

Abstract:

To address the limitations of existing end-to-end heterogeneous change detection methods, which fail to properly account for inter-modal feature differences and struggle to balance local detail with global semantic information, this paper proposes a multi-scale cross-modal feature fusion network (MHCDNet) for heterogeneous remote sensing image change detection. First, built on an encoder-decoder architecture, the encoder employs a remote sensing foundation model to construct multi-scale feature representations of multi-modal images; to strengthen the textural and structural information of these multi-scale features, a feature enhancement module is introduced, whose bottleneck-structured multi-scale convolutions effectively enhance the detail information of each modality's features while suppressing noise interference. Second, to account for inter-modal feature differences and fuse shallow heterogeneous features efficiently, a selective cross-modal fusion module is introduced, which learns dynamic weights to fuse multi-modal features adaptively, effectively capturing complementary inter-modal information and improving the robustness and expressiveness of the fused features. Third, to model the spatiotemporal context of deep heterogeneous features, a cross-modal cross-attention fusion module is introduced, which uses spatial and channel attention mechanisms to capture the spatiotemporal correlations between modal features, markedly enhancing the robustness and reliability of the fused features. Finally, an adaptive up-sampling module is proposed to align and fuse encoder-decoder features, compensating for the detail information lost during decoding and accumulating change information; a change head composed of three convolutional layers and up-sampling modules then generates the change map.

To verify the effectiveness of the proposed method, experiments are conducted on two large-scale flood change detection datasets, CAU-Flood and Ombria. The results show that, compared with conventional methods, the proposed method achieves the best accuracy metrics on both datasets while markedly reducing missed and false detections, yielding the best visual results. Ablation studies further verify the effectiveness of each MHCDNet module, and a model complexity analysis shows that MHCDNet has low computational complexity, striking the best balance between accuracy and efficiency.
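As a rough illustration of the bottleneck multi-scale convolution idea behind the feature enhancement module, the toy NumPy sketch below enhances a single-channel feature map with parallel 3×3 and 5×5 branches plus a residual connection. The kernels, the scalar stand-in for a 1×1 bottleneck convolution, and the residual add are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2-D convolution of a single-channel map with zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def feature_enhance(x, bottleneck_scale, k3, k5):
    """FEM-style idea: a bottleneck scaling (stand-in for a 1x1 conv),
    parallel multi-scale branches, and a residual add that keeps detail."""
    b = bottleneck_scale * x
    branches = conv2d(b, k3) + conv2d(b, k5)   # multi-scale context
    return x + branches                        # residual enhancement

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = feature_enhance(x, 0.5, np.full((3, 3), 1 / 9), np.full((5, 5), 1 / 25))
assert y.shape == x.shape
```

With zero-valued branch kernels the sketch reduces to the identity, which is one quick way to check the residual wiring.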

Key words: heterogeneous change detection, feature enhancement, selective cross-modal fusion, cross-modal cross-attention fusion, adaptive up-sampling

Abstract:

To address the limitations in existing end-to-end heterogeneous change detection methods, which often neglect modality-specific feature differences and struggle to balance local details with global semantics, this paper introduces a multi-scale heterogeneous change detection network (MHCDNet) featuring cross-modal fusion for heterogeneous remote sensing imagery, which is built upon an encoder-decoder architecture. In the encoding part, a remote sensing foundation model is utilized to construct multi-scale feature representations for multi-modal images. To enhance the textural and structural information, a feature enhancement module (FEM) is introduced, which employs a bottleneck structure with multi-scale convolution design to effectively enhance detail information in different modal features while suppressing noise interference. Furthermore, to effectively account for the differences in multimodal features and achieve efficient fusion of shallow heterogeneous features, a selective cross-modal fusion module (SCFM) is introduced, which learns dynamic weights to enable adaptive fusion of multi-modal features, effectively capturing complementary information between modalities, thereby enhancing the robustness and representational capacity of fused features. Additionally, to effectively model the spatiotemporal context of deep heterogeneous features, a cross-modal cross-attention fusion module (CCFM) is introduced, which leverages both spatial and channel attention mechanisms to capture inter-modal spatiotemporal correlations, significantly enhancing the robustness and reliability of fused features. Finally, an adaptive up-sampling module (AUM) is proposed to achieve alignment and fusion of encoder-decoder features, effectively compensating for the loss of detail information during the decoding process, accumulating the change information, and generating change maps through a change head composed of three convolutional layers and up-sampling modules. 
To verify the effectiveness of the proposed method, experiments are conducted on two large-scale flood change detection datasets, CAU-Flood and Ombria. The results demonstrate that, compared with other methods, MHCDNet achieves the best accuracy metrics on both datasets while significantly reducing false alarms and missed detections, yielding the best visual results. Ablation studies further verify the effectiveness of each module in MHCDNet, and model complexity analysis shows that MHCDNet has low computational complexity, achieving the best balance between accuracy and efficiency.
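To make the two fusion ideas in the abstract concrete, the NumPy sketch below shows (a) gated fusion with dynamic, softmax-normalized weights in the spirit of the selective cross-modal fusion module, and (b) a single-head spatial cross-attention between modalities in the spirit of the cross-modal cross-attention fusion module. The tensor shapes, the pooling-based gate, and the weight matrix `w` are illustrative assumptions, not the published architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def selective_fusion(fa, fb, w):
    """SCFM-style idea: pool each modality to a channel descriptor,
    score it with a learned matrix, softmax across the two modalities,
    and take the weighted sum of the feature maps."""
    g = np.stack([fa.mean(axis=(1, 2)), fb.mean(axis=(1, 2))])  # (2, C)
    alpha = softmax(g @ w, axis=0)                              # dynamic weights
    return alpha[0][:, None, None] * fa + alpha[1][:, None, None] * fb

def cross_attention(fa, fb):
    """CCFM-style idea (single head): queries from one modality,
    keys/values from the other, attention over spatial positions."""
    c, h, wd = fa.shape
    q = fa.reshape(c, -1).T                        # (HW, C)
    k = fb.reshape(c, -1).T                        # (HW, C)
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (HW, HW), rows sum to 1
    return (attn @ k).T.reshape(c, h, wd)

rng = np.random.default_rng(0)
f_opt = rng.standard_normal((8, 4, 4))   # e.g. optical features, (C, H, W)
f_sar = rng.standard_normal((8, 4, 4))   # e.g. SAR features
w = 0.1 * rng.standard_normal((8, 8))
fused = selective_fusion(f_opt, f_sar, w)
attended = cross_attention(f_opt, f_sar)
assert fused.shape == attended.shape == (8, 4, 4)
```

In the real network these operations would act on learned deep features with multi-head attention; the sketch only demonstrates the gating and attention mechanics.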

Key words: heterogeneous change detection, feature enhancement, selective cross-modal fusion, cross-modal cross-attention fusion, adaptive up-sampling

CLC number: