Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (3): 512-525. doi: 10.11947/j.AGCS.2024.20220553

• Photogrammetry and Remote Sensing •

A self-supervised pre-training scheme for multi-source heterogeneous remote sensing image land cover classification

薛志祥1,2, 余旭初3, 刘景正4, 杨国鹏2, 刘冰1, 余岸竹1, 周嘉男5, 金上鸿5   

  1. Information Engineering University, Zhengzhou 450001, China;
    2. Beijing Aviation Meteorological Institute, Beijing 100085, China;
    3. North China University of Water Resources and Electric Power, Zhengzhou 450046, China;
    4. Troops 93110, Beijing 100843, China;
    5. Troops 93116, Shenyang 110000, China
  • Received: 2022-10-18  Revised: 2024-01-10  Published: 2024-04-08
  • Corresponding author: YU Xuchu, E-mail: xuchu_yu@sina.com
  • About the first author: XUE Zhixiang (1992-), male, PhD, research interests: intelligent interpretation of remote sensing imagery. E-mail: xuegeeker@163.com
  • Supported by:
    The Natural Science Foundation of Henan Province (No. 222300420387)

A self-supervised pre-training scheme for multi-source heterogeneous remote sensing image land cover classification

XUE Zhixiang1,2, YU Xuchu3, LIU Jingzheng4, YANG Guopeng2, LIU Bing1, YU Anzhu1, ZHOU Jianan5, JIN Shanghong5   

  1. Information Engineering University, Zhengzhou 450001, China;
    2. Beijing Aviation Meteorological Institute, Beijing 100085, China;
    3. North China University of Water Resources and Electric Power, Zhengzhou 450046, China;
    4. Troops 93110, Beijing 100843, China;
    5. Troops 93116, Shenyang 110000, China
  • Received: 2022-10-18  Revised: 2024-01-10  Published: 2024-04-08
  • Supported by:
    The Natural Science Foundation of Henan Province (No. 222300420387)

Abstract: In recent years, deep learning has transformed remote sensing image processing. Because annotating high-quality samples is time-consuming and labor-intensive, the practical shortage of labeled samples severely limits the performance of deep neural network models. To resolve this contradiction, this paper proposes a self-supervised pre-training and fine-tuning classification scheme for multi-source heterogeneous remote sensing image land cover classification, aiming to relieve the model's heavy dependence on labeled samples. Specifically, the generative self-supervised learning model consists of an asymmetric encoder-decoder structure, in which a deep encoder learns high-level key features from multi-source remote sensing data and task-specific decoders reconstruct the original remote sensing images. To improve the feature representation capability, a cross-attention mechanism is used to fuse information from the heterogeneous feature streams, so that more complementary information is learned from multi-source heterogeneous remote sensing images. In the fine-tuning stage, the pre-trained encoder serves as an unsupervised feature extractor, and a lightweight Transformer-based classifier combines the learned features with spectral information for land cover classification. This self-supervised pre-training scheme can learn high-level key features that characterize the original data from multi-source heterogeneous remote sensing images without any manual annotation, thereby relieving the dependence on labeled samples. Compared with existing classification paradigms, the proposed self-supervised pre-training and fine-tuning scheme achieves superior classification results for multi-source remote sensing image land cover classification.
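
The following is a minimal PyTorch sketch of the pre-training idea described in the abstract: two modality-specific deep encoders, cross-attention layers that exchange information between the heterogeneous feature streams, and lightweight task-specific decoders that reconstruct each original input. The two example modalities (a hyperspectral patch and a SAR-like patch), all layer sizes, and the plain MSE reconstruction loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalEncoder(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Deep modality-specific encoders (kept deliberately simple here).
        self.enc_a = nn.Sequential(nn.Linear(dim_a, d_model), nn.GELU(), nn.Linear(d_model, d_model))
        self.enc_b = nn.Sequential(nn.Linear(dim_b, d_model), nn.GELU(), nn.Linear(d_model, d_model))
        # Cross-attention: each stream queries the other to pick up complementary information.
        self.attn_ab = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x_a, x_b):
        # x_a: (B, N, dim_a) tokens from modality A; x_b: (B, N, dim_b) tokens from modality B
        h_a, h_b = self.enc_a(x_a), self.enc_b(x_b)
        f_a, _ = self.attn_ab(h_a, h_b, h_b)   # A attends to B
        f_b, _ = self.attn_ba(h_b, h_a, h_a)   # B attends to A
        return h_a + f_a, h_b + f_b            # fused high-level features

class PretrainModel(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, d_model: int = 128):
        super().__init__()
        self.encoder = CrossModalEncoder(dim_a, dim_b, d_model)
        # Asymmetric design: task-specific decoders are much shallower than the encoder.
        self.dec_a = nn.Linear(d_model, dim_a)
        self.dec_b = nn.Linear(d_model, dim_b)

    def forward(self, x_a, x_b):
        f_a, f_b = self.encoder(x_a, x_b)
        return self.dec_a(f_a), self.dec_b(f_b)

# Self-supervised objective: reconstruct both inputs, no labels involved.
model = PretrainModel(dim_a=144, dim_b=4)                  # e.g. 144 spectral bands, 4 SAR channels
x_a, x_b = torch.randn(8, 49, 144), torch.randn(8, 49, 4)  # 8 samples of 7x7 patches as token sequences
rec_a, rec_b = model(x_a, x_b)
loss = nn.functional.mse_loss(rec_a, x_a) + nn.functional.mse_loss(rec_b, x_b)
loss.backward()
```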

Key words: remote sensing, multi-source heterogeneous data, pre-training, self-supervised learning, land cover classification

Abstract: Deep learning has revolutionized remote sensing image processing techniques over the past few years. Nevertheless, annotating high-quality samples is laborious, and the resulting lack of supervision information limits the performance of deep networks. To resolve this contradiction, we investigate a self-supervised pre-training and fine-tuning paradigm for multi-source heterogeneous remote sensing image land cover classification, aiming to relieve the urgent need for manually annotated data. Specifically, the proposed generative feature learning model consists of an asymmetric encoder-decoder structure, in which a deep encoder extracts the high-level key characteristics contained in multi-source data, and task-specific lightweight decoders are developed to reconstruct the original data. To further improve the feature representation capability, cross-attention layers are utilized to exchange information between the heterogeneous feature streams, thus learning more complementary information from multi-source remote sensing data. In the fine-tuning stage, the trained encoder is employed as an unsupervised feature extractor, and the learned features are utilized for land cover classification through the designed lightweight Transformer-based classifier. This self-supervised pre-training architecture is capable of learning high-level key features from multi-source heterogeneous remote sensing images without any labeled information, thus relieving the urgent need for labeled samples. Compared with existing classification paradigms, the proposed multimodal self-supervised pre-training and fine-tuning scheme achieves superior performance for remote sensing image classification.
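
As a companion to the previous sketch, the snippet below illustrates the fine-tuning stage described in the abstract: the pre-trained encoder is frozen and used as an unsupervised feature extractor, and a lightweight Transformer-based classifier combines the extracted features with the raw spectral vector of the centre pixel to predict land cover classes. The class count, token layout, and single Transformer layer are assumptions for illustration; `PretrainModel` refers to the earlier sketch.

```python
import torch
import torch.nn as nn

class LightweightClassifier(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module, d_model: int, spectral_dim: int, n_classes: int):
        super().__init__()
        self.encoder = pretrained_encoder
        for p in self.encoder.parameters():      # freeze: the encoder stays an unsupervised extractor
            p.requires_grad = False
        self.proj = nn.Linear(d_model * 2 + spectral_dim, d_model)
        self.transformer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x_a, x_b):
        with torch.no_grad():
            f_a, f_b = self.encoder(x_a, x_b)                    # (B, N, d_model) each
        centre = x_a[:, x_a.shape[1] // 2, :]                    # raw spectrum of the centre pixel
        tokens = torch.cat([f_a, f_b], dim=-1)                   # (B, N, 2*d_model)
        tokens = torch.cat([tokens, centre.unsqueeze(1).expand(-1, tokens.shape[1], -1)], dim=-1)
        tokens = self.proj(tokens)
        tokens = self.transformer(tokens)
        return self.head(tokens.mean(dim=1))                     # class logits per sample

# Usage with the (assumed) pre-trained model from the previous sketch:
pretrained = PretrainModel(dim_a=144, dim_b=4)
clf = LightweightClassifier(pretrained.encoder, d_model=128, spectral_dim=144, n_classes=9)
logits = clf(torch.randn(8, 49, 144), torch.randn(8, 49, 4))     # (8, 9)
```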

Key words: remote sensing, multi-source heterogeneous data, pre-training, self-supervised learning, land cover classification

CLC number: