Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (3): 512-525. doi: 10.11947/j.AGCS.2024.20220553

• Photogrammetry and Remote Sensing •

A self-supervised pre-training scheme for multi-source heterogeneous remote sensing image land cover classification

XUE Zhixiang1,2, YU Xuchu3, LIU Jingzheng4, YANG Guopeng2, LIU Bing1, YU Anzhu1, ZHOU Jianan5, JIN Shanghong5   

    1. Information Engineering University, Zhengzhou 450001, China;
    2. Beijing Aviation Meteorological Institute, Beijing 100085, China;
    3. North China University of Water Resources and Electric Power, Zhengzhou 450046, China;
    4. Troops 93110, Beijing 100843, China;
    5. Troops 93116, Shenyang 110000, China
  • Received: 2022-10-18 Revised: 2024-01-10 Published: 2024-04-08
  • Supported by:
    The Natural Science Foundation of Henan Province (No. 222300420387)

Abstract: Deep learning has revolutionized remote sensing image processing over the past few years. However, annotating high-quality samples is laborious, and the resulting shortage of supervision limits the performance of deep networks. To address this problem, we investigate a self-supervised pre-training and fine-tuning paradigm for land cover classification of multi-source heterogeneous remote sensing images, aiming to reduce the dependence on manually annotated data. Specifically, the proposed generative feature learning model has an asymmetric encoder-decoder structure: a deep encoder extracts high-level key features from the multi-source data, and task-specific lightweight decoders reconstruct the original data. To further improve feature representation, cross-attention layers exchange information between the heterogeneous features, so that the model learns more complementary information from the multi-source remote sensing data. In the fine-tuning stage, the pre-trained encoder serves as an unsupervised feature extractor, and the learned features are used for land cover classification by a purpose-designed lightweight Transformer-based classifier. This self-supervised pre-training architecture learns high-level key features from multi-source heterogeneous remote sensing images without any labeled information, which relieves the need for labeled samples. Compared with existing classification paradigms, the proposed multimodal self-supervised pre-training and fine-tuning scheme achieves superior performance for remote sensing image classification.
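
The cross-attention exchange between heterogeneous features mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single attention head, random untrained projection weights, a plain residual connection, and no layer normalization. The names `cross_attention`, `tokens_a`, and `tokens_b` are hypothetical and stand for token sequences from any two data sources.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: query_tokens attend to context_tokens.

    Returns the residual-updated query tokens and the attention weights.
    """
    q = query_tokens @ w_q                      # (n_q, d)
    k = context_tokens @ w_k                    # (n_c, d)
    v = context_tokens @ w_v                    # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (n_q, n_c)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return query_tokens + weights @ v, weights  # residual connection

rng = np.random.default_rng(0)
d = 8                                   # assumed shared feature width
tokens_a = rng.normal(size=(16, d))     # hypothetical tokens from source A
tokens_b = rng.normal(size=(4, d))      # hypothetical tokens from source B

# Separate, untrained projections for each direction of the exchange.
params_ab = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
params_ba = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]

# Each source queries the other, so information flows in both directions.
fused_a, attn_ab = cross_attention(tokens_a, tokens_b, *params_ab)
fused_b, attn_ba = cross_attention(tokens_b, tokens_a, *params_ba)
```

In this sketch each source keeps its own token count after the exchange, and only the content of its tokens is updated with information drawn from the other source.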

Key words: remote sensing, multi-source heterogeneous data, pre-training, self-supervised learning, land cover classification

CLC Number: