Acta Geodaetica et Cartographica Sinica, 2023, Vol. 52, Issue (4): 624-637. doi: 10.11947/j.AGCS.2023.20210659

• Photogrammetry and Remote Sensing •

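Attention-guided feature fusion and joint learning for remote sensing image scene classification

YU Donghang1,2, XU Qing2, ZHAO Chuan3, GUO Haitao2, LU Jun2, LIN Yuzhun2, LIU Xiangyun2

  1. Naval Research Institute, Beijing 100070, China;
    2. Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China;
    3. Rocket Force Command College, Wuhan 430012, China
  • Received: 2021-12-06 Revised: 2022-11-07 Published: 2023-05-05
  • Corresponding author: GUO Haitao E-mail: ghtgjp2002@163.com
  • About the first author: YU Donghang (1993-), male, PhD, research interest: remote sensing image interpretation. E-mail: dong_hang@aliyun.com
  • Supported by:
    The National Natural Science Foundation of China (No. 42001338)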

Abstract: Scale variations, inter-class similarity and intra-class differences in remote sensing images make high-accuracy scene classification difficult. To address these problems and make full use of the multi-scale features of remote sensing images, this paper proposes a scene classification method based on attention-guided feature fusion and joint learning. First, a deep convolutional neural network extracts feature maps at three different levels from the image. Then, a purpose-designed residual attention mechanism enhances the semantic information of each level of feature maps and suppresses redundant noise. Finally, global average pooling captures the global information of each level of feature maps to construct a feature vector, and the three feature vectors are fused by concatenation. The three level-specific feature vectors and the fused feature vector are each classified by an independent fully connected layer. The network parameters are trained with a joint loss, and decision-level fusion of the multiple classifiers is adopted to improve the robustness of prediction. Experimental results on the UC Merced, AID and NWPU-RESISC45 datasets show that the proposed method significantly improves the discrimination of similar scenes and of scenes with large intra-class differences. Compared with scene classification methods of the same type that use multi-scale features, the overall accuracy is improved by 0.84%, 4.04% and 4.43%, respectively.

Key words: remote sensing image, scene classification, convolutional neural network, feature fusion, joint learning, attention mechanism
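The classification stage summarized in the abstract (attention on each level, global average pooling, concatenation, four independent classifiers, joint loss and decision-level fusion) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the residual attention design, the loss weights or the fusion rule, so the code assumes a spatial mask of the form F' = (1 + M) ⊙ F, where M is a sigmoid of the channel-averaged response; an unweighted sum of the four cross-entropy terms as the joint loss; and averaging of the four softmax outputs as the decision-level fusion. All function names (`residual_attention`, `global_average_pool`, `linear`, `forward`) are hypothetical, and the backbone that produces the three levels of feature maps is omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: a feature map is a list of C channels, each an H x W grid.
Matrix = List[List[float]]
FeatureMap = List[Matrix]
Head = Tuple[List[List[float]], List[float]]  # (weights: one row per class, bias)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def residual_attention(fmap: FeatureMap) -> FeatureMap:
    """ASSUMED residual spatial attention: F' = (1 + M) * F.

    M is the sigmoid of the channel-averaged response at each pixel, so
    strongly responding positions are amplified while the identity term
    keeps the original signal. The paper's actual design may differ.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    return [[[(1.0 + mask[i][j]) * ch[i][j] for j in range(w)]
             for i in range(h)] for ch in fmap]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each H x W channel, giving a length-C feature vector."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def linear(x: Sequence[float], head: Head) -> List[float]:
    """Fully connected layer producing one logit per class."""
    weights, bias = head
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Shift by the maximum logit before exponentiating for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(levels: List[FeatureMap], heads: List[Head], label: int):
    """Attention -> GAP -> concatenation, four heads, joint loss, fused vote.

    `heads` holds four hypothetical classifiers: one per level and one for
    the concatenated vector. Returns (prediction, joint_loss, fused_probs).
    """
    vectors = [global_average_pool(residual_attention(f)) for f in levels]
    fused = [v for vec in vectors for v in vec]  # feature-level fusion by concatenation
    probs = [softmax(linear(x, hd)) for x, hd in zip(vectors + [fused], heads)]
    # ASSUMED joint loss: unweighted sum of the four cross-entropy terms.
    joint_loss = sum(-math.log(max(p[label], 1e-12)) for p in probs)
    # ASSUMED decision-level fusion: average the four softmax outputs.
    n_cls = len(probs[0])
    avg = [sum(p[k] for p in probs) / len(probs) for k in range(n_cls)]
    prediction = max(range(n_cls), key=lambda k: avg[k])
    return prediction, joint_loss, avg
```

If the three levels have 2, 3 and 1 channels, for example, the concatenated vector has 6 entries, so the fourth classifier needs 6 weights per class. The sketch computes only the forward pass; in training, the joint loss would be minimized with respect to the backbone, attention and classifier parameters, typically through a deep learning framework's automatic differentiation.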
