Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (2): 353-366. doi: 10.11947/j.AGCS.2024.20220692

• Photogrammetry and Remote Sensing •

A coupled DeepLab and Transformer approach for fine classification of crop cultivation types in remote sensing

LIN Yunhao1,2,3, WANG Yanjun1,2,3, LI Shaochun1,2,3, CAI Hengfan1,2,3

  1. Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology, Xiangtan 411201, China;
  2. National-local Joint Engineering Laboratory of Geo-spatial Information Technology, Hunan University of Science and Technology, Xiangtan 411201, China;
  3. School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
  • Received: 2022-12-19 Revised: 2023-09-05 Published: 2024-03-08
  • Corresponding author: WANG Yanjun E-mail: wongyanjun@163.com
  • About the author: LIN Yunhao (1999-), male, master's student; research interest: multi-source remote sensing data processing. E-mail: 1056267419@qq.com
  • Supported by:
    The National Natural Science Foundation of China (Nos. 41971423; 31972951)

Abstract: Fine-grained remote sensing monitoring of the different crop types planted in complex farmland is key to surveying cultivated area and estimating crop yield in smart agriculture and rural applications. In current pixel-level semantic segmentation of crop planting from high-resolution imagery, deep convolutional neural networks struggle to capture both multi-scale global spatial features and local detail features, which leads to blurred boundaries between farmland plots and incomplete interiors within plots of the same class. To address these shortcomings, this paper proposes FDTNet, a dual-branch parallel feature fusion network that couples DeepLabv3+ with a Transformer encoder to achieve fine remote sensing monitoring of crop planting types. First, DeepLabv3+ and a Transformer are embedded in parallel in FDTNet to capture the local and global features of farmland images, respectively. Second, a coupled attention fusion module (CAFM) fuses the features of the two branches. Third, in the decoder stage, the convolutional block attention module (CBAM) increases the weight of the effective features in the convolutional layers. Finally, a progressive multi-level feature fusion strategy fully fuses the effective features from the encoder and decoder and outputs the feature map, enabling high-precision classification of late rice, middle rice, lotus root fields, vegetable fields and greenhouses. To verify the effectiveness of FDTNet for high-resolution crop classification, experiments were run on the Yuhu and Zhejiang datasets, which have different high resolutions, and the mIoU reached 74.7% and 81.4%, respectively. Compared with existing deep learning methods such as UNet, DeepLabv3, DeepLabv3+, ResT and Res-Swin, the mIoU of FDTNet can be 2.2% and 3.6% higher, respectively. The results show that FDTNet outperforms the compared methods in both types of farmland scene, one with uniform texture and a large sample size and the other with diverse texture and a small sample size, and that it has a comprehensive ability to extract effective features for multiple crop categories.
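
The abstract reports accuracy as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how mIoU is commonly computed from a pixel-level confusion matrix. It is not the authors' evaluation code: the function names, the three-class toy labels, and the choice to skip classes that never occur are assumptions made for illustration, and the paper's exact protocol (for example, how background pixels are handled) is not stated here.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both reference and prediction are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # reference pixels of c that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp  # pixels wrongly predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy labels for six pixels and three classes (not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou = mean_iou(cm)
print(f"mIoU = {miou:.3f}")
```

In this toy case the per-class IoU values are 1/3, 2/3 and 1/2, so their mean is 0.5. Because every class carries equal weight in the average, a small class such as greenhouses influences the score as much as a large one.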

Key words: high-resolution remote sensing image, crop planting type, semantic segmentation, feature fusion, deep learning

CLC number: