Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (2): 353-366.doi: 10.11947/j.AGCS.2024.20220692

• Photogrammetry and Remote Sensing •

A coupled DeepLab and Transformer approach for fine classification of crop cultivation types in remote sensing

LIN Yunhao1,2,3, WANG Yanjun1,2,3, LI Shaochun1,2,3, CAI Hengfan1,2,3   

  1. Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology, Xiangtan 411201, China;
    2. National-local Joint Engineering Laboratory of Geo-spatial Information Technology, Hunan University of Science and Technology, Xiangtan 411201, China;
    3. School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
  • Received:2022-12-19 Revised:2023-09-05 Published:2024-03-08
  • Supported by:
    The National Natural Science Foundation of China (Nos. 41971423; 31972951)

Abstract: Accurately monitoring the planting of different types of complex farmland crops by remote sensing is key to agricultural area surveys and crop yield estimation in smart rural agriculture. In current pixel-level semantic segmentation of crop planting in high-resolution images, deep convolutional neural networks struggle to account for both spatial multi-scale global features and local details, which leads to blurred boundary contours between farmland plots and low internal integrity within the same farmland area. To address these shortcomings, this paper designs and proposes a dual-branch parallel feature fusion network (FDTNet) that couples DeepLabv3+ and Transformer encoders to achieve fine remote sensing monitoring of crop planting. Firstly, DeepLabv3+ and Transformer are embedded in FDTNet in parallel to capture the local and global features of farmland images, respectively. Secondly, the coupled attention fusion module (CAFM) is used to effectively fuse the two sets of features. Then, in the decoder stage, the convolutional block attention module (CBAM) is applied to increase the weight of effective features in the convolutional layers. Finally, a progressive multi-level feature fusion strategy is adopted to fully fuse the effective features in the encoder and decoder and output the feature map, achieving high-precision classification and recognition of late rice, middle rice, lotus root field, vegetable field, and greenhouse. To verify the effectiveness of the FDTNet network model in high-resolution crop classification, this paper selects two different high-resolution datasets, the Yuhu dataset and the Zhejiang dataset, on which the experimental mIoU reaches 74.7% and 81.4%, respectively. The mIoU of FDTNet is 2.2% and 3.6% higher, respectively, than that of existing deep learning methods such as UNet, DeepLabv3, DeepLabv3+, ResT, and Res-Swin.
The results show that FDTNet outperforms the compared methods in both types of farmland scenes: those with single texture and large sample size, and those with multiple textures and small sample size. The proposed FDTNet has a comprehensive ability to extract effective features of multiple crop categories.
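The dual-branch fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's exact CAFM: it assumes each branch produces a (C, H, W) feature map, derives a channel-attention vector from each branch by global average pooling, and cross-gates the branches before summing. The function and variable names are illustrative only.

```python
import numpy as np

def sigmoid(x):
    """Numerically plain logistic function used as the attention gate."""
    return 1.0 / (1.0 + np.exp(-x))

def coupled_attention_fusion(local_feat, global_feat):
    """Hypothetical sketch of coupled attention fusion.

    local_feat:  (C, H, W) features from the CNN (DeepLabv3+-style) branch
    global_feat: (C, H, W) features from the Transformer branch

    Each branch's channel descriptor (global average pooling + sigmoid)
    re-weights the *other* branch, so local detail and global context
    gate one another before the fused sum.
    """
    w_local = sigmoid(local_feat.mean(axis=(1, 2)))    # (C,) from CNN branch
    w_global = sigmoid(global_feat.mean(axis=(1, 2)))  # (C,) from Transformer branch
    fused = (w_global[:, None, None] * local_feat
             + w_local[:, None, None] * global_feat)
    return fused

# Toy inputs standing in for encoder outputs of the two branches
rng = np.random.default_rng(0)
local = rng.random((8, 16, 16))    # CNN-branch feature map
globl = rng.random((8, 16, 16))    # Transformer-branch feature map
fused = coupled_attention_fusion(local, globl)
print(fused.shape)  # (8, 16, 16): fused map keeps the input resolution
```

The fused map would then pass to the decoder, where attention modules such as CBAM re-weight channels and spatial positions before the progressive multi-level fusion produces the final classification map.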

Key words: high-resolution remote sensing image, crop planting type, semantic segmentation, feature fusion, deep learning
