Acta Geodaetica et Cartographica Sinica ›› 2025, Vol. 54 ›› Issue (8): 1489-1500. doi: 10.11947/j.AGCS.2025.20240254

• Photogrammetry and Remote Sensing •

Combining projective transform and road segmentation for street view-satellite images cross-view geo-localization

Wenjian GAN, Yang ZHOU, Xiaofei HU, Luying ZHAO, Gaoshuang HUANG, Mingbo HOU

  1. Institute of Surveying and Mapping, Information Engineering University, Zhengzhou 450001, China
  • Received: 2024-06-24 Revised: 2025-07-04 Published: 2025-09-16
  • Contact: Yang ZHOU E-mail: 14737117985@163.com; zhouyang3d@163.com
  • About author:GAN Wenjian (2000—), male, postgraduate, majors in image geo-localization. E-mail: 14737117985@163.com

Abstract:

The vast differences between street view and satellite images make them extremely difficult to match, which poses great challenges for the research and application of cross-view image geo-localization. In this study, we propose a cross-view geo-localization framework based on projective transform and road segmentation to address the difficulties caused by these viewpoint differences. First, we establish a geometric projection relationship between street view and satellite images to transform ground images into the satellite viewpoint, thereby reducing the viewpoint difference between the two. Second, to better learn viewpoint-invariant features, we draw on self-supervised learning: we use a visual foundation model with strong zero-shot generalization capability for road segmentation, and introduce an auxiliary training branch that supplies road prior information during training. This improves model performance without changing the model architecture or the inference speed. With our method, the average Recall@1 of four methods (SAFA, GeoDTR, SAIG, and TransGeo) improves by 0.55% on the CVACT dataset and by 2.84% on the CVWU dataset. The experimental results show that the proposed method, which combines geometric projective transform with self-supervised learning, can be readily integrated with other model architectures.
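The abstract reports results as Recall@1 without defining it. A minimal sketch of the metric as it is commonly computed in image retrieval is given below, assuming each street view query has exactly one matching satellite image and that similarity is the cosine similarity between feature vectors. The function name `recall_at_k` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def recall_at_k(query_feats, ref_feats, gt_index, k=1):
    """Fraction of queries whose true reference is among the k most similar.

    Assumption (not from the paper): one ground-truth reference per query,
    ranked by cosine similarity of feature vectors.
    """
    query_feats = np.asarray(query_feats, dtype=float)
    ref_feats = np.asarray(ref_feats, dtype=float)
    # L2-normalise rows so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = q @ r.T  # shape: (num_queries, num_references)
    # Indices of the k most similar references for every query.
    topk = np.argsort(-sim, axis=1)[:, :k]
    # A query is a hit if its ground-truth index appears in its top k.
    hits = (topk == np.asarray(gt_index)[:, None]).any(axis=1)
    return float(hits.mean())
```

With k=1, a query counts as correctly localized only when its matching satellite image is ranked first, so an improvement in Recall@1 means more queries whose top-ranked retrieval is the true location.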

Key words: projective transform, self-supervised learning, road segmentation, auxiliary training branch, cross-view image geo-localization

CLC Number: