Acta Geodaetica et Cartographica Sinica, 2025, Vol. 54, Issue (8): 1489-1500. doi: 10.11947/j.AGCS.2025.20240254

• Photogrammetry and Remote Sensing •

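A street view-satellite image cross-view geo-localization method based on projective transform and road segmentation

Wenjian GAN, Yang ZHOU, Xiaofei HU, Luying ZHAO, Gaoshuang HUANG, Mingbo HOU

  1. Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, Henan, China
  • Received: 2024-06-24 Revised: 2025-07-04 Published: 2025-09-16
  • Corresponding author: Yang ZHOU E-mail: 14737117985@163.com; zhouyang3d@163.com
  • About the author: Wenjian GAN (born 2000), male, master's student; research interest: image geo-localization. E-mail: 14737117985@163.com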

Combining projective transform and road segmentation for street view-satellite images cross-view geo-localization

Wenjian GAN, Yang ZHOU, Xiaofei HU, Luying ZHAO, Gaoshuang HUANG, Mingbo HOU

  1. Institute of Surveying and Mapping, Information Engineering University, Zhengzhou 450001, China
  • Received: 2024-06-24 Revised: 2025-07-04 Published: 2025-09-16
  • Contact: Yang ZHOU E-mail: 14737117985@163.com; zhouyang3d@163.com
  • About the author: GAN Wenjian (born 2000), male, postgraduate student, majoring in image geo-localization. E-mail: 14737117985@163.com

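Abstract:

The large viewpoint difference between ground-level street view images and satellite images makes matching the two extremely difficult, which poses a major challenge to research on and application of cross-view image geo-localization. To address the localization difficulty caused by viewpoint differences in street view-satellite cross-view geo-localization, this paper proposes a cross-view geo-localization method based on projective transform and road segmentation. First, a projection relationship is established between street view and satellite images to transform the ground image viewpoint to the satellite image viewpoint, reducing the viewpoint difference between them. Meanwhile, to better learn viewpoint-invariant features in the images, the method starts from the self-supervised learning paradigm, uses a visual foundation model with strong zero-shot generalization capability for road segmentation, and introduces an auxiliary training branch with road prior information during training, thereby improving model performance without changing the model architecture or inference speed. After incorporating the proposed method, the average Recall@1 of four cross-view geo-localization methods, SAFA, GeoDTR, SAIG, and TransGeo, improves by 0.55 percentage points on the CVACT dataset and by 2.84 percentage points on the CVWU dataset. The experimental results show that the proposed method can be organically combined with other model architectures and has good applicability.

Keywords: projective transform, self-supervised learning, road segmentation, auxiliary training branch, cross-view image geo-localization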
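
The abstract does not give the exact projection used, so the following is only a minimal sketch of one common way to warp a ground panorama toward an overhead view. It assumes an equirectangular panorama whose column 0 faces north (azimuth increasing clockwise), a flat ground plane, and a known camera height; the function name `ground_to_overhead` and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def ground_to_overhead(pano, out_size=256, meters_per_pixel=0.2, camera_height=2.0):
    """Resample an equirectangular panorama onto a top-down grid.

    Assumptions (illustrative only): flat ground, camera at the grid center,
    panorama column 0 faces north, row 0 is the zenith and the last row the nadir.
    """
    h, w = pano.shape[:2]
    half = (out_size - 1) / 2.0
    rows, cols = np.meshgrid(np.arange(out_size), np.arange(out_size), indexing="ij")

    # Metric offset of every output pixel from the camera (row 0 is north).
    east = (cols - half) * meters_per_pixel
    north = (half - rows) * meters_per_pixel

    # Viewing direction from the camera to the ground point under each pixel.
    azimuth = np.mod(np.arctan2(east, north), 2 * np.pi)   # clockwise from north
    distance = np.hypot(east, north)
    depression = np.arctan2(camera_height, distance)       # angle below the horizon

    # Equirectangular lookup: column follows azimuth, row follows the polar angle.
    u = (azimuth / (2 * np.pi) * w).astype(int) % w
    v = np.clip(((np.pi / 2 + depression) / np.pi * h).astype(int), 0, h - 1)

    # Nearest-neighbor sampling; works for grayscale (H, W) and color (H, W, C) arrays.
    return pano[v, u]
```

Because every output pixel lies on the assumed ground plane, only the lower half of the panorama (below the horizon) is ever sampled; objects above the ground, such as buildings and trees, are distorted by this kind of warp.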

Abstract:

The large viewpoint difference between ground-level street view images and satellite images makes matching them extremely difficult, which poses a major challenge for research on and applications of cross-view image geo-localization. To address the localization difficulty caused by this viewpoint difference, we propose a cross-view geo-localization method based on projective transform and road segmentation. First, we establish a geometric projection relationship between street view and satellite images to transform ground images to the satellite viewpoint, which reduces the viewpoint difference between the two. Meanwhile, to better learn viewpoint-invariant features, we draw on the self-supervised learning paradigm: a visual foundation model with strong zero-shot generalization capability is used for road segmentation, and an auxiliary training branch carrying road prior information is introduced during training, improving model performance without changing the model architecture or inference speed. With our method, the average Recall@1 of four cross-view geo-localization methods (SAFA, GeoDTR, SAIG, and TransGeo) improves by 0.55 percentage points on the CVACT dataset and by 2.84 percentage points on the CVWU dataset. The experimental results show that the proposed method, which combines geometric projective transform with self-supervised learning, can be readily combined with other model architectures.

Key words: projective transform, self-supervised learning, road segmentation, auxiliary training branch, cross-view image geo-localization
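
For readers unfamiliar with the Recall@1 metric quoted in the abstract, the sketch below shows how Recall@K is commonly computed for cross-view retrieval. It assumes that the i-th ground embedding and the i-th satellite embedding form the true pair and that cosine similarity is used for ranking; the function name `recall_at_k` is hypothetical, and the paper's own evaluation code may differ.

```python
import numpy as np

def recall_at_k(ground_emb, sat_emb, k=1):
    """Fraction of ground queries whose true satellite image ranks in the top k.

    Assumes row i of ground_emb matches row i of sat_emb (illustrative convention).
    """
    ground_emb = np.asarray(ground_emb, dtype=float)
    sat_emb = np.asarray(sat_emb, dtype=float)

    # L2-normalize so the dot product equals cosine similarity.
    g = ground_emb / np.linalg.norm(ground_emb, axis=1, keepdims=True)
    s = sat_emb / np.linalg.norm(sat_emb, axis=1, keepdims=True)
    sim = g @ s.T                                   # (num_queries, num_references)

    # Count the references that score strictly higher than the true match.
    true_sim = np.diag(sim)[:, None]
    num_better = (sim > true_sim).sum(axis=1)
    return float((num_better < k).mean())
```

A gain such as "0.55 percentage points" then refers to the difference between two such recall values expressed in percent, not to a relative change.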

CLC number: