Acta Geodaetica et Cartographica Sinica, 2024, Vol. 53, Issue (6): 1057-1076. doi: 10.11947/j.AGCS.2024.20230259

• Intelligent Surveying and Mapping •

Research progress in the application of image semantic information in visual SLAM

Chi GUO1,2,3, Yang LIU1, Yarong LUO2, Jingnan LIU2, Quan ZHANG2

  1. Hubei Luojia Laboratory, Wuhan University, Wuhan 430079, China
  2. Research Center of GNSS, Wuhan University, Wuhan 430079, China
  3. Artificial Intelligence Institute, Wuhan University, Wuhan 430079, China
  • Received: 2023-09-08 Published: 2024-07-22
  • About the author: GUO Chi (born 1983), male, PhD, professor. His research focuses on applications of BeiDou technology, intelligent navigation of unmanned systems, and the theory and methods of location-based services. E-mail: guochi@whu.edu.cn
  • Supported by:
    The National Key Research and Development Program of China (2022YFB3903801); the Major Science and Technology Project of Hubei Province (2022AAA009); the Open Fund of Hubei Luojia Laboratory; the China Postdoctoral Science Foundation (2023TQ0248)

Abstract:

Visual simultaneous localization and mapping (VSLAM) uses cameras as the primary sensor to capture image data, estimates the position and orientation of the carrier using multi-view geometry and state estimation, and simultaneously builds a map for navigation and localization. VSLAM is a key technology in autonomous driving, augmented reality (AR), virtual reality (VR), mixed reality (MR), intelligent robotics, and drone flight control. In recent years, as demand for intelligent navigation and localization has grown across industries, VSLAM, originally centered on geometric measurement, has gradually incorporated semantic understanding of the environment. Semantic information refers to concepts that humans can directly perceive and understand; image semantic information refers to information such as the contours, categories, and saliency of objects in an image. Compared with geometric features in images, semantic information is more consistent across time and space and is closer to human perception. Introducing image semantic information into VSLAM can improve the performance of each module of the system and also strengthen its intelligent perception, yielding a visual semantic SLAM that integrates geometric measurement, position and attitude estimation, and environment understanding. This article categorizes classic solutions and the latest research progress in visual semantic SLAM according to how image semantic information is applied. On this basis, it summarizes the open problems and challenges of visual semantic SLAM and points out future research directions, with the aim of advancing the field toward intelligent navigation and localization.

Key words: visual SLAM, visual semantic SLAM, deep learning, intelligent navigation and localization

CLC number: