Acta Geodaetica et Cartographica Sinica ›› 2021, Vol. 50 ›› Issue (11): 1478-1486. doi: 10.11947/j.AGCS.2021.20210308

• Environment Perception for Intelligent Driving •

Visual localization optimization in dynamic environments constrained by semantic element association

SHAO Xiaohang, WU Hangbin, LIU Chun, CHEN Chen, CAI Tianchi, CHENG Fanjin

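  1. College of Surveying and Geo-Informatics, Tongji University, Shanghai 200092, China
  • Received: 2021-05-31 Revised: 2021-08-30 Published: 2021-12-07
  • Corresponding author: LIU Chun E-mail: liuchun@tongji.edu.cn
  • About the author: SHAO Xiaohang (b. 1992), male, PhD candidate; research interests: visual robot localization and environment cognition.
  • Funding:
    National Key Research and Development Program of China, 13th Five-Year Plan (No. 2018YFB1305003); National Natural Science Foundation of China (No. 41771481); Key Interdisciplinary Project of the Fundamental Research Funds for the Central Universities (No. 22120190195)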

Visual odometry optimizing bounded with semantic elements association in dynamic scenes

SHAO Xiaohang, WU Hangbin, LIU Chun, CHEN Chen, CAI Tianchi, CHENG Fanjin   

  1. College of Surveying and Geo-Informatics, Tongji University, Shanghai 200092, China
  • Received: 2021-05-31 Revised: 2021-08-30 Published: 2021-12-07
  • Supported by:
    The National Key Research and Development Program of China (No. 2018YFB1305003); The National Natural Science Foundation of China (No. 41771481); The Key Interdisciplinary Project of the Fundamental Research Funds for the Central Universities (No. 22120190195)

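Abstract: In autonomous driving scenarios, cameras enable low-cost localization and environment perception, but dynamic objects in the scene distort the trajectory estimated by visual localization. To address this, the paper proposes a visual localization optimization method for dynamic environments constrained by semantic element association. First, object detection and semantic segmentation are used to extract semantic entities from the environment. Next, a semantic element association model identifies the dynamic semantic elements. Finally, a feature mask of the dynamic semantic elements is built and used to filter out feature points on dynamic objects during feature matching, which improves visual localization. Experiments on campus roads with a vision-based robot platform revealed how dynamic objects affect the localization result through keypoints: the effect is larger when the robot is turning or when an object moves laterally across the field of view. The results show that the method identifies dynamic semantic elements with an average F1 score of about 87%, that the maximum distance between the trajectories before and after semantic element association optimization in a local region is 2.463 m, and that the RMSE against ground truth is reduced by 38%.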
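The final step of the pipeline, filtering feature points with a mask of the dynamic semantic elements, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_keypoints`, the array layouts, and the toy mask are assumptions made for illustration, and the mask here is simply given rather than produced by a detection, segmentation, and association stage.

```python
import numpy as np


def filter_keypoints(keypoints, dynamic_mask):
    """Drop keypoints that fall on pixels flagged as dynamic.

    keypoints    : (N, 2) array-like of (x, y) pixel coordinates (assumed layout).
    dynamic_mask : (H, W) boolean array, True where a pixel belongs to a
                   dynamic semantic element (assumed to come from an upstream
                   segmentation/association stage).
    Returns the surviving keypoints and the boolean keep-vector.
    """
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    h, w = dynamic_mask.shape
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.rint(kp[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(kp[:, 1]).astype(int), 0, h - 1)
    keep = ~dynamic_mask[rows, cols]
    return kp[keep], keep


# Toy 4x6 image with one dynamic region covering rows 1-2, columns 2-4.
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True

kps = np.array([[0.0, 0.0], [3.0, 1.0], [5.0, 3.0], [2.2, 2.4]])
static_kps, keep = filter_keypoints(kps, mask)
```

Only the keypoints outside the masked region would then be passed on to feature matching and pose estimation.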

Keywords: visual localization, simultaneous localization and mapping, dynamic scenes, semantic association, semantic segmentation

Abstract: Cameras enable low-cost positioning and environment perception in autonomous driving, but dynamic objects degrade visual odometry. This paper proposes a model that optimizes visual odometry based on semantic element association. It uses object detection and semantic segmentation to identify semantic objects and to distinguish dynamic semantic elements (DSE) from static ones. It then filters out unreliable keypoints with a dynamic feature mask during visual positioning. In practice, the proposed method detects DSE even for objects that can be either moving or static. In an experiment on campus roads, the negative influence was strongest when the robot was turning or when a moving object crossed the camera view. The average accuracy (F1) of DSE detection was about 87%, and the largest difference between the trajectories before and after semantic association optimization in sub-sequences was 2.463 m. Measured against ground truth, the RMSE dropped by 38% after optimization.
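For reference, the two metrics quoted in the abstract are conventionally defined as below. The notation is an assumption of this note, not the paper's: $TP$, $FP$, $FN$ are true positives, false positives, and false negatives in dynamic-element detection, and $\hat{\mathbf{p}}_i$, $\mathbf{p}_i$ are the estimated and ground-truth positions at sample $i$. The abstract does not state how F1 is averaged or how trajectories are aligned before the RMSE is computed.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N}
  \left\lVert \hat{\mathbf{p}}_i - \mathbf{p}_i \right\rVert^{2}}

\text{relative reduction} =
  \frac{\mathrm{RMSE}_{\text{before}} - \mathrm{RMSE}_{\text{after}}}
       {\mathrm{RMSE}_{\text{before}}} \approx 0.38
```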

Key words: visual odometry, SLAM, dynamic scenes, semantic association, semantic segmentation

CLC Number: