Acta Geodaetica et Cartographica Sinica ›› 2016, Vol. 45 ›› Issue (11): 1328-1334. DOI: 10.11947/j.AGCS.2016.20160046

• Cartography and Geographic Information •

A Low-Sampling-Rate Trajectory Matching Algorithm Combining Historical Trajectories and Reinforcement Learning

SUN Wenbin, XIONG Ting

  1. College of Geosciences and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
  • Received: 2016-02-01  Revised: 2016-10-01  Online: 2016-11-20  Published: 2016-12-03
  • Corresponding author: XIONG Ting, E-mail: 391074727@qq.com
  • About the author: SUN Wenbin (1977-), male, PhD, associate professor; his research interests include discrete global grid theory, intelligent computing, and parallel computing. E-mail: swb1996@126.com
  • Supported by: The National Natural Science Foundation of China (No. 41671383)

Abstract: To improve the accuracy of map matching for low-sampling-rate trajectory data (sampling interval greater than 1 min), this paper proposes HMDP-Q (History Markov Decision Processes Q-learning), a matching algorithm that combines historical trajectories with reinforcement learning. First, historical paths are extracted with an incremental matching algorithm to build a historical reference library. The candidate path set is then filtered using this library together with shortest-path and reachability constraints. Next, the map-matching process is modeled as a Markov decision process whose reward function is built from the deviation distance between trajectory points and roads and from the historical trajectories. A reinforcement learning algorithm then solves the Markov decision process for its maximum reward, which corresponds to the optimal match between the trajectory and the road network. Finally, the algorithm is evaluated on floating-car trajectory data from a city. The results show that the algorithm effectively improves the matching accuracy between trajectories and roads: matching accuracy reaches 89.2% at a 1 min sampling interval and 61.4% at a 16 min interval. Compared with IVVM (Interactive Voting-based Map Matching), HMDP-Q achieves both higher matching accuracy and higher computational efficiency, improving matching accuracy by 26% at the 16 min sampling interval.
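
The core of the method, casting map matching as a Markov decision process and solving it with Q-learning, can be illustrated with a minimal sketch. In the sketch below, states are the candidate road segments of successive GPS points, actions pick the next segment, and the reward combines a deviation-distance penalty with a bonus for historically travelled segments. The function name hmdp_q_match, the precomputed dist table, the history_segs set, and the HISTORY_BONUS weight are illustrative assumptions, not the paper's actual definitions.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 1000
HISTORY_BONUS = 50.0  # assumed weight favoring segments seen in historical paths

def hmdp_q_match(candidates, dist, history_segs):
    """Sketch of the HMDP-Q idea under the assumptions stated above.

    candidates[i]  -- candidate road segments for GPS point i, already filtered
                      by history, shortest path and reachability (per abstract)
    dist[(i, seg)] -- deviation distance from GPS point i to segment seg
    history_segs   -- set of segments extracted from historical trajectories
    """
    n = len(candidates)

    def reward(i, seg):
        # deviation penalty plus a bonus for historically travelled segments
        return -dist[(i, seg)] + (HISTORY_BONUS if seg in history_segs else 0.0)

    Q = defaultdict(float)  # Q[(i, seg, nxt)]: value of moving seg -> nxt at step i
    for _ in range(EPISODES):
        seg = random.choice(candidates[0])
        for i in range(n - 1):
            # epsilon-greedy choice of the next segment (the MDP "action")
            if random.random() < EPSILON:
                nxt = random.choice(candidates[i + 1])
            else:
                nxt = max(candidates[i + 1], key=lambda s: Q[(i, seg, s)])
            best_next = (max(Q[(i + 1, nxt, s)] for s in candidates[i + 2])
                         if i + 2 < n else 0.0)
            # standard Q-learning update: Q(s,a) += alpha*(r + gamma*max Q(s',.) - Q(s,a))
            Q[(i, seg, nxt)] += ALPHA * (reward(i + 1, nxt) + GAMMA * best_next
                                         - Q[(i, seg, nxt)])
            seg = nxt

    # greedy rollout of the learned Q-values yields the matched path
    path = [max(candidates[0], key=lambda s: reward(0, s))]
    for i in range(n - 1):
        path.append(max(candidates[i + 1], key=lambda s: Q[(i, path[-1], s)]))
    return path

A full implementation would compute dist from point-to-segment geometry and restrict each step's actions to segments reachable in the road network, per the abstract's reachability filtering.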

Key words: low-sampling-rate floating car data, trajectory matching, Markov decision process, reinforcement learning
