Acta Geodaetica et Cartographica Sinica ›› 2026, Vol. 55 ›› Issue (2): 191-205. doi: 10.11947/j.AGCS.2026.20250480

• Spatial Intelligence and Smart Cities •    

Pedestrian path planning driven by preference-enhanced adversarial deep reinforcement learning

Yunbo RAN1, Xue YANG1, Wenhao ZHOU1, Chengen WU2, Baoding ZHOU3, Luliang TANG4, Qingquan LI5

  1. School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
    2. Patent Examination Cooperation Guangdong Center of the Patent Office, CNIPA, Guangzhou 510535, China
    3. Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Shenzhen 518060, China
    4. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
    5. Shenzhen Key Laboratory of Spatial Smart Sensing and Services, Shenzhen University, Shenzhen 518060, China
  • Received:2025-11-13 Revised:2026-01-05 Published:2026-03-13
  • Contact: Xue YANG E-mail:ranyb@cug.edu.cn;yangxue@cug.edu.cn
  • About author: RAN Yunbo (2002—), male, master's student; his research focuses on pedestrian path planning. E-mail: ranyb@cug.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (42271449)

Abstract:

With the advancement of smart city development and precise navigation technologies, pedestrian path planning research has gradually shifted from a single efficiency-oriented paradigm to one driven by multidimensional, personalized demands. The primary objective is to develop path planning models that account for complex urban environments and individual user preferences, thereby providing efficient, flexible, and personalized route recommendations for pedestrians. However, current research still faces key challenges, including insufficient modeling of group heterogeneity, the absence of dynamic preference mechanisms, and limited adaptability to complex scenarios. To address these issues, this paper proposes a pedestrian path planning method based on multidimensional preference modeling and adversarial deep reinforcement learning. The method first constructs a “context-aware, dynamically adaptive” multidimensional preference model, which provides dynamic preference weights for pedestrian route selection. These weights guide the reshaping of the reward function in the deep reinforcement learning framework, enabling a multi-objective collaborative optimization mechanism that balances efficiency, safety, and comfort. A preference-enhanced adversarial deep Q-network algorithm (PEA-DQN) is then developed, incorporating a dual-experience-replay pretraining strategy and an adaptive training mechanism to accelerate model convergence and reduce redundant computation. Experiments conducted in Wuhan under dynamic disturbances within a mixed urban road network validate the performance of the model trained with PEA-DQN. Compared with the DQN algorithm, PEA-DQN improves the obstacle-avoidance success rate by more than 50% and reduces average path length by 40.40%. Ablation studies further show that, relative to Dueling DQN, incorporating the multi-objective reward function improves path quality by 100.4%, while the adaptive mechanism increases computational efficiency by 40% in dynamic obstacle scenarios. Overall, PEA-DQN significantly outperforms the dynamic A* algorithm and other comparable deep reinforcement learning approaches.
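The preference-weighted reward reshaping described in the abstract can be illustrated with a minimal sketch. Everything below is our own simplification, not the paper's actual formulation: the reward terms, the clearance clipping constant, and the linear scalarization are illustrative assumptions about how dynamic preference weights for efficiency, safety, and comfort might combine per-step rewards.

```python
from dataclasses import dataclass


@dataclass
class PreferenceWeights:
    """Dynamic weights from a context-aware preference model (illustrative names)."""
    efficiency: float
    safety: float
    comfort: float

    def normalized(self) -> "PreferenceWeights":
        # Normalize so the three objectives compete on a common scale.
        total = self.efficiency + self.safety + self.comfort
        return PreferenceWeights(self.efficiency / total,
                                 self.safety / total,
                                 self.comfort / total)


def shaped_reward(step_cost: float, obstacle_distance: float,
                  surface_quality: float, w: PreferenceWeights) -> float:
    """Linear scalarization of three per-step objectives (hypothetical terms)."""
    r_eff = -step_cost                    # shorter/cheaper steps earn more reward
    r_safe = min(obstacle_distance, 5.0)  # clipped clearance from nearest obstacle
    r_comf = surface_quality              # e.g. sidewalk quality score in [0, 1]
    w = w.normalized()
    return w.efficiency * r_eff + w.safety * r_safe + w.comfort * r_comf
```

With safety-dominant weights such as `(1, 3, 1)`, a slightly longer step with good clearance scores higher than a shorter step that passes close to an obstacle, which is the qualitative behavior the reshaped reward is meant to produce.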

Key words: pedestrian path planning, deep reinforcement learning, multi-dimensional preference modeling, personalized mobility
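The dual-experience-pool pretraining strategy mentioned in the abstract can be sketched as a generic two-buffer scheme; the class below is a hypothetical illustration under our own assumptions (a fixed pretraining pool mixed with a growing online pool, with the mix annealed toward online experience), and may differ from the paper's actual design.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two experience pools: a fixed pretraining pool and a bounded online pool."""

    def __init__(self, capacity: int = 10000):
        self.pretrain = []                  # filled once, e.g. from heuristic rollouts
        self.online = deque(maxlen=capacity)

    def add_pretrain(self, transition):
        self.pretrain.append(transition)

    def add_online(self, transition):
        self.online.append(transition)

    def sample(self, batch_size: int, online_ratio: float):
        """Mix both pools; online_ratio is annealed from 0 toward 1 during training."""
        n_online = min(int(batch_size * online_ratio), len(self.online))
        n_pre = min(batch_size - n_online, len(self.pretrain))
        return (random.sample(list(self.online), n_online)
                + random.sample(self.pretrain, n_pre))
```

Early in training, a small `online_ratio` lets pretraining transitions dominate each batch and stabilize the initial Q-estimates; as the agent gathers its own experience, raising the ratio shifts learning onto fresh interactions, which is one plausible way a pretraining pool can accelerate convergence.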
