Acta Geodaetica et Cartographica Sinica ›› 2026, Vol. 55 ›› Issue (2): 191-205. doi: 10.11947/j.AGCS.2026.20250480

• Spatial Intelligence and Smart Cities •    

Pedestrian path planning driven by preference-enhanced adversarial deep reinforcement learning

Yunbo RAN1, Xue YANG1, Wenhao ZHOU1, Chengen WU2, Baoding ZHOU3, Luliang TANG4, Qingquan LI5

  1. School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
    2. Patent Examination Cooperation Guangdong Center of the Patent Office, CNIPA, Guangzhou 510535, China
    3. Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Shenzhen 518060, China
    4. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
    5. Shenzhen Key Laboratory of Spatial Smart Sensing and Services, Shenzhen University, Shenzhen 518060, China
  • Received:2025-11-13 Revised:2026-01-05 Published:2026-03-13
  • Contact: Xue YANG E-mail:ranyb@cug.edu.cn;yangxue@cug.edu.cn
  • About author: RAN Yunbo (2002—), male, master's student; his research focuses on pedestrian path planning. E-mail: ranyb@cug.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (42271449)

Abstract:

With the advancement of smart city development and precise navigation technologies, pedestrian path planning research has gradually shifted from a single efficiency-oriented paradigm to one driven by multidimensional, personalized demands. The primary objective is to develop path planning models that account for complex urban environments and individual user preferences, thereby providing efficient, flexible, and personalized route recommendations for pedestrians. However, current research still faces key challenges, including insufficient modeling of group heterogeneity, the absence of dynamic preference mechanisms, and limited adaptability to complex scenarios. To address these issues, this paper proposes a pedestrian path planning method based on multidimensional preference modeling and adversarial deep reinforcement learning. The method first constructs a “context-aware, dynamically adaptive” multidimensional preference model, which provides dynamic preference weights for pedestrian route selection. These weights guide the reshaping of the reward function in the deep reinforcement learning framework, enabling a multi-objective collaborative optimization mechanism that balances efficiency, safety, and comfort. A preference-enhanced adversarial deep Q-network algorithm (PEA-DQN) is then developed, incorporating a dual-experience-replay pretraining strategy and an adaptive training mechanism to accelerate model convergence and reduce redundant computation. Experiments conducted in Wuhan under dynamic disturbances within a mixed urban road network validate the performance of the model trained with PEA-DQN. Compared with the DQN algorithm, PEA-DQN improves the obstacle-avoidance success rate by more than 50% and reduces average path length by 40.40%. Ablation studies further show that, relative to Dueling DQN, incorporating the multi-objective reward function improves path quality by 100.4%, while the adaptive mechanism increases computational efficiency by 40% in dynamic obstacle scenarios. Overall, PEA-DQN significantly outperforms the dynamic A* algorithm and other comparable deep reinforcement learning approaches.
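The preference-weighted reward reshaping described in the abstract can be illustrated with a minimal sketch. Everything below is our own simplification, not the paper's actual formulation: the reward terms, the clearance clipping constant, and the linear scalarization are illustrative assumptions about how dynamic preference weights for efficiency, safety, and comfort might combine per-step rewards.

```python
from dataclasses import dataclass


@dataclass
class PreferenceWeights:
    """Dynamic weights from a context-aware preference model (illustrative names)."""
    efficiency: float
    safety: float
    comfort: float

    def normalized(self) -> "PreferenceWeights":
        # Normalize so the three objectives compete on a common scale.
        total = self.efficiency + self.safety + self.comfort
        return PreferenceWeights(self.efficiency / total,
                                 self.safety / total,
                                 self.comfort / total)


def shaped_reward(step_cost: float, obstacle_distance: float,
                  surface_quality: float, w: PreferenceWeights) -> float:
    """Linear scalarization of three per-step objectives (hypothetical terms)."""
    r_eff = -step_cost                    # shorter/cheaper steps earn more reward
    r_safe = min(obstacle_distance, 5.0)  # clipped clearance from nearest obstacle
    r_comf = surface_quality              # e.g. sidewalk quality score in [0, 1]
    w = w.normalized()
    return w.efficiency * r_eff + w.safety * r_safe + w.comfort * r_comf
```

With safety-dominant weights such as `(1, 3, 1)`, a slightly longer step with good clearance scores higher than a shorter step that passes close to an obstacle, which is the qualitative behavior the reshaped reward is meant to produce.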

Key words: pedestrian path planning, deep reinforcement learning, multi-dimensional preference modeling, personalized mobility
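The dual-experience-pool pretraining strategy mentioned in the abstract can be sketched as a generic two-buffer scheme; the class below is a hypothetical illustration under our own assumptions (a fixed pretraining pool mixed with a growing online pool, with the mix annealed toward online experience), and may differ from the paper's actual design.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two experience pools: a fixed pretraining pool and a bounded online pool."""

    def __init__(self, capacity: int = 10000):
        self.pretrain = []                  # filled once, e.g. from heuristic rollouts
        self.online = deque(maxlen=capacity)

    def add_pretrain(self, transition):
        self.pretrain.append(transition)

    def add_online(self, transition):
        self.online.append(transition)

    def sample(self, batch_size: int, online_ratio: float):
        """Mix both pools; online_ratio is annealed from 0 toward 1 during training."""
        n_online = min(int(batch_size * online_ratio), len(self.online))
        n_pre = min(batch_size - n_online, len(self.pretrain))
        return (random.sample(list(self.online), n_online)
                + random.sample(self.pretrain, n_pre))
```

Early in training, a small `online_ratio` lets pretraining transitions dominate each batch and stabilize the initial Q-estimates; as the agent gathers its own experience, raising the ratio shifts learning onto fresh interactions, which is one plausible way a pretraining pool can accelerate convergence.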
