Pedestrian path planning driven by preference-enhanced adversarial deep reinforcement learning

doi:10.11947/j.AGCS.2026.20250480

Abstract

Abstract:

With the advancement of smart city development and precise navigation technologies, pedestrian path planning research has gradually shifted from a single efficiency-oriented paradigm to one driven by multidimensional and personalized demands. The primary objective is to develop path planning models that account for complex urban environments and individual user preferences, thereby providing efficient, flexible, and personalized route recommendations for pedestrians. However, current research still faces key challenges, including insufficient modeling of group heterogeneity, the absence of dynamic preference mechanisms, and limited adaptability to complex scenarios. To address these issues, This paper proposes a pedestrian path planning method based on multi-dimensional preference modeling and adversarial deep reinforcement learning. The proposed method first constructs a “context-aware and dynamically adaptive” multidimensional preference model, which provides dynamic preference weights for pedestrian route selection. These weights guide the reshaping of the reward function in the deep reinforcement learning framework, enabling a multi-objective collaborative optimization mechanism that balances efficiency, safety, and comfort. Subsequently, a preference-enhanced adversarial deep Q-network algorithm (PEA-DQN) is developed, incorporating a dual-experience replay pretraining strategy and an adaptive training mechanism to accelerate model convergence and reduce redundant computation. Experiments conducted in Wuhan under dynamic disturbances within a mixed urban road network validate the performance of the model trained by PEA-DQN. Compared with the DQN algorithm, PEA-DQN improves obstacle-avoidance success rates by more than 50% and reduces average path length by 40.40%. Ablation studies further demonstrate that, relative to Dueling DQN, the incorporation of a multi-objective reward function improves path quality by 100.4%, while the adaptive mechanism increases computational efficiency by 40% in dynamic obstacle scenarios. Overall, PEA-DQN significantly outperforms dynamic A^* algorithm and other comparable deep reinforcement learning approaches.

Key words: pedestrian path planning, deep reinforcement learning, multi-dimensional preference modeling, personalized mobility

CLC Number:

P208

Yunbo RAN, Xue YANG, Wenhao ZHOU, Chengen WU, Baoding ZHOU, Luliang TANG, Qingquan LI. Pedestrian path planning driven by preference-enhanced adversarial deep reinforcement learning[J]. Acta Geodaetica et Cartographica Sinica, 2026, 55(2): 191-205.

Add to citation manager EndNote|Reference Manager|ProCite|BibTeX|RefWorks

URL: http://xb.chinasmp.com/EN/10.11947/j.AGCS.2026.20250480

http://xb.chinasmp.com/EN/Y2026/V55/I2/191

Figures/Tables 19

Fig. 1

Tab. 1

Tab. 2

Tab. 3

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Tab. 4

Fig. 6

Tab. 5

Tab. 6

Tab. 7

Tab. 8

Tab. 9

Fig. 7

Tab. 10

Tab. 11

Fig. 8

References 45

[1]	BRAND C, GÖTSCHI T, DONS E, et al. The climate change mitigation impacts of active travel: evidence from a longitudinal panel study in seven European cities[J]. Global Environmental Change, 2021, 67: 102224.
[2]	王鹏龙, 高峰, 黄春林, 等. 面向SDGs的城市可持续发展评价指标体系进展研究[J]. 遥感技术与应用, 2018, 33(5): 784-792.
	WANG Penglong, GAO Feng, HUANG Chunlin, et al. Progress on sustainable city assessment index system for SDGs[J]. Remote Sensing Technology and Application, 2018, 33(5): 784-792.
[3]	BOSSOWSKI J, SZANDAŁA T, MAZURKIEWICZ J. Predicting desire paths: agent-based simulation for neighbourhood route planning[J]. Computers, Environment and Urban Systems, 2025, 117: 102251.
[4]	TRINH T T, VU D M, KIMURA M. A pedestrian path-planning model in accordance with obstacle's danger with reinforcement learning[C]//Proceedings of 2020 International Conference on Information Science and System. New York: ACM Press, 2020: 115-120.
[5]	SUN Huakai, ZHU Kai, ZHANG Weiguang, et al. Emergency path planning based on improved ant colony algorithm[J]. Journal of Building Engineering, 2025, 100: 111725.
[6]	MENNER M, DI CAIRANO S, HAMADA M, et al. MPC-based pedestrian routing for congestion balancing[C]//Proceedings of 2023 IEEE Conference on Control Technology and Applications. Bridgetown: IEEE, 2023: 1089-1094.
[7]	SONG Yuchen, LI Dawei, CAO Qi, et al. The whole day path planning problem incorporating mode chains modeling in the era of mobility as a service[J]. Transportation Research Part C: Emerging Technologies, 2021, 132: 103360.
[8]	EBOLI L, FORCINITI C, MAZZULLA G, et al. Establishing performance criteria for evaluating pedestrian environments[J]. Sustainability, 2023, 15(4): 3523.
[9]	DIJKSTRA E W. A note on two problems in connexion with graphs[J]. Numerische Mathematik, 1959, 1(1): 269-271.
[10]	HART P E, NILSSON N J, RAPHAEL B. A formal basis for the heuristic determination of minimum cost paths[J]. IEEE Transactions on Systems Science and Cybernetics, 1968, 4(2): 100-107.
[11]	LAVALLE S. Rapidly-exploring random trees: a new tool for path planning[EB/OL]. [2025-11-05]. https://msl.cs.illinois.edu/~lavalle/papers/Lav98c.pdf.
[12]	DONG Yuansheng, ZOU Xingjie. Mobile robot path planning based on improved DDPG reinforcement learning algorithm[C]//Proceedings of the 11th International Conference on Software Engineering and Service Science. Beijing: IEEE, 2020: 52-56.
[13]	LI Jianxin, CHEN Yiting, ZHAO Xiuniao, et al. An improved DQN path planning algorithm[J]. The Journal of Supercomputing, 2022, 78(1): 616-639.
[14]	SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. [2025-11-05]. https://arxiv.org/pdf/1707.06347.
[15]	HALL C M, RAM Y. Walk Score^® and its potential contribution to the study of active transport and walkability: a critical and systematic review[J]. Transportation Research Part D: Transport and Environment, 2018, 61: 310-324.
[16]	PONZI V, COMITO L, NAPOLI C. PNMLR: enhancing route recommendations with personalized preferences using graph attention networks[J]. IEEE Access, 2025, 13: 57465-57475.
[17]	CAI Kuanqi, CHEN Weinan, DUGAS D, et al. Sampling-based path planning in highly dynamic and crowded pedestrian flow[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(12): 14732-14742.
[18]	MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[19]	VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016: 2094-2100.
[20]	WANG Ziyu, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: PMLR, 2016: 1995-2003.
[21]	NAIR A, SRINIVASAN P, BLACKWELL S, et al. Massively parallel methods for deep reinforcement learning[EB/OL]. [2025-11-05]. https://arxiv.org/pdf/1507.04296.
[22]	SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. [2025-11-05]. https://arxiv.org/pdf/1511.05952.
[23]	LIU Ping, MA Xiangyu, DING Jie, et al. Multi-agent collaborative path planning algorithm with reinforcement learning and combined prioritized experience replay in Internet of Things[J]. Computers and Electrical Engineering, 2024, 116: 109193.
[24]	刘用, 杨晓飞, 夏金铭. 基于模糊算法的AUV避障与姿态控制[J]. 江苏大学学报(自然科学版), 2021, 42(6): 655-660.
	LIU Yong, YANG Xiaofei, XIA Jinming. Obstacle avoidance and attitude control of AUV based on fuzzy algorithm[J]. Journal of Jiangsu University (Natural Science Edition), 2021, 42(6): 655-660.
[25]	BAO Siya, NITTA T, SHINDOU D, et al. A landmark-based route recommendation method for pedestrian walking strategies[C]//Proceedings of the 4th Global Conference on Consumer Electronics. Osaka: IEEE, 2015: 672-673.
[26]	胡松, 吴海俊, 赵慧. 基于层次熵分析法的自行车交通系统评价及应用研究[C]//2016年中国城市交通规划年会论文集. 深圳: 中国城市规划学会城市交通规划学术委员会, 2016.
	HU Song, WU Haijun, ZHAO Hui. Bicycle transportation system evaluation and application based on hierarchical entropy analysis[C]//Proceedings of 2016 China Urban Transport Planning Annual Conference. Shenzhen: Urban Transportation Planning Academic Committee of Urban Planning Society of China, 2016.
[27]	方志祥, 罗浩, 李灵. 有限状态自动机辅助的行人导航状态匹配算法[J]. 测绘学报, 2017, 46(3): 371-380. DOI: . doi: 10.11947/j.AGCS.2017.20160530
	FANG Zhixiang, LUO Hao, LI Ling. A finite state machine aided pedestrian navigation state matching algorithm[J]. Acta Geodaetica et Cartographica Sinica, 2017, 46(3): 371-380. DOI: . doi: 10.11947/j.AGCS.2017.20160530
[28]	方志祥, 王禄斌. 面向行人导航意图探测的脑电分类研究[J]. 测绘学报, 2024, 53(9): 1829-1841. DOI: . doi: 10.11947/j.AGCS.2024.20230444
	FANG Zhixiang, WANG Lubin. Detecting pedestrian intention using EEG signals in navigation[J]. Acta Geodaetica et Cartographica Sinica, 2024, 53(9): 1829-1841. DOI: . doi: 10.11947/j.AGCS.2024.20230444
[29]	赵青, 陈勇, 罗斌, 等. 一种融合行人预测信息的局部路径规划算法[J]. 武汉大学学报(信息科学版), 2020, 45(5): 667-675.
	ZHAO Qing, CHEN Yong, LUO Bin, et al. A local path planning algorithm based on pedestrian prediction information[J]. Geomatics and Information Science of Wuhan University, 2020, 45(5): 667-675.
[30]	吴文静, 王占中, 马芳武. 从众心理影响下的行人群体行为演化博弈的仿真分析：以行人过街为例[J]. 吉林大学学报(工学版), 2017, 47(1): 92-96.
	WU Wenjing, WANG Zhanzhong, MA Fangwu. Simulation analysis of evolutionary game of pedestrians' group behaviors under influence of herd behavior: in case of crossing behavior[J]. Journal of Jilin University (Engineering and Technology Edition), 2017, 47(1): 92-96.
[31]	HU Xuemin, CHEN Long, TANG Bo, et al. Dynamic path planning for autonomous driving on various roads with avoidance of static and moving obstacles[J]. Mechanical Systems and Signal Processing, 2018, 100: 482-500.
[32]	YANG Xue, STEWART K, FANG Mengyuan, et al. Attributing pedestrian networks with semantic information based on multi-source spatial data[J]. International Journal of Geographical Information Science, 2022, 36(1): 31-54.
[33]	HASSAN L M, SHIU E, PARRY S. Addressing the cross-country applicability of the theory of planned behaviour (TPB): a structured review of multi-country TPB studies[J]. Journal of Consumer Behaviour, 2016, 15(1): 72-86.
[34]	王姣娥, 杜方叶, 靳海涛, 等. 基于交通出行链的就医活动识别理论框架与方法体系[J]. 地球信息科学学报, 2020, 22(4): 805-815.
	WANG Jiao'e, DU Fangye, JIN Haitao, et al. Identifying hospital-seeking behavior based on trip chain data: theoretical framework and methodological system[J]. Journal of Geo-information Science, 2020, 22(4): 805-815.
[35]	CHEN Jun, YANG Dongyuan. Estimating smart card commuters origin-destination distribution based on APTS data[J]. Journal of Transportation Systems Engineering and Information Technology, 2013, 13(4): 47-53.
[36]	刘丽敏, 虞虎, 靳海涛. 基于公交刷卡数据的北京城市居民周末户外休闲行为特征研究[J]. 地域研究与开发, 2018, 37(6): 52-57.
	LIU Limin, YU Hu, JIN Haitao. Characteristics of outdoor recreation behaviors of Beijing residents on weekends based on public transportation data[J]. Areal Research and Development, 2018, 37(6): 52-57.
[37]	SONG Yuchen, LI Dawei, LIU Dongjie, et al. Modeling activity-travel behavior under a dynamic discrete choice framework with unobserved heterogeneity[J]. Transportation Research Part E: Logistics and Transportation Review, 2022, 167: 102914.
[38]	代维秀, 陈占龙, 谢鹏. 居民出行与轨迹行为交互模式挖掘与关联技术[J]. 测绘学报, 2021, 50(4): 532-543. DOI: . doi: 10.11947/j.AGCS.2021.20200072
	DAI Weixiu, CHEN Zhanlong, XIE Peng. Research on the interactive mode of residents' behavior based on trajectory data mining[J]. Acta Geodaetica et Cartographica Sinica, 2021, 50(4): 532-543. DOI: . doi: 10.11947/j.AGCS.2021.20200072
[39]	LUDERS B, KOTHARI M, HOW J. Chance constrained RRT for probabilistic robustness to environmental uncertainty[C]//Proceedings of 2010 AIAA Guidance, Navigation, and Control Conference. Toronto: AIAA, 2010.
[40]	ZHANG Jun, SEYFRIED A. Quantification of bottleneck effects for different types of facilities[J]. Transportation Research Procedia, 2014, 2: 51-59.
[41]	RAHMAN K, ABDUL G N, ABDULBASAH K A, et al. Modelling pedestrian travel time and the design of facilities: a queuing approach[J]. PLoS One, 2013, 8(5): e63503.
[42]	龙瀛, 赵健婷, 李双金, 等. 中国主要城市街道步行指数的大规模测度[J]. 新建筑, 2018(3): 4-8.
	LONG Ying, ZHAO Jianting, LI Shuangjin, et al. The large-scale calculation of “walk score” of main cities in China[J]. New Architecture, 2018(3): 4-8.
[43]	GENG Yuanzhe, LIU Erwu, WANG Rui, et al. Deep reinforcement learning based dynamic route planning for minimizing travel time[C]//Proceedings of 2021 IEEE International Conference on Communications Workshops. Montreal: IEEE, 2021.
[44]	LI Xin, WANG Lei, AN Yi, et al. Dynamic path planning of mobile robots using adaptive dynamic programming[J]. Expert Systems with Applications, 2024, 235: 121112.
[45]	STENTZ A. Optimal and efficient path planning for partially-known environments[C]//Proceedings of 1994 IEEE International Conference on Robotics and Automation. San Diego: IEEE, 1994: 3310-3317.

群体	目的	效率	安全	舒适	设定依据
普通	通勤	0.70	0.10	0.20	通勤用户最关注效率
	休闲	0.40	0.30	0.30	休闲场景需平衡效率与景观体验
	紧急	0.50	0.40	0.10	紧急场景需兼顾效率与基本安全
轮椅	通勤	0.50	0.30	0.20	通勤路径需满足坡度小
	休闲	0.30	0.20	0.50	休闲时优先选无障碍
	紧急	0.20	0.60	0.20	紧急路径需无障碍
视障	通勤	0.30	0.50	0.20	通勤路径盲道覆盖率
	休闲	0.10	0.60	0.30	视障需高安全感
	紧急	0.10	0.80	0.10	紧急时安全权重最大化

功能区	通勤概率	休闲概率	紧急概率	逻辑依据
工业区	0.90	0.05	0.05	通勤需求占绝对主导
商业区	0.40	0.55	0.05	休闲场景高频
文教区	0.75	0.20	0.05	学校科研通勤集中
居民区	0.30	0.65	0.05	社区活动与日常为主
公园	0.00	0.95	0.05	散步等休闲行为主导
紧急区	0.00	0.00	1.00	医院、警局等触发

群体	障碍物敏感度（γ_obs）	拥挤敏感度（γ_cong）	坡度敏感度（γ_slope）
普通	0.3	0.5	0.2
轮椅	0.6	0.4	0.8
视障	0.9	0.2	0.3

地区	面积/km²	节点数	道路数	位置
工业区	2.770 2	70	78	光谷工业园
商业区	3.442 9	83	113	武商广场
文教区	2.111 5	108	154	武汉大学信息学部
居民区	2.143 3	126	168	宝安璞园
公园	2.062 7	65	98	中山公园

区域	面积/km²	道路节点	路段总数
A	4.648 7	772	1088
B	10.721 6	568	881
C	11.573 9	869	1233