Research on knowledge extraction from street scene images based on hybrid intelligence

Acta Geodaetica et Cartographica Sinica, 2024, Vol. 53, Issue (9): 1817-1828. doi: 10.11947/j.AGCS.2024.20220720

Section: Cartography and Geographic Information

Wanzeng LIU¹,²,³, Hang CHEN²,⁴, Jiaxin REN²,⁵, Zhaojiang ZHANG⁴, Ran LI¹,²,³, Tingting ZHAO¹,²,³, Xi ZHAI¹,²,³, Xiuli ZHU¹,²,³

1. National Geomatics Center of China, Beijing 100830, China
2. Key Laboratory of Spatio-temporal Information and Intelligent Services, Ministry of Natural Resources of China, Beijing 100830, China
3. Hubei Luojia Laboratory, Wuhan 430079, China
4. College of Mining and Geomatics Engineering, Hebei University of Engineering, Handan 056038, China
5. School of Geosciences and Info-Physics, Central South University, Changsha 410083, China

Received: 2022-12-31 Published: 2024-10-16
Corresponding author: Jiaxin REN. E-mail: luwnzg@163.com; jaycecd@foxmail.com
About the author: LIU Wanzeng (born 1970), male, PhD, professor-level senior engineer; research interest: spatio-temporal knowledge services. E-mail: luwnzg@163.com
Supported by: The National Natural Science Foundation of China (42394062); The National Key Research and Development Program of China (2022YFB3904205); The Open Fund of Hubei Luojia Laboratory (220100037)

Abstract

To address the difficulty of intelligently extracting objects from street scene images, this paper proposes K-CAPSNet, a knowledge extraction method for street scene images based on hybrid intelligence. First, building on an existing panoptic segmentation network, we develop a panoptic segmentation network with a joint attention mechanism that attends to both the channel information and the spatial information of street scene images, which improves object segmentation accuracy. Second, we bring the street scene knowledge that people accumulate in production and daily life into the image cognition process, using prior knowledge to set object labeling thresholds and refine the segmentation results. Third, we use prior knowledge of street scene images to verify the topological relations between street scene objects, and use depth information to mine knowledge of their spatial relations. Finally, we use semantic templates to describe and express the types, numbers, and spatial relations of street scene objects. Experiments show that, compared with the baseline network, the proposed method clearly improves both panoptic segmentation quality and recognition quality, and extracts and expresses the knowledge in street scene images well.

Key words: hybrid intelligence, prior knowledge, panoptic segmentation, scene cognition, attention mechanism, spatial relationships
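The ablation table on this page has a CBAM column, which suggests that the joint channel and spatial attention follows the convolutional block attention module of Woo et al. (2018). As a reminder of that published formulation, and not necessarily the exact variant used in K-CAPSNet, the attention applied to a feature map $F$ can be written as follows. Here $\sigma$ is the sigmoid function, $\otimes$ is element-wise multiplication with broadcasting, and $f^{7\times 7}$ is a convolution with a 7 × 7 kernel.

```latex
\begin{align}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \\
F'      &= M_c(F) \otimes F \\
M_s(F') &= \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big) \\
F''     &= M_s(F') \otimes F'
\end{align}
```

In $M_c$ the pooling runs over the spatial dimensions and yields one weight per channel. In $M_s$ the pooling runs over the channel dimension and yields one weight per pixel.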

CLC number: P208

Cite this article:

Wanzeng LIU, Hang CHEN, Jiaxin REN, Zhaojiang ZHANG, Ran LI, Tingting ZHAO, Xi ZHAI, Xiuli ZHU. Research on knowledge extraction from street scene images based on hybrid intelligence[J]. Acta Geodaetica et Cartographica Sinica, 2024, 53(9): 1817-1828.

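Knowledge ID	Knowledge description	Implication
Knowledge 1	For large areal objects such as buildings and vegetation, only an occluding object of sufficient length can split the same object into multiple connected components	Focus on salient objects of sufficient length, such as sign poles
Knowledge 2	Linear objects such as sidewalks and green belts lie next to the road and are easily occluded by vehicles and pedestrians on the road	Pedestrians are relatively small objects, so cars deserve the most attention
Knowledge 3	Cars are dynamic objects whose spatial relations with other objects can change, whereas sign poles are static objects whose spatial relations with other objects are fixed	Different object types call for different optimization strategies: for sign poles, selecting the most salient instance in the image is enough, whereas for cars, whose positions are not fixed, the influence of all car instances must be considered together
Knowledge 4	Taking the image capture location as the origin, the farther away an object is, the more likely it is to be occluded	Pay more attention to distant objects
Knowledge 5	Background objects that are long or wide enough are split into multiple connected components by occlusion	Smaller objects are often completely occluded and do not appear in the image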
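Several rows of this table turn on one object being split into multiple connected components by an occluder. As a minimal illustration of that notion only, and not the authors' optimization procedure, the sketch below counts 4-connected components in a small binary mask. The function name `count_components` and both toy masks are hypothetical.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected components of truthy cells in a 2D list."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New component found: flood-fill it with a breadth-first search.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical building mask: a tall pole hides the whole of column 3.
building_tall_occluder = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
# A shorter occluder hides only part of column 3, so the building stays connected.
building_short_occluder = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]

print(count_components(building_tall_occluder))   # 2
print(count_components(building_short_occluder))  # 1
```

Only the occluder that spans the full height of the object splits it in two, which is the situation the table singles out for long objects such as sign poles.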

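Image entities	Knowledge description	Topological relation	Derived relation	Semantic description
Vehicles occluding one another front to back	They should not touch; touching would mean a traffic accident	Meet	Disjoint, front-behind	Two cars driving one behind the other
Sky and a car on the ground	The sky is above and the car is below; they cannot touch	Meet	Disjoint, above-below	There is a car under the sky
Traffic signal facility and sidewalk	The traffic signal facility stands at the edge of the sidewalk	Meet	Overlapping, above-below	The traffic signal facility stands at the edge of the sidewalk
Person and bicycle	A bicycle cannot move without a person controlling it; the person rides on the bicycle	Meet	Overlapping, above-below	A person is riding a bicycle on the road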
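The table above pairs each entity combination with a derived relation and a sentence. One way to picture this is a small rule base that replaces the topological relation detected in the 2D image with the derived relation and fills in a sentence template. The sketch below makes that assumption; `PRIOR_RULES`, `describe`, and the class labels are hypothetical names, not taken from the paper.

```python
# Hypothetical rule base: (class_a, class_b) -> (derived relation, sentence).
PRIOR_RULES = {
    ("car", "car"): ("disjoint, front-behind",
                     "Two cars driving one behind the other"),
    ("sky", "car"): ("disjoint, above-below",
                     "There is a car under the sky"),
    ("traffic signal", "sidewalk"): ("overlapping, above-below",
                                     "The traffic signal facility stands at the edge of the sidewalk"),
    ("person", "bicycle"): ("overlapping, above-below",
                            "A person is riding a bicycle on the road"),
}


def describe(class_a, class_b, detected="meet"):
    """Return (relation, sentence) for a pair of object classes.

    If the rule base covers the pair (in either order), the detected 2D
    topological relation is replaced by the derived relation; otherwise the
    detected relation is reported unchanged.
    """
    rule = PRIOR_RULES.get((class_a, class_b)) or PRIOR_RULES.get((class_b, class_a))
    if rule is None:
        return detected, f"{class_a} and {class_b}: {detected}"
    return rule


print(describe("sky", "car"))
print(describe("bicycle", "person"))
print(describe("tree", "road"))
```

Keeping the rules in a plain dictionary keeps the prior knowledge separate from the lookup logic, so new entity pairs can be added without touching `describe`.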

Input: annotated object a, annotated object b
Output: Left, Right, Front, Behind, Attached
function Rule1(a,b)
begin
	for i:=0 to m do
	begin
		if x₁>x₂ then
		begin
			Left(a,b)=True
		end
		else if x₁<x₂ then
		begin
			Right(a,b)=True
		end
		else if d₁<d₂ then
		begin
			Front(a,b)=True
		end
		else if d₁>d₂ then
		begin
			Behind(a,b)=True
		end
		else if x₁=x₂ and d₁=d₂ then
		begin
			Attached(a,b)=True
		end;
	end;
	return Left, Right, Front, Behind, Attached
end;
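The rule above can be sketched in Python as follows. This page does not define x₁, x₂, d₁, d₂, or m, so the sketch assumes that x is a horizontal image coordinate and d a depth value for each object, and it drops the loop over i. The comparison directions are copied from the pseudocode as printed; whether Left(a,b) reads as "a is left of b" depends on a convention the page does not state. `Target` and `rule1` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """An annotated object. x and d are assumed to be a horizontal image
    coordinate and a depth value; the pseudocode does not define them."""
    name: str
    x: float
    d: float


def rule1(a: Target, b: Target) -> str:
    """Return one relation label, using the same branch order as Rule1."""
    if a.x > b.x:
        return "Left"
    if a.x < b.x:
        return "Right"
    # From here on the x values are equal, so depth decides.
    if a.d < b.d:
        return "Front"
    if a.d > b.d:
        return "Behind"
    return "Attached"  # x and d are both equal


pole = Target("sign pole", x=120.0, d=8.5)
car = Target("car", x=120.0, d=15.0)
print(rule1(pole, car))  # Front
print(rule1(car, pole))  # Behind
```

Because the branches form one chain, depth is compared only when the two x values are equal. With real coordinates exact equality is rare, so a tolerance would probably be needed; that is an observation about this sketch, not something stated in the source.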

Method	Backbone	PQ (%)	SQ (%)	RQ (%)
AUNet	ResNet101	59.0	-	-
PanopticFPN	ResNet101	58.1	-	-
UPSNet	ResNet50	59.3	79.7	73.0
Panoptic-DeepLab	HRNet48	60.4	80.7	73.6
K-CAPSNet	HRNet48	61.8	81.6	75.4
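The three columns are the standard panoptic segmentation metrics. As defined by Kirillov et al. (2019), for one class SQ is the mean IoU of matched segments, RQ is an F1-style detection score, and PQ is their product. The sketch below computes them for a single class from hypothetical inputs. It assumes the paper follows the standard definition, in which the reported scores are averages over classes, so a reported PQ need not equal the reported SQ × RQ exactly.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each predicted/ground-truth segment pair counted as
    a true positive (in the standard definition, pairs with IoU > 0.5).
    num_fp: predicted segments left unmatched.
    num_fn: ground-truth segments left unmatched.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq, sq, rq


# Hypothetical class with three matched segments, one false positive,
# and one false negative.
pq, sq, rq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")  # PQ=0.600 SQ=0.800 RQ=0.750
```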

Group	Backbone	CBAM	PQ (%)	SQ (%)	RQ (%)
Experiment 1	ResNet50	No	57.6	80.1	70.6
Experiment 2	ResNet50	Yes	60.0	80.7	73.2
Experiment 3	Xception65	No	59.2	80.5	72.9
Experiment 4	Xception65	Yes	61.2	80.9	74.5
Experiment 5	HRNet48	No	60.4	80.7	73.6
Experiment 6	HRNet48	Yes	61.8	81.6	75.4
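To read the ablation table at a glance, the PQ gain from adding CBAM can be computed per backbone. The short sketch below does only that arithmetic on the PQ values copied from the table; `ABLATION` and `pq_gains` are hypothetical names.

```python
# (backbone, PQ without CBAM, PQ with CBAM), copied from the ablation table.
ABLATION = [
    ("ResNet50", 57.6, 60.0),
    ("Xception65", 59.2, 61.2),
    ("HRNet48", 60.4, 61.8),
]


def pq_gains(rows):
    """Return {backbone: PQ gain in percentage points}, rounded to 0.1."""
    return {name: round(with_cbam - without, 1)
            for name, without, with_cbam in rows}


for backbone, gain in pq_gains(ABLATION).items():
    print(f"{backbone}: +{gain} PQ")
```

The gain is 2.4 points for ResNet50, 2.0 for Xception65, and 1.4 for HRNet48, so in this table the improvement shrinks as the backbone's own PQ rises.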
