Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (9): 1817-1828. doi: 10.11947/j.AGCS.2024.20220720
• Cartography and Geographic Information •
Wanzeng LIU1,2,3, Hang CHEN2,4, Jiaxin REN2,5, Zhaojiang ZHANG4, Ran LI1,2,3, Tingting ZHAO1,2,3, Xi ZHAI1,2,3, Xiuli ZHU1,2,3
Received: 2022-12-31
Published: 2024-10-16
Contact: Jiaxin REN
E-mail: luwnzg@163.com; jaycecd@foxmail.com
About author: LIU Wanzeng (born 1970), male, PhD, professor-level senior engineer; his research focuses on spatio-temporal knowledge services. E-mail: luwnzg@163.com
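Abstract:
To address the difficulty of intelligently extracting objects from street scene images, this paper proposes a hybrid-intelligence-based method for extracting knowledge from street scene images (K-CAPSNet). First, building on an existing panoptic segmentation network, the method attends to both the channel information and the spatial information of street scene images and develops a panoptic segmentation network with a joint attention mechanism to improve object segmentation accuracy. Second, the street scene knowledge that people have accumulated in production and daily life is incorporated into the image cognition process: prior knowledge is used to set object labeling thresholds, which are then used to optimize the segmentation results. Third, the topological relationships between street scene objects are verified against street scene prior knowledge, and depth information is used to mine spatial relationship knowledge. Finally, semantic templates are used to describe and express the types, numbers, and spatial relationships of street scene objects. Experiments show that, compared with the baseline network, the proposed method clearly improves both panoptic quality and recognition quality, and extracts and expresses street scene image knowledge well.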
Wanzeng LIU, Hang CHEN, Jiaxin REN, Zhaojiang ZHANG, Ran LI, Tingting ZHAO, Xi ZHAI, Xiuli ZHU. Research on knowledge extraction from street scene images based on hybrid intelligence[J]. Acta Geodaetica et Cartographica Sinica, 2024, 53(9): 1817-1828.
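Table 1 Street scene prior knowledge used to optimize the over-segmentation of background instances caused by occlusion
Knowledge No. | Knowledge description | Implication |
---|---|---|
Knowledge 1 | For large areal objects such as buildings and vegetation, only an occluding object of sufficient length can split the same object into multiple connected components | Focus on salient objects of sufficient length, such as sign poles |
Knowledge 2 | Linear objects such as sidewalks and green belts adjoin the road and are easily occluded by vehicles and pedestrians on the road | Pedestrians are relatively small, so focus on cars |
Knowledge 3 | Cars are dynamic objects whose spatial relationships with other objects vary, whereas sign poles are static objects whose spatial relationships with other objects are fixed | Different object types need different optimization methods: for sign poles, selecting the most salient instance in the image is enough, whereas for cars, whose positions are not fixed, the influence of all car instances must be considered together |
Knowledge 4 | Taking the image capture location as the origin, the farther away an object is, the more likely it is to be occluded | Pay more attention to distant objects |
Knowledge 5 | A background object that is long enough or wide enough is split into multiple connected components by occlusion | Smaller objects are often completely occluded and do not appear in the image |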
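The idea behind Table 1, that a foreground occluder can split one background object into several connected components, can be illustrated with a small sketch. This is not the paper's implementation: the function names, the 4-connectivity choice, and the test of "does filling in the occluder reconnect the pieces" are all assumptions made for illustration only.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected components of a binary grid. Returns (labels, count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def split_by_occluder(background, occluder):
    """Hypothetical check: True if treating occluder pixels as background
    reconnects at least two background components."""
    _, before = label_components(background)
    merged = [[int(b or o) for b, o in zip(b_row, o_row)]
              for b_row, o_row in zip(background, occluder)]
    labels, _ = label_components(merged)
    # Count merged components that contain at least one background pixel.
    after = {labels[r][c]
             for r in range(len(background))
             for c in range(len(background[0]))
             if background[r][c]}
    return len(after) < before


# A sidewalk strip (1 = sidewalk) interrupted by a car (1 = car).
sidewalk = [[1, 1, 0, 0, 1, 1]]
car = [[0, 0, 1, 1, 0, 0]]
print(split_by_occluder(sidewalk, car))
```

In this toy example the car exactly covers the gap, so the two sidewalk pieces would be treated as one instance. A real pipeline would work on segmentation masks and would still need the class-specific rules from Table 1, such as considering all car instances but only the most salient sign pole.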
Input: labeled object a, labeled object b
Output: Left, Right, Front, Behind, Attached
function Rule1(a, b)
begin
  for i := 0 to m do
  begin
    if x1 > x2 then
    begin
      Left(a, b) = True
    end
    else if x1 < x2 then
    begin
      Right(a, b) = True
    end
    else if d1 < d2 then
    begin
      Front(a, b) = True
    end
    else if d1 > d2 then
    begin
      Behind(a, b) = True
    end
    else if x1 = x2 and d1 = d2 then
    begin
      Attached(a, b) = True
    end;
  end;
  return Left, Right, Front, Behind, Attached
end;
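The rule above can be sketched in Python as follows. The listing does not define x1, x2, d1, d2, or m, so this sketch assumes that x1 and x2 are the horizontal positions of a and b, that d1 and d2 are their estimated depths, and it drops the loop over m. The `LabeledObject` class and the `rule1` name are hypothetical. The comparison directions are copied from the listing as printed, not derived independently.

```python
from dataclasses import dataclass


@dataclass
class LabeledObject:
    name: str
    x: float      # assumed: horizontal position of the object in the image
    depth: float  # assumed: estimated distance from the camera


def rule1(a: LabeledObject, b: LabeledObject) -> dict:
    """Return the spatial relation flags of a with respect to b.

    The branches are mutually exclusive and follow the order of the listing:
    horizontal position is tested first, depth only when positions are equal.
    """
    relations = dict.fromkeys(
        ("Left", "Right", "Front", "Behind", "Attached"), False)
    if a.x > b.x:
        relations["Left"] = True
    elif a.x < b.x:
        relations["Right"] = True
    elif a.depth < b.depth:
        relations["Front"] = True
    elif a.depth > b.depth:
        relations["Behind"] = True
    else:  # equal position and equal depth
        relations["Attached"] = True
    return relations


car = LabeledObject("car", x=5.0, depth=10.0)
pole = LabeledObject("sign pole", x=5.0, depth=8.0)
print(rule1(car, pole))
```

Because exact equality of measured positions and depths is rare, a practical version would probably compare within a tolerance; that tolerance is not specified in the listing and is left out here.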