Machine Vision Special Issue: Building Match Graph Using Deep Convolution Feature for Structure from Motion

doi:10.11947/j.AGCS.2018.20180040

Abstract

Image matching on an unordered image dataset is time-consuming for structure from motion (SfM), because features must be compared within every image pair and the number of image pairs is large. To reduce the matching time, this paper proposes using a deep convolution feature (DCF) to build the image match graph. First, the convolutional feature map of an image is extracted with the VGG-16 convolutional neural network trained on ImageNet. Then, sum pooling is applied to the feature map. Finally, the resulting vector is normalized and used to represent the image. The similarity between an image and every other image is computed from the distances between these feature vectors, and the match graph is constructed by selecting, for each image, the 10 images with the highest similarity. The experimental results show that the proposed DCF builds the match graph effectively and finds the potential matching image pairs. On the Urban and South Building datasets, the SfM reconstruction based on the DCF match graph is almost the same as that obtained with exhaustive matching, while the number of matches is reduced by 97.4% and 92.1%, respectively. The match graph created by the proposed DCF is also clearly better than the one created by DBoW3, which is used in the most advanced SLAM systems.
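The pipeline summarized in the abstract (sum pooling, normalization, top-10 selection) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the convolutional feature maps are already available as arrays of shape (channels, height, width); random arrays stand in for real VGG-16 outputs, and the function names `describe` and `build_match_graph`, the image count, and the map sizes are hypothetical. L2 normalization is also an assumption, since the abstract only says the vector is normalized.

```python
import numpy as np

def describe(feature_map):
    """Sum-pool a (C, H, W) feature map over its spatial grid, then L2-normalize."""
    vec = feature_map.sum(axis=(1, 2))      # sum pooling: one value per channel
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec  # assumed L2 normalization

def build_match_graph(descriptors, k=10):
    """Link each image to its k most similar images; return undirected pairs (i, j), i < j."""
    d = np.asarray(descriptors, dtype=float)
    n = d.shape[0]
    k = min(k, n - 1)
    # For unit vectors, ranking by cosine similarity is equivalent to ranking
    # by Euclidean distance, because ||a - b||^2 = 2 - 2 * (a . b).
    sim = d @ d.T
    np.fill_diagonal(sim, -np.inf)          # an image is never matched with itself
    top = np.argsort(-sim, axis=1)[:, :k]   # indices of the k highest similarities
    pairs = set()
    for i in range(n):
        for j in top[i]:
            j = int(j)
            pairs.add((min(i, j), max(i, j)))
    return pairs

# Stand-in data: 50 images, 512 channels on a 7x7 grid (hypothetical sizes).
rng = np.random.default_rng(0)
maps = rng.random((50, 512, 7, 7))
descs = np.stack([describe(m) for m in maps])

graph = build_match_graph(descs, k=10)
exhaustive = len(descs) * (len(descs) - 1) // 2
print(f"pairs to match: {len(graph)} of {exhaustive} exhaustive pairs")
```

With k fixed, the number of pairs to match grows at most linearly with the number of images (no more than k pairs per image), whereas exhaustive matching grows quadratically. This is why the saving becomes larger as the dataset grows.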

Key words: deep convolution feature, match graph, structure from motion, transfer learning

CLC Number: P237

WAN Jie, Alper YILMAZ. Machine Vision Special Issue: Building Match Graph Using Deep Convolution Feature for Structure from Motion[J]. Acta Geodaetica et Cartographica Sinica, 2018, 47(6): 882-891.
