An Improved High-Dimensional Data Clustering Algorithm Based on Similarity Preservation and Feature Transformation

Acta Geodaetica et Cartographica Sinica (测绘学报)

Wang Jiayao (王家耀)¹, Xie Mingxia (谢明霞)², Guo Jianzhong (郭建忠), Chen Ke (陈科)⁴

Received: 2010-03-10  Revised: 2010-09-27  Published: 2011-06-25  Online: 2011-06-25
Corresponding author: Xie Mingxia (谢明霞)

The Research on High Dimensional Data Clustering Based on Improved Feature Transformation

Abstract

This paper studies three aspects of high-dimensional data clustering (similarity measurement, feature transformation, and the construction of a dimensionality-reduction converter) and proposes a clustering algorithm based on improved feature transformation. First, the similarity measure function HDsim designed in the paper is used to compute the similarity matrix of the objects in the high-dimensional space. The similarity matrix is converted into a distance matrix, a neighborhood graph of the distance matrix is built with the nearest-neighbor method, and the shortest-path distance matrix is obtained with the Floyd algorithm. Second, the mapping of the high-dimensional objects to two dimensions, in which the Euclidean distances between objects in the 2D space should approach the shortest-path distances between the corresponding objects in the high-dimensional space, is formulated as an optimization problem; a fitness function is designed for it, and the problem is solved with a genetic algorithm. Finally, k-means clustering is performed on the reduced 2D coordinates, and an RBF neural network is trained on the pairs (high-dimensional coordinates, reduced 2D coordinates) and then saved. When a new data object arrives, the trained network maps it to two dimensions, and the object is assigned to a cluster according to its distance from each cluster center. Clustering experiments on the iris and zoo data sets from the UCI machine learning repository verify the effectiveness of the proposed algorithm.

Key words: feature transformation, high dimensional data clustering, similarity measure, dimensionality reduction, genetic algorithm, RBF neural network
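
The first two steps of the abstract (neighborhood graph, Floyd shortest paths, and a fitness value for a candidate 2D layout) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it starts from a distance matrix that is assumed to be already available, because the abstract gives neither the definition of HDsim nor the similarity-to-distance conversion. The function names `knn_graph`, `floyd_shortest_paths`, and `stress` are hypothetical, and the squared-error fitness is an assumed form, since the paper's actual fitness function is not stated here. The genetic algorithm, k-means, and RBF network steps are omitted.

```python
import math

def knn_graph(dist, k):
    """Neighborhood graph: keep edges from each object to its k nearest neighbors.

    Assumes `dist` is a symmetric distance matrix (list of lists).
    Missing edges are represented by math.inf.
    """
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Other objects, ordered by their distance from object i.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: dist[i][j])
        for j in neighbors[:k]:
            graph[i][j] = dist[i][j]
            graph[j][i] = dist[i][j]  # keep the graph undirected
    return graph

def floyd_shortest_paths(graph):
    """Floyd (Floyd-Warshall) all-pairs shortest-path distances."""
    n = len(graph)
    sp = [row[:] for row in graph]
    for m in range(n):            # m is the intermediate vertex
        for i in range(n):
            for j in range(n):
                via = sp[i][m] + sp[m][j]
                if via < sp[i][j]:
                    sp[i][j] = via
    return sp

def stress(coords, sp):
    """Assumed fitness: sum of squared gaps between the Euclidean distances
    of a candidate 2D layout and the shortest-path distances (lower is better)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - sp[i][j]) ** 2
    return total

# Toy data: four objects whose direct distance between the two ends (1.5)
# is shorter than the path along the neighborhood graph.
dist = [
    [0, 1, 2, 1.5],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1.5, 2, 1, 0],
]
sp = floyd_shortest_paths(knn_graph(dist, k=1))
layout = [(0, 0), (1, 0), (2, 0), (3, 0)]  # a candidate 2D layout
print(sp[0][3], stress(layout, sp))
```

In the toy data the graph distance between the first and last objects follows the chain of neighbors rather than the direct entry of 1.5, which is the effect the shortest-path step is meant to capture. A genetic algorithm, as described in the abstract, would search over layouts such as `layout` to minimize a fitness value of this kind.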

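Wang Jiayao, Xie Mingxia, Guo Jianzhong, Chen Ke. An Improved High-Dimensional Data Clustering Algorithm Based on Similarity Preservation and Feature Transformation[J]. Acta Geodaetica et Cartographica Sinica.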

