Acta Geodaetica et Cartographica Sinica


The Research on High Dimensional Data Clustering Based on Improved Feature Transformation

  

Received: 2010-03-10; Revised: 2010-09-27; Online: 2011-06-25; Published: 2011-06-25

Abstract:

This paper studies similarity measures, feature transformation, and the design of a dimensionality reduction converter, and proposes a clustering algorithm for high dimensional data. First, the similarity matrix of the high dimensional data is computed with HDsim, the similarity measure function designed in this paper, and converted into a distance matrix. A graph is built from the distance matrix by nearest neighbor search, and the matrix of shortest-path distances is obtained with the Floyd algorithm. Next, dimensionality reduction is formulated as an optimization problem: a fitness function is designed, and the problem is solved with a genetic algorithm. Finally, the reduced data are clustered with k-means, and the pairs formed by each object's high dimensional coordinates and its reduced 2D coordinates are used to train an RBF neural network, which is then saved. A new object is mapped through the trained network, and its cluster membership is determined from its distance to each current cluster center. Clustering experiments on the iris and zoo data sets from the UCI machine learning database demonstrate the validity of the proposed algorithm.

Key words: feature transformation, high dimensional data clustering, similarity measure, dimensionality reduction, genetic algorithm, RBF neural network
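
The graph step described in the abstract (a nearest neighbor graph followed by Floyd shortest paths) can be illustrated with a minimal sketch. The abstract does not give the definition of HDsim or the rule for converting similarity to distance, so the sketch assumes a precomputed symmetric distance matrix as input. The function names `knn_graph`, `floyd_warshall`, and `geodesic_distances` and the parameter `k` are hypothetical and are not taken from the paper.

```python
import math


def knn_graph(dist, k):
    """Keep each point's k nearest neighbours; every other edge becomes infinite.

    `dist` is assumed to be a symmetric matrix (list of lists) of pairwise
    distances, e.g. one derived from a similarity matrix.
    """
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Indices of the k points closest to i, excluding i itself.
        neighbours = sorted(
            (j for j in range(n) if j != i), key=lambda j: dist[i][j]
        )[:k]
        for j in neighbours:
            # Store the edge in both directions so the graph is undirected.
            graph[i][j] = dist[i][j]
            graph[j][i] = dist[j][i]
    return graph


def floyd_warshall(graph):
    """All-pairs shortest-path lengths by the Floyd-Warshall algorithm."""
    n = len(graph)
    sp = [row[:] for row in graph]  # work on a copy
    for m in range(n):  # m is the intermediate vertex being allowed
        for i in range(n):
            for j in range(n):
                via = sp[i][m] + sp[m][j]
                if via < sp[i][j]:
                    sp[i][j] = via
    return sp


def geodesic_distances(dist, k):
    """Shortest-path distance matrix over the k-nearest-neighbour graph."""
    return floyd_warshall(knn_graph(dist, k))


if __name__ == "__main__":
    # Four points along a curve: the direct distance from 0 to 3 is 2.0,
    # but the path through the neighbour graph is longer.
    d = [
        [0.0, 1.0, 1.8, 2.0],
        [1.0, 0.0, 1.0, 1.8],
        [1.8, 1.0, 0.0, 1.0],
        [2.0, 1.8, 1.0, 0.0],
    ]
    for row in geodesic_distances(d, k=1):
        print(row)
```

If `k` is too small the neighbor graph may be disconnected, in which case some entries of the result stay infinite. How the paper handles that case, and how it chooses the neighborhood size, is not stated in the abstract.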