The Application of Algebraic Topology to Data Analysis
The application of algebraic topology to data analysis is relatively new but promising. When confronted with large volumes of high-dimensional data, we would like to identify significant phenomena, and the persistence of topological features provides a new and potentially useful measure of significance. A key promise of this method is the ability to identify features without a model: truly unsupervised learning.
Unlike traditional statistical methods of data analysis, which are primarily concerned with parameter estimation, topological data analysis regards the data as a sample from a manifold embedded in Euclidean space and attempts to recover topological features such as connectedness or the number of holes. An advantage of considering topology is that topological features are stable under continuous deformations, and can therefore be said to be insensitive to small errors introduced in the sampling.
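As a rough illustration of recovering connectedness from a sample, the sketch below counts the connected components of a point cloud by joining any two points that lie within a chosen distance of each other. Everything in it is an assumption made for illustration: the function names, the scale parameter eps, and the synthetic data (two well-separated noisy circles in the plane) do not come from any particular library or from the example discussed in this essay.

```python
import math
import random


def count_components(points, eps):
    """Count connected components of the graph that joins points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


def sample_circle(center, radius, n, noise, rng):
    """Hypothetical sampler: n evenly spaced points on a circle, each coordinate jittered by at most `noise`."""
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        x = center[0] + radius * math.cos(t) + rng.uniform(-noise, noise)
        y = center[1] + radius * math.sin(t) + rng.uniform(-noise, noise)
        pts.append((x, y))
    return pts


rng = random.Random(0)
# Two unit circles whose centers are 5 apart, sampled with small noise.
cloud = sample_circle((0, 0), 1.0, 40, 0.02, rng) + sample_circle((5, 0), 1.0, 40, 0.02, rng)

print(count_components(cloud, eps=0.3))  # prints 2
```

The count stays at 2 for any jitter of this size, because neighbouring samples on each circle remain closer than eps while the two circles remain far apart. The answer does depend on the scale: a very small eps leaves every point isolated and a very large one merges everything, which is one motivation for tracking features across a range of scales rather than fixing a single value.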
To see the utility of the topological approach, and in particular the utility of homology for multidimensional data analysis, consider the following example, first described by Scott (1992). Take random samples in R^3 from two different populations. Suppose that these sample points form data clouds shaped like two 2-dimensional spheres of different radii, with the smaller one coming from one population and the larger one from the other. Moreover, the smaller sphere sits inside the larger sphere. Most multivariate and dimensionality reduction methods would not be able to detect the