Data Mining Techniques and Models Have Had Phenomenal Success with Traditional Transactional Datasets

Owing to rapid developments in digital technologies, the use of electronic media to capture, process, and store information is growing at an extraordinary rate [1]. The volume of stored information is approaching zettabytes [2], whereas our capability to analyze such large amounts of data lags far behind this growth. One of the impediments is the high dimensionality of the datasets. This includes information in different application areas, such as electronic health records (EHRs), biology, astronomy, medical imaging, video archiving, and web data. Different data mining techniques have been used to extract the knowledge available in some of these data sets, albeit with limited success [3]. A number of data mining techniques and models…
For any given point in a high-dimensional data space, the relative gap between the Euclidean distance to its nearest neighbor and the distance to the farthest point shrinks as the dimensionality grows [5]. The notion of a nearest neighbor therefore becomes meaningless, and many data mining tasks, such as clustering, can become largely ineffective as the model grows increasingly susceptible to the noise present in the data.

To handle the high dimensionality of such data sets, a number of algorithms have been introduced that use row-based enumeration [6, 7] instead of column-based enumeration [8-10]. These methods rest on the assumption that the data sets have thousands of columns (dimensions) but a much smaller number of rows. The CARPENTER algorithm [6] uses a bottom-up search, while the TD-Close algorithm [7] uses a top-down exploration approach. Both techniques, however, work best on dense data sets, and they apply only to high-dimensional data sets with far fewer rows than columns.

Dimension reduction algorithms such as Principal Component Analysis [11], Multidimensional Scaling [12], and Independent Component Analysis [13] are effective and popular ways to reduce dimensionality. However, the effectiveness of these algorithms is limited by their global linearity.
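The distance-concentration effect described above can be observed directly with a small simulation. The sketch below (not part of the original text; the point counts and dimensions are arbitrary choices for illustration) draws uniform random points and measures the relative gap between the farthest and nearest distances from a random query point, which shrinks as the dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=1000):
    """Relative gap between the farthest and nearest distances from a
    random query point to n_points uniform points in [0, 1]^dim."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    # The contrast drops sharply with dimension: in 2-D the nearest
    # point is far closer than the farthest, while in 1000-D all
    # points sit at nearly the same distance from the query.
    print(dim, round(relative_contrast(dim), 3))
```

Because all pairwise distances concentrate around the same value in high dimensions, the "nearest" neighbor is barely nearer than any other point, which is exactly why distance-based methods such as clustering degrade.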
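As a minimal sketch of linear dimension reduction in the PCA style (the data here is synthetic and hypothetical, not from the original text): when a high-dimensional data set actually lies near a low-dimensional linear subspace, the SVD of the centered data matrix recovers that subspace, and the leading components capture almost all of the variance. This is also where the global-linearity limitation shows: data lying on a curved manifold would not be captured this well by any linear projection.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic example: 50 samples in 500 dimensions that lie near a
# 3-dimensional linear subspace, plus a small amount of noise.
latent = rng.standard_normal((50, 3))
mixing = rng.standard_normal((3, 500))
X = latent @ mixing + 0.01 * rng.standard_normal((50, 500))

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)   # variance ratio per component

# Project onto the top 3 principal directions: 50 x 3 representation.
reduced = Xc @ Vt[:3].T

print(explained[:4])  # nearly all variance sits in the first three components
```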