Computational Advances Of Big Data

In 2013, the total volume of data created and copied in the world was 4.4 ZB, and it is doubling in size every two years; by 2020 the digital universe – the data we create and copy annually – will reach 44 ZB, or 44 trillion gigabytes [1]. Against this massive increase in global digital data, the term Big Data is mainly used to describe large-scale datasets. Big Data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making [2]. The volume of Big Data represents the magnitude of the data, while variety refers to its heterogeneity. Computational advances create an opportunity to use various types of structured, semi-structured, and unstructured data.
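As a rough check on these figures, the projection can be reproduced with a simple compound-growth calculation. The following is a minimal sketch in Python, assuming an exact doubling every two years from the 2013 baseline; the baseline, years, and doubling period come from the figures above, while everything else is illustrative.

    # Rough compound-growth check of the "digital universe" projection cited in [1].
    # Assumes a 2013 baseline of 4.4 ZB and a doubling period of exactly two years.
    BASELINE_ZB = 4.4      # data created and copied in 2013, in zettabytes
    BASE_YEAR = 2013
    DOUBLING_PERIOD = 2.0  # years per doubling (assumption from the text)

    def projected_volume_zb(year: int) -> float:
        """Project the annual data volume for a given year under the doubling assumption."""
        doublings = (year - BASE_YEAR) / DOUBLING_PERIOD
        return BASELINE_ZB * 2 ** doublings

    for year in (2013, 2015, 2017, 2020):
        print(f"{year}: ~{projected_volume_zb(year):.1f} ZB")

An exact two-year doubling puts 2020 at roughly 50 ZB; the cited 44 ZB estimate corresponds to a slightly longer doubling period of about 2.1 years, so the two statements agree only approximately.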
Since Big Data includes large amounts of inconsistent, incomplete, and noisy data, a number of data preprocessing techniques, including data cleaning, data integration, data transformation, and data reduction, can be applied to remove noise and correct inconsistencies [5]. A wide range of feature selection algorithms based on different models has been developed across many fields. Although existing statistical feature selection methods are useful for normal-sized datasets, they may fall short for feature selection in Big Data because of its noise, heterogeneity, and sheer volume, and they become inefficient at extracting the complex, non-linear patterns typically observed in such data; a minimal preprocessing and feature selection sketch is given at the end of this section.

The hierarchical structure of Deep Learning techniques, on the other hand, allows them to select and extract meaningful features from Big Data effectively. Several approaches have been tried for learning and extracting features from unlabeled image data, including Restricted Boltzmann Machines (RBMs) [6], autoencoders [7], and sparse coding [8], in different fields such as image detection; a minimal autoencoder sketch also appears below. However, most of these techniques were only able to extract low-level features. Hence, to avoid pitfalls and overcome the challenges, developing and employing computationally efficient algorithms is of high importance. Furthermore, most of the proposed feature selection algorithms use batch learning, which conducts the selection process over the entire dataset at once (see the incremental-learning sketch below).
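To make the preprocessing and feature selection steps concrete, the following is a minimal sketch, assuming a small synthetic tabular dataset with hypothetical column names; it uses pandas for basic cleaning and scikit-learn's VarianceThreshold and SelectKBest as generic stand-ins for the statistical feature selection methods discussed, not any specific pipeline from the cited works.

    # Minimal preprocessing + statistical feature selection sketch (illustrative only).
    # Column names, thresholds, and data are hypothetical.
    import numpy as np
    import pandas as pd
    from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "sensor_a": rng.normal(0.0, 1.0, 200),
        "sensor_b": rng.normal(5.0, 2.0, 200),
        "constant": np.ones(200),                # uninformative feature
        "label":    rng.integers(0, 2, 200),
    })
    df.loc[rng.choice(200, 20, replace=False), "sensor_a"] = np.nan  # simulate missing values

    # Data cleaning: drop duplicate rows and impute missing values with the column median.
    df = df.drop_duplicates()
    df = df.fillna(df.median(numeric_only=True))

    # Data reduction: remove near-constant features, then keep the k best by an F-test.
    X, y = df.drop(columns="label").values, df["label"].values
    X = VarianceThreshold(threshold=1e-6).fit_transform(X)
    X = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)
    print("selected feature matrix shape:", X.shape)   # (200, 2)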
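The hierarchical feature extraction idea can be illustrated with a very small autoencoder. The sketch below is a self-contained PyTorch example that learns a compressed representation from unlabeled data; the random vectors stand in for real image data and the layer sizes are arbitrary choices, so it illustrates the general autoencoder idea of [7] rather than any particular published architecture.

    # Minimal autoencoder sketch for unsupervised feature extraction (illustrative only).
    # Random vectors stand in for unlabeled image data; dimensions are arbitrary.
    import torch
    from torch import nn

    torch.manual_seed(0)
    X = torch.randn(512, 64)          # 512 unlabeled samples, 64 raw features each

    model = nn.Sequential(            # encoder: 64 -> 8, decoder: 8 -> 64
        nn.Linear(64, 8), nn.ReLU(),
        nn.Linear(8, 64),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for _ in range(200):              # train the network to reconstruct its own input
        optimizer.zero_grad()
        loss = loss_fn(model(X), X)
        loss.backward()
        optimizer.step()

    encoder = model[:2]               # keep only the encoder; its outputs are the learned features
    features = encoder(X).detach()
    print("learned feature matrix shape:", tuple(features.shape))  # (512, 8)

Stacking several such encoders, or replacing them with RBMs [6] or sparse coding [8], yields progressively higher-level features, which is the property that makes these hierarchical models attractive for Big Data.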
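Finally, the batch learning limitation can be contrasted with incremental (online) learning, in which the model is updated one chunk of data at a time instead of requiring the whole dataset in memory. The sketch below uses scikit-learn's partial_fit interface on synthetic chunks as a generic illustration of this style of processing; it is not a specific online feature selection algorithm from the literature.

    # Minimal incremental (online) learning sketch using scikit-learn's partial_fit.
    # The chunks are synthetic; in a Big Data setting they would arrive as a stream.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    model = SGDClassifier(random_state=0)
    classes = np.array([0, 1])        # the full set of labels must be declared up front

    for _ in range(10):               # process the stream chunk by chunk
        X = rng.normal(size=(1000, 20))
        y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)
        model.partial_fit(X, y, classes=classes)

    print("weight on the informative feature:", round(float(model.coef_[0, 0]), 3))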