Using Parallel Processing For Large Scale Database Analysis

Modern data centers continually adopt new technologies for large-scale workloads such as web search, web-log analysis, big-data analytics, and social networking. MapReduce is one such technology: it ingests massive volumes of data, performs large-scale computation, and extracts critical knowledge from big data for business intelligence. Proper analysis of datasets at this scale demands substantial input/output capacity from large server systems. A MapReduce job processes data in two steps, mapping and reducing; between them lies an important shuffling phase, which exchanges the intermediate data. During shuffling, physically moving segments of intermediate data across disks causes major I/O contention, and the resulting I/O load leads to high power consumption and heat generation, which account for a large portion of the operating cost of data centers analyzing such big data.

In this synopsis we introduce a new virtual shuffling approach that enables well-organized data movement and reduces the I/O overhead of MapReduce shuffling, thereby lowering power consumption and conserving energy. Virtual shuffling is achieved through a combination of three techniques: a three-level segment table, near-demand merging, and dynamic, balanced merging subtrees.
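The map, shuffle, and reduce steps described above can be sketched in plain Python. This is a minimal illustration, not any framework's API; all function names here are hypothetical. The final lazy merge is loosely analogous to near-demand merging: sorted intermediate runs stay in place and are merged only as the reducer consumes them, rather than being physically moved up front.

```python
# Minimal sketch of the map -> shuffle -> reduce pipeline (illustrative
# names only, not Hadoop's API), plus a lazy merge loosely analogous to
# near-demand merging.
from collections import defaultdict
import heapq

def map_phase(records, map_fn):
    """Each mapper emits intermediate (key, value) pairs."""
    return [pair for rec in records for pair in map_fn(rec)]

def shuffle(intermediate, num_reducers):
    """Partition intermediate pairs by key so each reducer receives
    all values for the keys it owns -- the physical data exchange
    that causes the I/O contention discussed above."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in intermediate:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reduce_phase(partitions, reduce_fn):
    """Each reducer aggregates the grouped values for its keys."""
    result = {}
    for part in partitions:
        for key, values in part.items():
            result[key] = reduce_fn(key, values)
    return result

# Word count, the canonical MapReduce example.
docs = ["big data big compute", "data centers"]
wordcount = reduce_phase(
    shuffle(map_phase(docs, lambda d: [(w, 1) for w in d.split()]), 2),
    lambda k, vs: sum(vs),
)
# wordcount == {"big": 2, "data": 2, "compute": 1, "centers": 1}

# Near-demand merging sketch: each mapper leaves a *sorted* run of
# intermediate pairs in place; heapq.merge yields a lazy iterator, so
# pairs are merged only when the reducer actually demands them instead
# of being copied across disks eagerly.
run_a = [("big", 1), ("compute", 1), ("data", 1)]
run_b = [("big", 1), ("centers", 1), ("data", 1)]
merged = heapq.merge(run_a, run_b)   # no data moved yet
first = next(merged)                 # one pair pulled on demand
```

The sketch deliberately keeps everything in memory; in a real deployment each partition would live in per-node spill files, which is exactly where the disk-level shuffling cost arises.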
