The paper 'Big Data Processing in Cloud Computing' gives an overview of the challenges involved in managing and analyzing large data sets, and presents a comprehensive list of cloud solutions to them as well as MapReduce optimization strategies. This essay is based on that paper, which was written by Changqing Ji, Yu Li, Wenming Qiu, Uchechukwu Awada and Keqiu Li and presented at the '2012 International Symposium on Pervasive Systems, Algorithms and Networks'. In the present era, the amount of data available from sources such as social media, medical records and consumer usage has increased, and this data can be used to infer useful results so as to improve …

The authors say that conventional data storage systems (databases) work well with structured data but crash under heavy workloads. They describe various distributed file systems such as GFS (Google File System), HDFS (Hadoop Distributed File System) and Amazon S3 (Simple Storage Service). All of these systems handle unstructured data and provide fault tolerance through data replication. S3 in particular integrates well with other Amazon services and offers big data processing capabilities to consumers at an affordable cost in a pay-as-you-go fashion. For storing unstructured and semi-structured data, the authors present solutions used at various companies, giving the examples of BigTable, used by Google, and PNUTS, used by Yahoo. The one that caught my eye is the system proposed by Facebook, a hybrid data management system. It is hybrid in the sense that it combines features of row-based and column-based database systems. Upon further research, I found that this system enhances the performance of both query processing and load balancing [2]. The authors then move on to describe the available cloud vendors. All these Infrastructure as a Service (IaaS) providers employ virtualization technologies to maximize

