Big data and big data analytics are terms used to describe data sets, and the analytical techniques applied to them, that are so large and complex that they require special technologies to analyze and visualize them.
Since the 1960s, efficient data management and retrieval have been persistent concerns, driven by growing needs in business and academia. To address these concerns, a number of database models have been created. Relational databases allow data storage, retrieval and manipulation using the standard Structured Query Language (SQL). Until recently, relational databases were the optimal enterprise storage choice. However, as the volume of stored and analyzed data has grown, relational databases have displayed a variety of limitations: limitations in scalability, storage, and query efficiency under large volumes of data [1] [2].
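The storage, retrieval, and manipulation that SQL provides can be illustrated with a minimal sketch using Python's built-in sqlite3 module; the table name, columns, and sample rows are purely illustrative.

```python
import sqlite3

# An in-memory SQLite database; schema and data are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Storage: insert rows with parameterized SQL.
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Dublin"), ("Bob", "Prague"), ("Carol", "Dublin")],
)

# Retrieval and manipulation: a declarative query rather than
# procedural record-by-record traversal.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Dublin', 2), ('Prague', 1)]
```

The declarative style shown here, where the engine decides how to satisfy the query, is the property that made relational databases the default enterprise choice for decades.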
The invention of relational databases has brought a number of changes to the business world, especially for businesses whose prime focus is on customers and their preferences as a way to win market share. There is no “one size fits all” with this technology; it varies from industry to industry. What works for one business may not work for another, so it is advisable to shop around before investing in any of these technologies, because it is vital to find an industry-specific solution. One technique for narrowing the search for industry-specific solutions is to find out what competitors are using to grow their customer base.
Big Data is becoming more meaningful with ever more powerful data technologies, which enable us to derive insights from the data and help us make decisions. Big Data also creates new courses and professional fields, such as data science and the data scientist, aimed at analyzing the ever-growing volume of data. Some might think this exaggerated because data analysis, after all, is not a new invention. However, we might all agree that the progress of digitization, associated with the generation of ever larger amounts of data, has totally changed the ways we deal with data.
Abstract - Regarded as another disruptive technology revolution in IT, alongside the Internet of Things and cloud computing, Big Data holds the most valuable property: its value hides in the great stores of data that need to be analyzed, while cloud computing merely serves as a method or a step to save and store the messages. Having drawn rapidly increasing attention in recent years, Big Data has wider and wider influence in many fields. With the explosive demand for Big Data analytics, Big Data has broadened its definition from an IT term for extremely large data sets to a set of new technologies.
Big data is an element that allows companies to leverage high-volume data effectively and not in isolation. Big data needs to be quickly accessible and amenable to analysis. Data stores or warehouses are one way data is managed so that it is persistent, protected, and available for as long as the data is needed. The forefather of the data store is the relational database; relational databases put in place decades ago are still in use today.
According to a report from the International Business Machines Corporation, known as IBM, 90% of the data in the world has been generated in the last two years. Frank J. Ohlhorst (2013) explains that the concept of collecting data for use in business is not new, but the scale of data collected recently is so large that it has been termed Big Data (p. 1). Company executives who choose to ignore Big Data are denying their companies an advantage over their competitors. Big Data analysis is fundamental for all fields of work; it provides insight into large amounts of data that can answer questions and yield discoveries to improve efficiency in all areas of the world.
“Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become. Why? More data may lead to more accurate analyses.” (SAS, 1)
This has increased the demand for data scientists across Europe. The larger the volume of data, the greater the complexity of the data and of analyzing it. Obtaining high varieties and volumes of data at greater velocities is the advantage of using Big Data (H. Davenport, 2014). The trend of turning data into information started with decision support, followed by execution support, analytical processing, and business intelligence (BI), and ended with Big Data.
The emergence of big data, generated by an increased number of data sources, led to the evolution of many data-handling tools. Storing and analyzing vast amounts of structured and unstructured data is a big challenge. Traditional relational databases such as Oracle, DB2, HANA, MySQL, and SQL Server still handle structured data for enterprise applications such as ERP, CRM, and financial systems. Most of these databases have added some level of in-memory features, with the exception of SAP HANA, which runs the entire database in memory so that users can gain insights from data faster.
Data has always been analyzed within companies and used to benefit the future of businesses. However, how data is stored, combined, analyzed, and used to predict the patterns and tendencies of consumers has evolved as technology has seen numerous advancements throughout the past century. In the 1900s databases began as “computer hard disks”, and in 1965, after many other discoveries including voice recognition, “the US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape.” The evolution of data into large databases continued in 1991, when the internet began to appear and “digital storage became more cost effective than paper.” With the constant increase of digitally supplied data, Hadoop was created in 2005, and from that point forward “14.7 Exabytes of new information are produced this year”, a number that is rapidly increasing with the many mobile devices people in our society have today (Marr). The evolution of the internet, and then the expansion of the number of mobile devices society has access to, led data to evolve, and companies now need large central database management systems in order to run an efficient and successful business.
This paper highlights five steps in the analysis of big data and discusses what has already been done. It also lists the technical and management challenges in Big Data analysis. We begin by considering the five stages in the pipeline, then move on to the challenges, and end with a conclusion.
We also studied and compared new emerging NoSQL databases such as Cassandra, Accumulo, CouchDB, HBase, and MongoDB to find the best solution for organizations in accordance with their requirements.
The ever-widening realm of big data has created an expanding frontier of exploration for new methods of data analysis that produce actionable knowledge for the benefit of organizations everywhere. Companies amass enormous troves of data every day. Keeping this data housed in a fashion that maximizes storage efficiency, and in a format optimized for query and analysis, is paramount for effective data warehousing. Many database structures exist for the storage, arrangement, and accessing of data, but large databases and online analytical processing (OLAP) benefit from specific qualities. In these databases, compression and rapid querying are the main enabling qualities sought for analytical data stores and data warehouses. Columnar (or column-oriented) relational databases (RDBMS) offer these and other benefits, which is why they are a popular database scheme for analytical systems. Specifically, the vertical arrangement of records is optimal for selecting the sum, average, or count of a record attribute, because one sequential read yields all values of that attribute. Otherwise, a physical disk must seek over and past unwanted attributes of the records to provide the same values.
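The difference between the two layouts can be sketched in plain Python; the field names and values below are hypothetical, and real columnar engines add compression and vectorized execution on top of this basic idea.

```python
# Row-oriented layout: each record is stored contiguously, so aggregating
# one attribute forces a scan over every field of every record.
rows = [
    {"id": 1, "region": "EU", "sales": 120.0},
    {"id": 2, "region": "US", "sales": 250.0},
    {"id": 3, "region": "EU", "sales": 80.0},
]
total_row_layout = sum(r["sales"] for r in rows)

# Column-oriented layout: each attribute is stored contiguously, so one
# sequential read of the 'sales' column yields every value that a
# SUM, AVG, or COUNT aggregate needs.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "sales": [120.0, 250.0, 80.0],
}
total_col_layout = sum(columns["sales"])

assert total_row_layout == total_col_layout == 450.0
```

Both layouts produce the same aggregate; the columnar one simply places all values of an attribute next to each other, which is the property that avoids disk seeks past unwanted attributes.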
Database research and associated standardization activities have successfully guided the development of database technology over the last four decades, and SQL relational databases remain the dominant database technology today. The effort to innovate relational databases to address the needs of new applications continues. Recent examples of database innovation include the development of streaming SQL technology used to process rapidly flowing data (“data in flight”), minimizing latency in Web 2.0 applications, and database appliances that simplify DBMS deployment on cloud computing platforms.