Abstract: This document explores the concepts of column-oriented databases, including their applications, advantages, and the tools in which they are used. We describe how the column-oriented approach differs from other database structures and why it has become popular in the area of data analytics.
Introduction
The ever-widening realm of big data has created an expanding frontier for new methods of data analysis that produce actionable knowledge for organizations everywhere. Companies amass enormous troves of data every day. Housing this data in a fashion that maximizes storage efficiency, and in a format optimized for query and analysis, is paramount for effective data warehousing. Many database structures exist for storing, arranging, and accessing data, but large databases and online analytical processing (OLAP) benefit from specific qualities: compression and rapid querying are the main enabling qualities sought for analytical data stores and data warehouses. Columnar (or column-oriented) relational databases (RDBMS) offer these and other benefits, which is why they are a popular database scheme for analytical systems. Specifically, the vertical arrangement of records is optimal for computing a sum, average, or count over a record attribute, because one sequential read yields all values of that attribute. Otherwise, the physical disk must seek over and past unwanted attributes of each record to provide the same result.
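The contrast above can be illustrated with a minimal in-memory sketch; the table, attribute names, and values below are hypothetical, and real columnar engines add compression and disk-level layout on top of this idea.

```python
# Row-oriented vs. column-oriented layout for a single-attribute aggregate.

rows = [
    {"id": 1, "name": "Ann", "sales": 120},
    {"id": 2, "name": "Bob", "sales": 300},
    {"id": 3, "name": "Cid", "sales": 180},
]

# Row-oriented: every record must be visited, and its unwanted
# attributes skipped over, just to reach each "sales" value.
total_row = sum(r["sales"] for r in rows)

# Column-oriented: all "sales" values sit contiguously, so one
# sequential read over a single list yields the aggregate.
columns = {
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cid"],
    "sales": [120, 300, 180],
}
total_col = sum(columns["sales"])

print(total_row, total_col)  # both 600
```

Both computations return the same answer; the difference lies in how much unrelated data must be traversed to obtain it, which is what makes the columnar layout attractive for analytical scans.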
Coronel, C. (2013). Database Systems: Design, Implementation, and Management, Tenth Edition. Mason, Ohio, United States: Cengage Learning.
Column family stores: Strength: a great way to distribute data globally with high availability, and they perform well with very large amounts of data distributed over many machines. Weakness: column-oriented databases will be significantly slower when handling transactional workloads.
For analysis, advanced statistical tools are used, and the experimenter can draw the necessary conclusions and inferences. Business sectors hold huge amounts of scattered data on profits, losses, demand, supply, sales, and production; the industrial, insurance, agricultural, banking, information technology, food, telecommunication, retail, utilities, travel, and pharmacy sectors, among many others, all face challenges in managing their data. As every coin has two sides, data analysis has both advantages and a few disadvantages. The key advantages are that organizations can immediately uncover errors, and a system optimized using data analysis has a reduced chance of failure, saves time, and leads to advancement. Data analysis is also used to compare strategies between two companies so as to reduce prices and gain the attention of target customers, ultimately maximizing profit and minimizing cost (as in game theory). However, big data analysis can become tedious and disadvantageous because it relies on software such as Hadoop, which requires special provisions on the computers that run it, and Hadoop is not yet suited to real-time analysis. The manner in which data is collected, and the decision-making viewpoint, can vary from one person to another; this affects the quality of the data and can leave it insufficient or inefficient. To tackle this problem, the researcher must be professional, well experienced, and have deep knowledge of the characteristic under study. Data must also be updated from time to time to avoid trends being skewed by past data, especially in rapidly growing sectors.
Database technologies are a core component of many computing systems. They allow data to be stored, organized, and retrieved efficiently.
I will explain the features of a relational database, such as entities, attributes, and relationships, along with their benefits, and will give examples of each to show how they affect the database.
Introduction: A company called Ian’s & Co currently employs a team of IT technicians to manage its IT infrastructure and support its IT users. Quite recently, the company has taken over a similar but smaller company, which also employs technical support staff in the same way.
Some of the challenges faced by relational databases stem from the mismatch that results when transforming graphs into tables. On the other hand, when a database is needed only for simple tasks like logging, a relational database offers far more than is required. Web applications have many different types of attributes that do not fit easily into a relational database, which makes them a burden to handle. For example, videos, text, and source code are different types of attributes from the web that would have to be stored in separate tables if relational databases were used, because of their strict schema. Qualities like these make an RDBMS a not-so-wise choice for blogs and other web applications. The massive data handled by web applications complicates data management for popular sites like Amazon, Google, and Facebook. Factors like trillions of read and write requests that must be answered with minimal or no latency lead these organizations to maintain their own hardware in clusters of thousands of machines. The “one solution for all” approach no longer holds.
Volume is often regarded as the primary attribute of big data. With that in mind, many people define big data in terabytes, sometimes petabytes, but big data can also be quantified by counting records, transactions, tables, or files (Russom, 2011). Volume refers to the mass quantities of data that organizations are trying to harness to improve decision-making across the enterprise (Schroeck et al., 2012). Data volumes have continued to increase at an unprecedented rate over the last several years. The sheer volume of data that is stored or available for storage today is exploding; it is expected that by the year 2020, 40 zettabytes (ZB) of data will be stored (Zikopoulos et al., 2012).
To further understand data warehousing, we need to examine the range of factors that determine the success of a data structure.
Database science grew in leaps and bounds, from the early file systems, to hierarchical and networked architectures, to the first relational-style systems like IBM’s System R.
Boolean (yes/no)- A data type that restricts a field to only two values, either yes or no.
Data has always been analyzed within companies and used to benefit the future of businesses. However, how data is stored, combined, analyzed, and used to predict the patterns and tendencies of consumers has evolved as technology has advanced throughout the past century. In the 1900s databases began as “computer hard disks”, and in 1965, after many other discoveries including voice recognition, “the US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape.” The evolution of data into large databases continued in 1991, when the internet began to take off and “digital storage became more cost effective than paper.” With the constant increase in digitally supplied data, Hadoop was created in 2005, and from that point forward “14.7 Exabytes of new information are produced this year”, a number that is rapidly increasing with the many mobile devices people in our society have today (Marr). The evolution of the internet, and then the expansion in the number of mobile devices society has access to, led data to evolve; companies now need large central database management systems in order to run an efficient and successful business.
At 100 GB to terabyte scale, the data is usually historical, summarized, multidimensional, integrated, and consolidated. OLAP uses complex queries, and the underlying structure is the cube: “in relational database systems, OLAP cubes are constructed from a fact table and one or more dimension tables” [1]. It is ad hoc, not for everyday repetitive use. An OLAP system may have only hundreds of users, but a single query can access millions of records, compared to OLTP, which can have thousands of users each accessing only tens of records.
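The fact-table-plus-dimension-table construction quoted above can be sketched in a few lines; the table names, keys, and values here are hypothetical illustrations, not a real schema.

```python
# Building one aggregate of an OLAP-style cube by joining a fact
# table to a dimension table and grouping on a dimension attribute.
from collections import defaultdict

# Fact table: transactional records keyed by a dimension id.
fact_sales = [
    {"product_id": 1, "amount": 50},
    {"product_id": 2, "amount": 70},
    {"product_id": 1, "amount": 30},
]

# Dimension table: descriptive attributes for each key.
dim_product = {1: "Books", 2: "Toys"}

# Aggregate amount per product category (one "slice" of the cube).
cube = defaultdict(int)
for row in fact_sales:
    category = dim_product[row["product_id"]]
    cube[category] += row["amount"]

print(dict(cube))  # {'Books': 80, 'Toys': 70}
```

A full cube would repeat this grouping across every combination of dimensions (time, region, product, and so on), which is precisely why OLAP queries touch so many rows per user compared to OLTP.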
As data volumes rose, the manageability and storage of these huge volumes of data became a cause of concern for most organizations. It was during this period that Not Only SQL, more popularly NoSQL, was introduced to process these large amounts of data efficiently and effectively. For this purpose, various data store categories were developed, based on different data models, such as key-value, document, column-family, and graph stores.
Three kinds of data need to be stored: fact table data (the transactional records), aggregates, and dimensions. The multidimensional format does not perform well when there are more than a few dimensions, as this leads to sparse data. HOLAP, a hybrid OLAP system, solves this problem by storing only the aggregates in multidimensional format.
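The HOLAP split can be sketched as follows, assuming a toy fact table and illustrative month/region attributes: precomputed aggregates live in a fast multidimensional store, while detail queries fall back to the relational fact table.

```python
# HOLAP idea in miniature: materialize only the aggregates,
# keep row-level detail relational and scan it on demand.

fact_table = [  # relational detail (row-level transactions)
    ("2023-01", "east", 100),
    ("2023-01", "west", 150),
    ("2023-02", "east", 120),
]

# Multidimensional side: only the aggregates are precomputed.
aggregates = {}
for month, region, amount in fact_table:
    aggregates[month] = aggregates.get(month, 0) + amount

def monthly_total(month):
    # Answered directly from the materialized aggregates.
    return aggregates.get(month, 0)

def region_detail(month, region):
    # Detail stays relational; scan the fact table when asked.
    return sum(a for m, r, a in fact_table if m == month and r == region)

print(monthly_total("2023-01"))          # 250
print(region_detail("2023-01", "west"))  # 150
```

Because only the aggregates are materialized, the sparse combinations of many dimensions never have to be stored, which is the problem HOLAP is designed to avoid.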