Apache Cassandra
Ashish Penti
Computer Science Department
University of North Carolina - Charlotte
Charlotte, NC, U.S.A.
ABSTRACT
Apache Cassandra is an open source distributed database management system developed to handle large data sets. Facebook released it to the big data community in 2008. The purpose of this research is to highlight the importance of Cassandra in the world of NoSQL by discussing some of the main questions: what difficulties arise with traditional relational database management systems, how NoSQL solves these issues, how Cassandra came into existence, and why some major organizations use it to handle their data sets.
In pursuit of finding reasons for
There are many NoSQL databases, and each has some functionality in common with the relational model and some that is unique. The main point to consider is that no NoSQL solution works for all scenarios. Each works better than the relational model for some subset of use cases. Apache Cassandra is one of the most widely used NoSQL databases in industry. This article gives detailed information about Cassandra, its functionality, and its advantages and disadvantages, which can seem deceptive to someone who looks at Cassandra for the first time.
1. INTRODUCTION
1.1. Difficulties with traditional RDBMS
In the early stages of the evolution of databases, relational database systems were designed as a solution to the problems of flat file databases. A relational database stores data in multiple tables. This technique helped to overcome issues such as data duplication, data noise and inconsistency, because it ensured that the data is entered and stored only once. Later, as data grew in size, handling such large amounts of it became a challenging task. High data velocity, variety, volume and complexity are a few of the important demands that traditional database systems failed to handle successfully. As a result NoSQL came into
In order to overcome these limitations, a new database model known as the Not Only SQL (NoSQL) database emerged with a set of new features. The main objective of NoSQL is not to discard SQL, but to serve as an alternative data model for new requirements [1] [2] [3]. NoSQL databases offer a set of new characteristics and advantages aimed at improving on the performance of relational databases. In contrast to relational databases, NoSQL databases provide flexible, horizontal scalability that takes advantage of clusters. The rise of NoSQL provides cost-effective management of data in modern web applications. With its new features, NoSQL can be used with applications that have large transaction volumes and require low-latency access to huge datasets, service availability while
Wider insight into relational and non-relational database performance, particularly MySQL and Hadoop, was gathered through the literature survey. By reading textbooks and reviewing academic journals and research papers, I found a gap between the performance of relational databases and that of non-relational ones.
Provide reasoning to support the use of the NoSQL database as the database of choice to solve the problem faced by TWC. Identify one strength and one weakness for each of the other three kinds of databases to solve the problem for TWC.
Some people believe that the answer to the challenges posed by big data lies in a relatively new group of non-relational data storage and management products known collectively as NoSQL. However, NoSQL system development is different from traditional data warehouse development in that it is application driven. This has led some pundits to postulate that NoSQL represents a new paradigm in data warehouse design, where highly specialized data silos will replace the traditionally integrated data warehouse. Therefore it is reasonable to ask, should NoSQL be used to build big data warehouses? If yes, then should integration be discarded in favor of autonomous, application driven data silos?
Tracking the concept of Big Data management from Relational Database Management Systems to current NoSQL databases, this paper surveys the Big Data challenges from the perspective of its characteristics Volume, Variety and Velocity, and attempts to study how each of these challenges is addressed by various NoSQL systems. NoSQL is not a single system that can solve every single Big Data problem; it is an eco-system of technologies where different types of NoSQL databases are optimized to address various types of big data challenges by providing schema-less modeling and automatic
STRUCTURE OF DATA: The data structure of a relational database consists of tables. Every table is identified by a unique name or label. A table is described as a collection of rows and columns. Each row of the table is known as a record and each column is known as a field of that table. All the data sets are well organized and logically linked to each other through definite and unique relationships. A table can therefore also be defined as a “structured collection of relationships”. The fundamental aim of developing NoSQL database systems is to handle vast quantities of data easily and effectively in advanced web-scale applications. To achieve this purpose, NoSQL systems are designed as schema-free database systems. There are different ways to define NoSQL databases, which typically depend on the requirements of the data that has to be managed. The main NoSQL data structures include column database, key-value store database, document store database, graph database and
NoSQL databases are a significant departure from the relational model that has dominated the business world for the past few decades. Standing for “Not Only SQL,” these products are all some variation of a non-relational, key-value pair database, and they are becoming very popular with companies that use Big Data and prioritize speed or availability over consistency of data.
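The contrast between a table with fixed fields and a schema-free key-value store can be illustrated with a minimal Python sketch. It is not taken from the source: the `users` table, the `kv_store` dictionary and the helper `add_relational_field` are hypothetical names chosen for illustration, SQLite stands in for a relational system, and a plain dictionary stands in for a key-value database.

```python
import sqlite3

# Relational side: a table with a fixed set of columns (fields).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    (1, "Ada", "ada@example.com"),
)


def add_relational_field(connection):
    """Try to store a field that is not in the schema; return True on success."""
    try:
        connection.execute(
            "INSERT INTO users (id, name, phone) VALUES (?, ?, ?)",
            (2, "Bob", "555-0100"),
        )
        return True
    except sqlite3.OperationalError:
        # The table has no "phone" column, so the schema must be altered first.
        return False


# Schema-free side: a hypothetical in-memory key-value store.
# Each value is a record whose fields can differ from key to key.
kv_store = {}
kv_store["user:1"] = {"name": "Ada", "email": "ada@example.com"}
kv_store["user:2"] = {"name": "Bob", "phone": "555-0100"}  # new field, no migration

relational_accepted = add_relational_field(conn)
```

In this sketch the relational insert is rejected until the table is altered, while the key-value store accepts the new field immediately. The trade-off is that nothing in the store checks that records share a common shape.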
NoSQL databases are designed to run on clusters of computers/servers and were built for the ever-increasing data storage needs of websites. They were devised as a way of scaling databases horizontally, which is a challenge with traditional relational databases. Scaling horizontally is the ability to add more computers/servers as nodes to a database. These “clusters” work well with write-heavy systems and allow increased storage and processing power, limited only by the number of connections you can have on the network. Being defined as No-Schema, No-SQL data structures are not limited to the original data structure. Objects and fields etc can be implemented at
NoSQL databases were created to address the Big Data problem by using a distributed system to deliver excellent performance in data storage and retrieval at very large scale. At this scale, pieces of the system often fail, and NoSQL is designed to handle these failures (Chow, 2013) (Ron, Shulman-Peleg, & Bronshtein, 2015). Various companies have espoused different sorts of non-relational databases, ordinarily alluded to as
The paper provides background and related literature on Big Data and studies the progression from relational databases to current NoSQL databases, which has been fueled by the growth of Big Data and the importance of managing it. It also surveys the Big Data challenges from the perspective of its characteristics Volume, Variety and Velocity, and attempts to study how those challenges can be addressed.
Modern RDBMS technologies cannot support unstructured data with optimal space requirements. The design becomes complex and is hence difficult for developers. Managing unstructured data is cumbersome with conventional RDBMS solutions (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). Moreover, RDBMS turns out to be a costly solution for building agile web applications with straightforward data analysis requirements. NoSQL is emerging as an efficient option in this situation, one that addresses the issues associated with RDBMS technology. The market growth can be credited to innovative launches of NoSQL solutions and collaborative efforts by NoSQL vendors and customers. The efforts of companies to improve their market offerings are creating demand for NoSQL as back-end support (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). The emergence of agile software development is also creating demand for NoSQL (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). NoSQL databases offer users many more ways to accept data in many different forms. NoSQL is as adaptable as SQL but offers many more uses that can apply to many organizations.
Another much-discussed database is Cassandra. It was originally developed at Facebook and open-sourced in 2008 [6]. Facebook was among the first to use the system, for its inbox search feature, whose data it manages and stores on disk. Because the system performed well within service level agreement requirements, more companies such as Netflix and Twitter embraced Cassandra as their storage engine and as a backend for their streaming services [9]. What is Cassandra? Based on many definitions, Cassandra is an open source distributed database that is highly scalable and high performing, designed to handle large amounts of data across many commodity servers while providing high availability with no single point of failure. Its main strengths are high performance, robust clusters spanning several data centers, and low-latency operation for its clients, which is why businesses favor it. It is written in Java. Research conducted on NoSQL systems concluded that Cassandra's scalability exceeds that of the other database management systems, with the largest number of nodes. It is designed as a distributed system that supports replication and multi-data-center replication, as well as the ability to replace failed nodes without downtime [2]. Cassandra integrates with other open source projects such as Hadoop and Apache Pig. It is similar with relational database since
Currently, a number of NoSQL databases are used for different types of portals, and these are specialized in handling heterogeneous and unstructured data.
In addition to their flexibility, these databases provide horizontal scalability and distributed computing, which has led to the adoption of NoSQL databases in firms. SQL databases use Structured Query Language, whereas NoSQL databases use query languages and interfaces that vary from database to database.
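Horizontal scalability can be made concrete with a small Python sketch of a hash ring, one common way to spread keys over nodes so that adding a node relocates only part of the data. The source does not say which mechanism any particular database uses, so treat this as an illustration under assumptions: the class `HashRing`, the node names and the key format are all hypothetical, and each node holds a single position on the ring for simplicity.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a position on the ring using an MD5 digest."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical hash ring: a key belongs to the first node token after it."""

    def __init__(self, nodes=()):
        self._tokens = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        token = _token(node)
        bisect.insort(self._tokens, token)
        self._owners[token] = node

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise ValueError("ring has no nodes")
        # Wrap around to the first token when the key hashes past the last one.
        index = bisect.bisect_right(self._tokens, _token(key)) % len(self._tokens)
        return self._owners[self._tokens[index]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

# Scale out by one node and see which keys change owner.
ring.add_node("node-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in `moved` now belongs to `node-d`; keys owned by the other nodes stay where they were. This property is what lets a cluster grow without reshuffling all of its data. Production systems typically give each node many ring positions to balance the load, which this sketch omits.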
Nowadays, there are two major kinds of database management systems used to deal with data. The first is the Relational Database Management System (RDBMS), the traditional relational database; it deals with structured data and has been popular for decades, since 1970. The second is the Not only Structured Query Language (NoSQL) database, which deals with semi-structured and unstructured data; NoSQL databases have been gaining popularity with the development of the internet and social media since April 2009. NoSQL databases are intended to overcome the drawbacks of RDBMSs, such as fixed