Developing A Diabetes Detection System by Using Big Data Technology. 4 Problem Solution. In this system, the performance of the case-based reasoning (CBR) algorithm is boosted with a MapReduce approach, and the improved CBR algorithm is run on the Apache Hadoop framework to detect diabetes in a particular patient. Fig. 1. Framework of Case Based Reasoning Algorithm. The biggest challenge of CBR is finding an accurate indexing function. In the diabetes dataset, there are five variables that influence the glycosylated
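The retrieval step of CBR combined with MapReduce, as described in the excerpt above, can be illustrated with a minimal sketch. This is not the authors' implementation: the case base, the two numeric features, the labels, and the unscaled Euclidean distance standing in for the indexing function are all hypothetical, and plain Python functions stand in for Hadoop map and reduce tasks.

```python
from functools import reduce
from math import sqrt

# Hypothetical case base: (feature vector, outcome). Values are invented for illustration.
CASES = [
    ((5.1, 22.0), "non-diabetic"),
    ((7.9, 31.5), "diabetic"),
    ((6.0, 27.0), "non-diabetic"),
    ((8.4, 29.0), "diabetic"),
]

def distance(a, b):
    """Unscaled Euclidean distance, an assumed stand-in for the indexing function."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def map_partition(partition, query):
    """Map step: find the closest stored case within one partition of the case base."""
    return min((distance(features, query), label) for features, label in partition)

def reduce_best(left, right):
    """Reduce step: keep whichever partial result is closer to the query."""
    return left if left[0] <= right[0] else right

def retrieve(case_partitions, query):
    """Run the map step on every non-empty partition, then reduce to one best case."""
    local_best = [map_partition(p, query) for p in case_partitions if p]
    return reduce(reduce_best, local_best)

# Split the case base into two partitions, as a cluster would split its input.
partitions = [CASES[:2], CASES[2:]]
best_distance, best_label = retrieve(partitions, (8.0, 30.0))
print(best_label, round(best_distance, 3))
```

Because each partition is scanned independently, the map step could in principle run in parallel on separate nodes, and only one candidate per partition reaches the reduce step.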
• What is data science? Data science is the study of where information comes from, what it represents, and how it can be turned into a valuable resource in the creation of business and IT strategies. • According to IBM's estimate, what percentage of the data in the world today has been created in the past two years? According to IBM's estimate, 90 percent of the data has been created in the past two years. • What is the value of petabyte storage? Petabyte is the term used to describe the capacity of storage or
devices in 2016 alone [2], an unprecedented amount of data is being generated and processed daily, and the amount grows every year. With the advent of Web 2.0, the growth and creation of new and more complex types of data have created a natural demand for analysis of new data sources in order to gain knowledge. This new data volume and complexity is called Big Data, famously characterised by Volume, Variety and Velocity, and it has created data management and processing challenges due to technological limitations
Introduction With 3.2 billion internet users [1] and 6.4 billion internet-connected devices by 2016 [2], an unprecedented amount of data is being generated and processed daily, and the amount increases every year. The advent of Web 2.0 has fueled the growth and creation of new and more complex types of data, which creates a natural demand to analyze new data sources in order to gain knowledge. This new volume and complexity of data is called Big Data, famously characterised by Volume, Variety and Velocity;