What is big data analytics?
Big data analytics is the process of applying advanced analysis techniques to very large, varied data sets to uncover hidden patterns and knowledge. It supports an organization's decision-making process. Big data can be organized in any of the following formats:
- Structured data.
- Semi-structured data.
- Unstructured data.
Big data analytics applies advanced analysis techniques such as predictive modelling, what-if analysis and statistical methods. These techniques help organizations make informed decisions and choose the best business moves.
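What-if analysis evaluates how an outcome changes when the inputs to a model are varied. The sketch below assumes a hypothetical linear demand model (the function `projected_revenue` and its parameters `base_units` and `sensitivity` are illustrative, not taken from any real data) and compares revenue across several price scenarios.

```python
# Hypothetical linear demand model: units sold fall as price rises.
def projected_revenue(price, base_units=1000, sensitivity=8):
    """Revenue for a given price under the assumed demand model."""
    units = max(base_units - sensitivity * price, 0)  # demand cannot go negative
    return price * units

# What-if analysis: evaluate the same model under several price scenarios.
scenarios = [40, 50, 60, 70]
results = {price: projected_revenue(price) for price in scenarios}
best_price = max(results, key=results.get)
print(results, "best:", best_price)
```

In practice the model would be estimated from historical data rather than fixed by hand; the point here is only that each scenario reuses one model with different inputs.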
Significance of big data analytics
- Improves business outcomes by means of data-driven decisions.
- Increases customer base through personalized ads and targeted marketing.
- Improves the efficiency of business operations.
- Provides competitive advantage.
Big data analytics techniques
- Statistics involves performing various statistical operations on survey and observational data to gather insights.
- Data mining techniques can be used to uncover hidden patterns in the data.
- Data fusion and integration involves merging data from multiple heterogeneous sources to gain insights.
- Machine learning provides predictions based on knowledge obtained from historical data.
- Natural Language Processing involves using AI algorithms that enable a computer to comprehend human language.
- Data Science involves using predictive modelling techniques to gather insights from structured and unstructured data.
Steps involved in big data analytics
- Business case analysis - Determining the goal or reason behind the data analysis.
- Data source identification - Relevant data sources are identified, and data from multiple heterogeneous sources is integrated into a single source.
- Data cleaning - Filtering out unwanted data, filling in missing data and removing inconsistencies.
- ETL - Data is extracted, transformed into the desired format and then loaded into a data store.
- Data aggregation - Data is summarized to the required level of abstraction.
- Data analysis - Various statistical techniques and mining tools are employed to derive useful insights from the raw data.
- Visualization - Tools such as Power BI, R and Tableau are used to generate reports and graphical illustrations that are easy to understand.
- Producing the results of analysis - The final results are presented to the organization's decision makers.
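The steps above can be sketched end to end on a tiny scale. The example below assumes two hypothetical order sources with different field names; the functions `integrate`, `clean` and `aggregate` are illustrative names, not part of any tool mentioned here. De-duplication is deliberately naive (identical rows are treated as duplicates).

```python
from collections import defaultdict

# Hypothetical raw records from two heterogeneous sources
# (e.g., a web shop export and a point-of-sale system).
web_orders = [
    {"region": "north", "amount": "120.50"},
    {"region": "South", "amount": "80"},
    {"region": "north", "amount": None},    # missing value
]
store_orders = [
    {"area": "south", "total": 45.0},
    {"area": "north", "total": 60.0},
    {"area": "north", "total": 60.0},       # duplicate record
]

def integrate(web, store):
    """Data integration: map both sources onto one common schema."""
    unified = [{"region": r["region"], "amount": r["amount"]} for r in web]
    unified += [{"region": r["area"], "amount": r["total"]} for r in store]
    return unified

def clean(records):
    """Cleaning and transformation: drop missing amounts, normalize, de-duplicate."""
    seen, cleaned = set(), []
    for r in records:
        if r["amount"] is None:
            continue
        # Transform: lowercase region names and convert amounts to float.
        row = (r["region"].strip().lower(), float(r["amount"]))
        if row not in seen:  # naive rule: identical rows are duplicates
            seen.add(row)
            cleaned.append(row)
    return cleaned

def aggregate(rows):
    """Data aggregation: total sales per region."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)

totals = aggregate(clean(integrate(web_orders, store_orders)))
# Analysis and result: report the region with the highest sales.
best = max(totals, key=totals.get)
print(totals, "best region:", best)
```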
Characteristics of big data
Big data is defined by a set of characteristics, commonly known as the Vs of big data:
Volume: Refers to the amount of data gathered. Huge amounts of data are collected every day in applications such as:
- Online transactions.
- Sensors, Global Positioning System (GPS) devices, and telematics.
- Posts shared on social media applications such as Twitter and Facebook.
The data collected can amount to several terabytes.
Velocity: Refers to the rate at which the data is captured.
Value: Refers to the usefulness of the data. Raw captured data is of little use on its own; it must be processed to obtain useful insights that help businesses in their decision-making process.
Variety: Defines the various formats of the captured data. The data can be presented in structured, semi-structured, and unstructured formats.
Veracity: Defines the quality of data. It refers to the degree of inconsistency or error in the data.
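Veracity can be estimated with simple data-quality checks. The sketch below uses hypothetical sensor readings and an assumed validity rule (temperatures between -50 and 60 degrees C); `veracity_report` is an illustrative helper that counts missing and out-of-range values and reports the resulting error rate.

```python
# Hypothetical sensor readings; None marks a missing value.
readings = [21.5, 22.0, None, 19.8, 250.0, 20.4, None, 21.1]

def veracity_report(values, low=-50.0, high=60.0):
    """Count missing and out-of-range values as a rough data-quality measure.

    The range [low, high] is an assumed validation rule for temperatures.
    """
    missing = sum(1 for v in values if v is None)
    out_of_range = sum(1 for v in values if v is not None and not low <= v <= high)
    error_rate = (missing + out_of_range) / len(values)
    return {"missing": missing, "out_of_range": out_of_range, "error_rate": error_rate}

report = veracity_report(readings)
print(report)
```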
Data formats
Structured data: Data is organized as records that follow a predefined schema. Examples include student databases, employee databases, order databases and so forth.
Unstructured data: Data has no predefined schema or organization. Examples include information collected from websites, social media posts and so forth.
Semi-structured data: Semi-structured data falls between structured and unstructured data. Most of the time, this means unstructured data with metadata attached to it. Examples include JSON and XML documents.
Real-time data: Data that is captured continuously or at short, predefined intervals and processed as soon as it arrives. Examples include weather data, water levels measured by sensors, and data transmitted by traffic sensors.
Metadata: Metadata is data that describes other data. It contains a detailed definition of the data and mapping information about it.
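The difference between these formats shows up in how the data is read. In the sketch below, all sample data and variable names are hypothetical: a CSV table stands in for structured data, a JSON post for semi-structured data, free text for unstructured data, and a small dictionary for metadata describing the table.

```python
import csv
import io
import json

# Structured: rows that follow a fixed schema (hypothetical employee table).
structured_text = "emp_id,name,dept\n1,Asha,Sales\n2,Ravi,IT\n"
employees = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: JSON with self-describing keys; fields can vary per record.
semi_structured_text = '{"user": "asha", "post": "Loving the new phone!", "meta": {"likes": 12}}'
post = json.loads(semi_structured_text)

# Unstructured: free text with no schema; any structure must be extracted.
unstructured_text = "Loving the new phone! Battery life is great."
words = unstructured_text.lower().split()

# Metadata: data that describes other data (here, the employee table).
metadata = {
    "source": "hr_system",  # hypothetical source name
    "columns": list(employees[0].keys()),
    "row_count": len(employees),
}

print(employees[1]["dept"], post["meta"]["likes"], len(words), metadata)
```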
Tools
- R programming language: Used to generate plots, graphs and word clouds, which facilitate easy interpretation.
- Lumify: It is a data fusion, analysis and visualization platform used to generate dynamic histograms and geospatial views which can be modified and analyzed on-the-fly.
- Apache Spark: An open-source framework used for distributed big data analytics. It supports graph processing and machine learning (ML).
- MongoDB: A NoSQL, document-oriented database used to store very large volumes of data.
- Apache Hadoop: Used to store data across several distributed servers and to run applications on a cluster of nodes. The number of nodes in a cluster can be increased or decreased as required.
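Hadoop processes data with the MapReduce model, which splits work into map, shuffle and reduce phases that run across many nodes. The sketch below imitates those phases for a word count in plain, single-process Python. It is not the Hadoop or Spark API; the chunk data and function names are hypothetical and only illustrate the model.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split into chunks, as if stored on different nodes.
chunks = [
    "big data needs big tools",
    "spark and hadoop process big data",
]

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}

mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts["big"], counts["data"])  # prints "3 2"
```

On a real cluster, each map call would run on the node holding its chunk, and the shuffle would move data over the network before reduction.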
Big data applications
- E-commerce - Product recommendation systems help in targeted marketing and sales.
- Education - Used to improve courses and to provide customized learning environment.
- Healthcare - Helps in disease prediction and diagnosis based on historical data.
- Natural calamity prediction - Sensor data records parameters that can be used to predict floods, cyclones, tornadoes and so forth.
- Entertainment - Recommendation systems suggest songs, videos and so forth based on a user's past history.
- Banking and insurance - Customers' spending patterns can be analyzed to target loans, credit card sales, insurance plans and so forth.
- Telecommunications - Network traffic is predicted so that sufficient resources can be allocated to improve quality of service (QoS).
- Government - Demographic data analysis yields recommendations for improving government policies.
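Several of these applications rely on recommendation systems. One very simple approach, item co-occurrence counting, recommends the items most often bought together with a given item. This is a swapped-in illustration, not a description of any particular production system; the purchase baskets and the `recommend` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories (one set of items per customer).
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"phone", "case", "headphones"},
]

# Count how often each ordered pair of items appears in the same basket.
co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, k=2):
    """Return up to k items most often bought together with `item`."""
    scores = Counter({other: n for (i, other), n in co_counts.items() if i == item})
    return [other for other, _ in scores.most_common(k)]

print(recommend("phone"))  # prints ['case', 'charger']
```

Production systems typically normalize these counts (for example, by item popularity) and compute them with distributed tools such as Spark, but the underlying idea of learning from past user behaviour is the same.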
Context and Applications
This topic is important for undergraduate and postgraduate courses, particularly the Bachelor's in computer science engineering and the Associate of Science in computer science.
Practice Problems
Question 1: Which of the following is not one of the five Vs of big data?
- Visualization
- Velocity
- Volume
- Veracity
Answer: Option A (Visualization) is correct.
Explanation: The five Vs are volume, velocity, variety, veracity and value. Visualization is not one of them.
Question 2: Which is not a big data tool?
- Apache Tomcat
- Apache Hadoop
- Apache Spark
- R
Answer: Option A (Apache Tomcat) is correct.
Explanation: Apache Tomcat is a Java web server and servlet container. The other three are used to process big data.
Question 3: Which is not a technique used in big data analytics?
- Statistical methods
- Predictive modelling
- Descriptive modelling
- Data transformation
Answer: Option D is correct.
Explanation: Data transformation involves converting data into the required format. It is a data preparation step rather than an analysis technique.
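Question 4: What are the major components of the Apache Hadoop big data framework?
- MapReduce
- HDFS
- YARN
- All of the above
Answer: Option D is correct.
Explanation: All the options are core components of Apache Hadoop: HDFS for distributed storage, YARN for resource management, and MapReduce for processing.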
Question 5: Which of the following are features of a data analytics technique?
- Scalability
- Open source
- Data recovery
- All the above
Answer: Option D is correct.
Explanation: A data analytics technique must be able to handle increasing volumes of data (scalability), should be open source so that enhancements can be made when needed, and must preserve the original data (data recovery).