Laboratory 4 WINTER (2)-8851426
.docx
keyboard_arrow_up
School
Conestoga College *
*We aren’t endorsed by this school
Course
2000
Subject
Information Systems
Date
Apr 3, 2024
Type
docx
Pages
12
Uploaded by MagistrateMusic9935
Laboratory Exercise 5 - HIVE
In this lab you will leverage from the explanations provided during the Hive Theory Class and experience the Map Reduce Process through HIVE. Exercise 1 – Run the Below to obtain the maximum price of stocks
IMPORTANT: yourfirstname is not the name of the table – You must use your first name for this exercise. 1.
Create a Table in Hive
CREATE EXTERNAL TABLE IF NOT EXISTS yourfirstname
(
exch STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hirw/input/stocks';
2.
Verify if table was created
SHOW TABLES
Laboratory Exercise 5 - HIVE
3.
Limit of 25 rows
SELECT * FROM yourfirstname
LIMIT 25;
Tables in Hive, an infrastructure for data warehouses built on top of Hadoop, are typically developed with more descriptive names that correspond to the data they store. All columns in the designated table are retrieved using the SELECT * query. The output is then limited to the top 25 rows using the LIMIT 25 clause.
Laboratory Exercise 5 - HIVE
Capture one screenshot once the job runs – Screenshot 1
4.
Create a Query that will display the maximum price of stock open, grouped by symbol SELECT symbol, max(price_close) max_close FROM yourfirstname
GROUP BY symbol;
The purpose of the provided SQL query is to retrieve data from the "yourfirstname" table in a Hive database. The minimal closing price (min(price_close))
is determined by the query for every distinct value found in the "symbol" field. Using the GROUP BY clause, the result set is sorted by the "symbol" column to produce a succinct summary of the lowest closing prices linked to each unique symbol in the given database. When analysing and retrieving important statistical data for various symbols in financial or related datasets, this kind of query is helpful.
Capture one screenshot once the job runs – Screenshot 2 (PLEASE RETAIN SCREENSHOT EVEN IF AN ERROR OCCURS)
Laboratory Exercise 5 - HIVE
Exercise 2 – Elaborate some query modifications
1.
Based on Step 4 on Exercise 1, modify the query to execute a job that obtains the minimum price at price open.
SELECT symbol, min(price_close) min_close FROM yourfirstname
GROUP BY symbol;
The "yourfirstname" table is the source of data for this Hive query, which also determines the minimum closing price (min(price_close)) for each distinct value in the "symbol" column. By grouping the data according to the unique values in the "symbol" column, the result set is arranged using the GROUP BY clause. This query gives a clear summary of the minimum closing prices linked to various symbols in the given Hive table, making it especially helpful for summarising and evaluating financial or time-series data. Hive is a Hadoop-based data warehousing system that allows distributed queries in a language like SQL to be run on big datasets.
Capture one screenshot once the job runs – Screenshot 3
Laboratory Exercise 5 - HIVE
2.
Delete the table you created for both Exercise 1 and Exercise 2
DROP Table yourfirstname
Exercise 3 – Full Case Study
Access the Hive terminal and follow the instructions for each case study below:
Case Study: Sales Data Analysis
In this lab, you will work with a dataset containing sales data for a company. The dataset includes information such as transaction ID, product name, quantity sold, and sales amount. Your goal is to perform basic data querying tasks using Apache Hive to gain insights from the sales data.
Scenario:
You have been provided with a dataset that contains sales data for a company. The dataset includes information about the transactions such as the transaction ID, product name, quantity sold, and sales amount. The company wants to analyze the sales data to identify the top-selling products, calculate the total sales amount, and find out the average quantity sold.
Important >> The database name should be unique, otherwise this is not going to work!
Lab Steps:
1.
Create a database: Start by creating a new database named yourfirstname to store the sales data.
Answer:
CREATE DATABASE IF NOT EXISTS yourfirstname
; 2.
Evoke the use of the database by applying the command USE:
Answer: USE yourfirstname;
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help