Laboratory 4 WINTER (2)-8851426

.docx

School

Conestoga College *

*We aren’t endorsed by this school

Course

2000

Subject

Information Systems

Date

Apr 3, 2024

Type

docx

Pages

12

Uploaded by MagistrateMusic9935

Report
Laboratory Exercise 5 - HIVE In this lab you will leverage from the explanations provided during the Hive Theory Class and experience the Map Reduce Process through HIVE. Exercise 1 – Run the Below to obtain the maximum price of stocks IMPORTANT: yourfirstname is not the name of the table – You must use your first name for this exercise. 1. Create a Table in Hive CREATE EXTERNAL TABLE IF NOT EXISTS yourfirstname ( exch STRING, symbol STRING, ymd STRING, price_open FLOAT, price_high FLOAT, price_low FLOAT, price_close FLOAT, volume INT, price_adj_close FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hirw/input/stocks'; 2. Verify if table was created SHOW TABLES
Laboratory Exercise 5 - HIVE 3. Limit of 25 rows SELECT * FROM yourfirstname LIMIT 25; Tables in Hive, an infrastructure for data warehouses built on top of Hadoop, are typically developed with more descriptive names that correspond to the data they store. All columns in the designated table are retrieved using the SELECT * query. The output is then limited to the top 25 rows using the LIMIT 25 clause.
Laboratory Exercise 5 - HIVE Capture one screenshot once the job runs – Screenshot 1 4. Create a Query that will display the maximum price of stock open, grouped by symbol SELECT symbol, max(price_close) max_close FROM yourfirstname GROUP BY symbol; The purpose of the provided SQL query is to retrieve data from the "yourfirstname" table in a Hive database. The minimal closing price (min(price_close)) is determined by the query for every distinct value found in the "symbol" field. Using the GROUP BY clause, the result set is sorted by the "symbol" column to produce a succinct summary of the lowest closing prices linked to each unique symbol in the given database. When analysing and retrieving important statistical data for various symbols in financial or related datasets, this kind of query is helpful. Capture one screenshot once the job runs – Screenshot 2 (PLEASE RETAIN SCREENSHOT EVEN IF AN ERROR OCCURS)
Laboratory Exercise 5 - HIVE Exercise 2 – Elaborate some query modifications 1. Based on Step 4 on Exercise 1, modify the query to execute a job that obtains the minimum price at price open. SELECT symbol, min(price_close) min_close FROM yourfirstname GROUP BY symbol; The "yourfirstname" table is the source of data for this Hive query, which also determines the minimum closing price (min(price_close)) for each distinct value in the "symbol" column. By grouping the data according to the unique values in the "symbol" column, the result set is arranged using the GROUP BY clause. This query gives a clear summary of the minimum closing prices linked to various symbols in the given Hive table, making it especially helpful for summarising and evaluating financial or time-series data. Hive is a Hadoop-based data warehousing system that allows distributed queries in a language like SQL to be run on big datasets. Capture one screenshot once the job runs – Screenshot 3
Laboratory Exercise 5 - HIVE 2. Delete the table you created for both Exercise 1 and Exercise 2 DROP Table yourfirstname Exercise 3 – Full Case Study Access the Hive terminal and follow the instructions for each case study below: Case Study: Sales Data Analysis In this lab, you will work with a dataset containing sales data for a company. The dataset includes information such as transaction ID, product name, quantity sold, and sales amount. Your goal is to perform basic data querying tasks using Apache Hive to gain insights from the sales data. Scenario: You have been provided with a dataset that contains sales data for a company. The dataset includes information about the transactions such as the transaction ID, product name, quantity sold, and sales amount. The company wants to analyze the sales data to identify the top-selling products, calculate the total sales amount, and find out the average quantity sold. Important >> The database name should be unique, otherwise this is not going to work! Lab Steps: 1. Create a database: Start by creating a new database named yourfirstname to store the sales data. Answer: CREATE DATABASE IF NOT EXISTS yourfirstname ; 2. Evoke the use of the database by applying the command USE: Answer: USE yourfirstname;
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help