Assignment 1 (Solved)

School: University of Connecticut
Course: 5604
Subject: Information Systems
Date: Feb 20, 2024

Uploaded by EarlMusicCrab33

Part 1 – Refer to pages 52-55 of your textbook and answer the following problems:

2.1
  a – This is supervised learning because we are using historical data with a known outcome to train a model. From the data on prior customers, we know which customers repaid their loans and which did not. We can use that labeled history to predict which future applicants are likely to be reliable loan customers and approve only those applicants.
  b – Recommender systems are a type of unsupervised learning. There is no outcome variable to predict; we simply observe the purchasing patterns of previous customers to see which items are often bought together.
  c – This is supervised learning. We have existing network data packets labeled as safe or dangerous, and we can use them to train a model that classifies future packets as safe or dangerous.
  d – Customer segmentation is another type of unsupervised learning. In segmentation we look for natural groupings of similar records; we are not predicting a known outcome.

2.6 – You can't use "refund issued" as a predictor in a purchase prediction model, because that information isn't known until after a customer has already made a purchase. The model has to make its prediction before the potential customer acts, so it can only use information available at that point.

Part 2 – Use the attached ComicCharacters HW1.jmp or ComicCharacters WH.csv dataset to answer the following questions. You may complete this part in either JMP or Python.

1. Check the Height in cm column for outliers using Tail Quantile = 0.1 and Q = 3. How many outliers are there? What numeric threshold did they exceed to be called outliers? Which characters were identified?

There are 10 outliers, all characters taller than the upper threshold of 792 cm. The very tall characters are Anti-Monitor, Bloodwraith, Fin Fang Foom, Galactus, Giganta, Godzilla, King Kong, Surtur, Utgard-Loki, and Ymir.
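For readers working in Python, the quantile-range rule in question 1 can be sketched as below. This assumes the rule flags values below Q(tail) - Q * (Q(1 - tail) - Q(tail)) or above Q(1 - tail) + Q * (Q(1 - tail) - Q(tail)), which is our understanding of JMP's Quantile Range Outliers option. The helper names and the sample heights are hypothetical, and JMP's quantile definition may differ slightly from the linear interpolation used here (the same rule as pandas' default `Series.quantile`), so thresholds on the real data may not reproduce 792 cm to the decimal.

```python
import math

def linear_quantile(values, p):
    """p-quantile using linear interpolation between sorted values
    (same rule as pandas' default Series.quantile)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def quantile_range_outliers(values, tail=0.1, q=3.0):
    """Return (low, high, flagged_indices) for the quantile-range rule:
    low  = Q(tail)     - q * (Q(1 - tail) - Q(tail))
    high = Q(1 - tail) + q * (Q(1 - tail) - Q(tail))
    None entries (missing heights) are skipped and never flagged."""
    present = [v for v in values if v is not None]
    q_lo = linear_quantile(present, tail)
    q_hi = linear_quantile(present, 1 - tail)
    spread = q_hi - q_lo
    low, high = q_lo - q * spread, q_hi + q * spread
    flagged = [i for i, v in enumerate(values)
               if v is not None and (v < low or v > high)]
    return low, high, flagged

# Hypothetical heights in cm (NOT the ComicCharacters data).
heights = [170, 180, 175, 165, 190, 185, 172, 178, 168, 182, 5000, None]
low, high, flagged = quantile_range_outliers(heights, tail=0.1, q=3)
print(low, high, flagged)  # prints: 102.0 256.0 [10]
```

Only the 5000 cm entry exceeds the upper fence; the missing value is ignored rather than treated as an outlier.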

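2. Standardize the Height in cm column (using z-scores). Include a screenshot of the first 10 rows of your data.

3. Check the standardized height for outliers. Did these results differ from the previous outliers question? If so, how did they differ? If not, why not?

The same 10 characters are identified as outliers. Standardizing (z = (x - mean) / standard deviation) is a linear transformation: it shifts every value by the same amount and divides every value by the same positive number. That changes the scale of the data but not the shape of its distribution or the order of the values. The tail quantiles and the interquantile range are transformed in exactly the same way, so the outlier thresholds move with the data and the same rows fall outside them.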
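A small Python check of questions 2 and 3, assuming missing heights are stored as None and that z-scores use the sample standard deviation (n - 1 denominator), which we assume matches JMP's Standardize. The snippet repeats a compact version of the quantile-range rule so it runs on its own; the function names and heights are hypothetical. It standardizes the list and confirms that the same index is flagged before and after.

```python
import math
import statistics

def z_scores(values):
    """Standardize non-missing values: z = (x - mean) / sd,
    with the sample standard deviation. None stays None."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    return [None if v is None else (v - mean) / sd for v in values]

def quantile_outlier_indices(values, tail=0.1, q=3.0):
    """Indices outside the quantile-range fences (linear-interpolated quantiles)."""
    s = sorted(v for v in values if v is not None)

    def quant(p):
        pos = p * (len(s) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    q_lo, q_hi = quant(tail), quant(1 - tail)
    spread = q_hi - q_lo
    low, high = q_lo - q * spread, q_hi + q * spread
    return [i for i, v in enumerate(values)
            if v is not None and (v < low or v > high)]

# Hypothetical heights in cm (NOT the ComicCharacters data).
heights = [170, 180, 175, 165, 190, 185, 172, 178, 168, 182, 5000, None]
z = z_scores(heights)
print(quantile_outlier_indices(heights), quantile_outlier_indices(z))  # prints: [10] [10]
```

Because any positive rescaling leaves the flagged set unchanged, using n instead of n - 1 in the standard deviation would not change which rows are outliers either.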

4. Check the data quality of Intelligence, Strength, Speed, Durability, Power, and Combat. Do any of these variables have missing values? If so, how many? Include a screenshot showing your results.

Yes. Each of these six columns has 78 missing values. The Missing Data Pattern shows that each row either has all six values or is missing all six, so the missing values come from the same 78 rows rather than being scattered across the dataset.
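For question 4 in Python, here is a minimal standard-library sketch of counting missing values per column and tabulating missing-data patterns, in the spirit of JMP's Missing Data Pattern. It assumes missing entries are stored as None, and the rows are made up for illustration. With pandas, `df[cols].isna().sum()` gives the per-column counts and `df[cols].isna().value_counts()` tabulates the patterns (`df` and `cols` being your DataFrame and the six column names).

```python
from collections import Counter

COLUMNS = ["Intelligence", "Strength", "Speed", "Durability", "Power", "Combat"]

def missing_summary(rows, columns=COLUMNS):
    """Return (per-column missing counts, Counter of missing-data patterns).
    A pattern is a tuple of 0/1 flags, where 1 means that column is missing."""
    counts = {c: sum(1 for r in rows if r.get(c) is None) for c in columns}
    patterns = Counter(tuple(int(r.get(c) is None) for c in columns) for r in rows)
    return counts, patterns

# Hypothetical rows: two complete, one missing all six ability scores.
rows = [
    dict(zip(COLUMNS, [80, 50, 60, 70, 90, 85])),
    dict(zip(COLUMNS, [40, 30, 35, 45, 20, 55])),
    {c: None for c in COLUMNS},
]
counts, patterns = missing_summary(rows)
print(counts)
print(patterns)
```

An all-or-nothing structure like the one described above shows up as exactly two patterns: all zeros (complete rows) and all ones (rows missing every ability score).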
