Assignment 1 Solved (1).docx
School
University Of Connecticut
Course
5604
Subject
Information Systems
Date
Feb 20, 2024
Type
docx
Pages
5
Uploaded by EarlMusicCrab33
Part 1
– Refer to pages 52-55 of your textbook and answer the following problems:
2.1
o a – This is supervised learning because we are using historical data with a known outcome to train a model. From the data on prior customers, we know who is good at paying off their loan and who isn’t. We can use that information to predict which future applicants will be reliable loan customers and approve only those applicants.
o b – Recommender systems are a type of unsupervised learning. We are simply observing the purchasing patterns of previous customers to see which items are often bought together.
o c – Supervised learning can help us form this prediction. We have existing data on network data packets labeled as safe or dangerous. We can use this information to train a model that classifies future network data packets as safe or dangerous.
o d – Customer segmentation is another type of unsupervised learning. In segmentation we are looking for natural groupings of similar records. We are not predicting anything.
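The distinction drawn in 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loan and spending numbers are made up, and the nearest-centroid classifier and the tiny two-group clustering routine are stand-ins chosen for brevity, not methods taken from the textbook.

```python
from statistics import mean

# --- Supervised: the known outcome (label) is part of the training data ---
# Hypothetical past loan customers: (debt-to-income ratio, repaid the loan?)
history = [(0.10, True), (0.15, True), (0.20, True),
           (0.55, False), (0.60, False), (0.70, False)]

def fit_centroids(rows):
    """Average the feature separately for each known label."""
    return {label: mean(x for x, y in rows if y == label)
            for label in {y for _, y in rows}}

def predict(centroids, x):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(history)
new_applicant_prediction = predict(centroids, 0.18)

# --- Unsupervised: no outcome column, only natural groupings ---
# Hypothetical annual spend per customer.
spend = [120, 150, 130, 900, 950, 1000]

def two_means(values, iterations=10):
    """Very small 1-D k-means with k=2, seeded at the min and max."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

segments = two_means(spend)
print(new_applicant_prediction)
print(segments)
```

The first half cannot run without the repaid/not-repaid column, while the second half never sees an outcome at all, which is the difference the answers above rely on.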
2.6 – You can’t use “refund issued” as a predictor in a purchase prediction model, because that information isn’t known until after a customer has already made a purchase. You want to be able to make a purchase prediction before the potential customer acts.
Part 2
– Use the attached ComicCharacters HW1.jmp or ComicCharacters WH.csv dataset to answer the following questions. You may complete this part in either JMP or Python.
1. Check the Height in cm column for outliers using Tail Quantile = 0.1 and Q = 3. How many outliers are there? What numeric threshold did they exceed to be called outliers? Which characters were identified?
There are 10 characters with a height greater than 792 cm. The very tall characters are Anti-Monitor, Bloodwraith, Fin Fang Foom, Galactus, Giganta, Godzilla, King Kong, Surtur, Utgard-Loki, and Ymir.
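For anyone taking the Python route, the quantile-range rule can be sketched as below. It assumes the rule flags values more than Q times the distance between the upper and lower tail quantiles beyond either quantile; the linear-interpolation quantile used here may differ slightly from JMP's own quantile definition, and the heights are hypothetical rather than taken from the ComicCharacters data.

```python
def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics.
    Assumption: JMP may interpolate slightly differently."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def quantile_range_thresholds(values, tail=0.1, q=3):
    """Return (low, high) cutoffs: tail quantiles pushed out by q * their spread."""
    ordered = sorted(values)
    low_q = quantile(ordered, tail)
    high_q = quantile(ordered, 1 - tail)
    spread = high_q - low_q
    return low_q - q * spread, high_q + q * spread

# Hypothetical heights in cm, NOT the assignment data.
heights = [150, 160, 165, 170, 175, 178, 180, 183, 185, 188, 190, 1500]
low, high = quantile_range_thresholds(heights)
outliers = [h for h in heights if h < low or h > high]
print(low, high, outliers)
```

On the real dataset, the `high` value plays the role of the 792 cm threshold reported above.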
2. Standardize the Height in cm column (using z-scores). Include a screenshot of the first 10 rows of your data.
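A minimal sketch of the standardization step in Python, assuming the z-score uses the sample standard deviation (n - 1 denominator); the heights are hypothetical placeholders, not rows from the dataset.

```python
from statistics import mean, stdev

def z_scores(values):
    """Subtract the mean and divide by the sample standard deviation."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - center) / spread for v in values]

# Hypothetical heights in cm.
heights = [150, 160, 170, 180, 190]
standardized = z_scores(heights)

# Print the original and standardized values side by side.
for h, z in zip(heights, standardized):
    print(f"{h:>6} {z:>8.3f}")
```

After standardizing, the column has mean 0 and standard deviation 1, which is a quick way to check the result.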
3. Check the standardized height for outliers. Did these results differ from the previous outliers question? If so, how did it differ? If not, why not?
The same 10 characters are identified as outliers. This is because standardizing the data doesn’t change the shape of the distribution, only its center and scale. The quantile-based thresholds shift and rescale along with the data, so the same rows are flagged.
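The invariance claim can be checked numerically. The sketch below assumes the same quantile-range rule as in question 1 (tail quantile 0.1, Q = 3, linear-interpolation quantiles) and uses hypothetical heights; it compares the outlier flags before and after z-scoring.

```python
from statistics import mean, stdev

def outlier_flags(values, tail=0.1, q=3):
    """Flag values more than q * (upper - lower tail quantile) beyond either quantile."""
    ordered = sorted(values)

    def quantile(p):
        position = p * (len(ordered) - 1)
        i = int(position)
        j = min(i + 1, len(ordered) - 1)
        return ordered[i] + (position - i) * (ordered[j] - ordered[i])

    low_q, high_q = quantile(tail), quantile(1 - tail)
    spread = high_q - low_q
    return [v < low_q - q * spread or v > high_q + q * spread for v in values]

# Hypothetical heights in cm, NOT the assignment data.
heights = [150, 160, 165, 170, 175, 178, 180, 183, 185, 188, 190, 1500]
center, scale = mean(heights), stdev(heights)
standardized = [(h - center) / scale for h in heights]

raw_flags = outlier_flags(heights)
z_flags = outlier_flags(standardized)
print(raw_flags == z_flags)
```

Because z-scoring subtracts a constant and divides by a positive constant, every quantile and every threshold is transformed the same way as the data, so the comparison against the threshold comes out the same for each row.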
4. Check the data quality of Intelligence, Strength, Speed, Durability, Power, and Combat. Do any of these variables have missing values? If so, how many? Include a screenshot showing your results.
There are 78 missing values in each of these columns. The missing data pattern shows that each row has either all six attributes present or all six missing. The missing values are not scattered across the dataset.
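A rough Python equivalent of the per-column count and the missing data pattern is sketched below. The rows are hypothetical, and `None` is assumed to mark a missing value; when reading the CSV with the `csv` module, blank cells would arrive as empty strings and would need the check adjusted accordingly.

```python
from collections import Counter

ATTRIBUTES = ["Intelligence", "Strength", "Speed", "Durability", "Power", "Combat"]

# Hypothetical rows; None marks a missing value.
rows = [
    {"Name": "Hero A", "Intelligence": 50, "Strength": 80, "Speed": 30,
     "Durability": 60, "Power": 70, "Combat": 55},
    {"Name": "Hero B", **dict.fromkeys(ATTRIBUTES)},  # all six missing
    {"Name": "Hero C", "Intelligence": 90, "Strength": 20, "Speed": 40,
     "Durability": 35, "Power": 85, "Combat": 60},
    {"Name": "Hero D", **dict.fromkeys(ATTRIBUTES)},  # all six missing
]

# Missing values per column.
missing_per_column = {col: sum(row[col] is None for row in rows) for col in ATTRIBUTES}

# Missing data pattern: one tuple of True/False flags per row, then count each pattern.
pattern_counts = Counter(
    tuple(row[col] is None for col in ATTRIBUTES) for row in rows
)

print(missing_per_column)
for pattern, count in pattern_counts.items():
    print(count, pattern)
```

If only the all-present and all-missing patterns appear, the missingness is row-wise rather than scattered, which matches the observation above.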