DAT 640 Practical R Activity Eight
School: Southern New Hampshire University
Course: 640
Subject: Statistics
Date: Feb 20, 2024
Type: docx
Pages: 1
Uploaded by: DeaconSeaLionMaster750
Performance Evaluation With an R Confusion Matrix
Overview:
During this activity, you will build a classification model and then use a confusion matrix to evaluate its performance.
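A confusion matrix cross-tabulates predicted classes against actual classes, and the usual evaluation metrics are ratios of its four cells. The sketch below shows this in base R. The helper name confusion_metrics() and the ten-observation label vectors are hypothetical, chosen only to illustrate the arithmetic.

```r
# Hypothetical helper: build a 2 x 2 confusion matrix and derive common metrics.
# levels[1] is the negative class, levels[2] the positive class.
confusion_metrics <- function(actual, predicted, levels = c("0", "1")) {
  # Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted
  cm <- table(Predicted = factor(predicted, levels = levels),
              Actual    = factor(actual,    levels = levels))
  neg <- levels[1]
  pos <- levels[2]
  tp <- cm[pos, pos]; tn <- cm[neg, neg]
  fp <- cm[pos, neg]; fn <- cm[neg, pos]
  list(matrix      = cm,
       accuracy    = (tp + tn) / sum(cm),
       sensitivity = tp / (tp + fn),  # true positive rate (recall)
       specificity = tn / (tn + fp),  # true negative rate
       precision   = tp / (tp + fp))  # positive predictive value
}

# Hypothetical actual and predicted labels for ten observations
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0)

m <- confusion_metrics(actual, predicted)
print(m$matrix)
round(unlist(m[-1]), 3)
# TP = 3, TN = 4, FP = 1, FN = 2:
# accuracy 0.7, sensitivity 0.6, specificity 0.8, precision 0.75
```

If the caret package is installed, its confusionMatrix(data, reference, positive = "1") function reports the same quantities for factor inputs, along with kappa and a confidence interval for accuracy.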
Instructions: Complete the lab activities below. Provide responses to the questions and screenshots when prompted. Please note: This assignment will be submitted and graded in Brightspace.
Part 1: Complete uCertify Lab 11.9.1, Analyzing Cost-benefit using Data-driven Misclassification Costs. Take a screenshot illustrating successful execution of the lab R commands.
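The cost-benefit idea behind Part 1 can be sketched as weighting each type of error in the confusion matrix by its cost. This is a generic illustration, not the uCertify lab's own commands: the function names and the cost values (a false negative costing five times as much as a false positive) are hypothetical, and correct predictions are assumed to cost nothing.

```r
# Hypothetical helper: total cost of a set of 0/1 predictions, given a cost per error type.
# Correct predictions are assumed to cost nothing.
misclassification_cost <- function(actual, predicted, cost_fp, cost_fn) {
  fp <- sum(predicted == 1 & actual == 0)  # false positives
  fn <- sum(predicted == 0 & actual == 1)  # false negatives
  fp * cost_fp + fn * cost_fn
}

# With calibrated probabilities p, predicting "positive" is cheaper in expectation
# when p * cost_fn > (1 - p) * cost_fp, i.e. when p > cost_fp / (cost_fp + cost_fn).
cost_threshold <- function(cost_fp, cost_fn) cost_fp / (cost_fp + cost_fn)

# Hypothetical labels: 1 false positive and 2 false negatives
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0)

misclassification_cost(actual, predicted, cost_fp = 1, cost_fn = 5)  # 1*1 + 2*5 = 11
cost_threshold(cost_fp = 1, cost_fn = 5)                              # 1/6, about 0.167
```

When missing a positive case is expensive relative to a false alarm, the cost-minimizing cutoff falls well below the default 0.5, so the model flags more cases as positive.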
Part 2: You will continue to use the uCertify lab environment for the second part of this assignment. Visit the R Data website and choose a data set from the provided list. Then choose a classification algorithm, such as decision trees, random forest, or logistic regression, and build it on this data set. Finally, build a confusion matrix and an ROC chart to evaluate the model. To complete this activity, follow the steps below:
1. Visit the R Data website and select one of the following data sets:
   - MASS birthwt (Target: low [indicator of birth weight less than 2.5 kg])
   - boot urine (Target: r [indicator of the presence of calcium oxalate crystals])
   - carData TitanicSurvival (Target: survived [indicator of whether the passenger survived])
   - Ecdat Car (Target: choice [vehicle of choice from six options])
   - Ecdat Fishing (Target: mode [recreational fishing mode choice: beach, pier, boat, or charter])
   - Ecdat Katsup (Target: choice [brand of ketchup chosen: heinz, hunts, delmonte, or stb])
2. Describe the data set you chose and the algorithm used to predict the target classes.
3. Build a classification model using decision trees, logistic regression, or random forest. Provide screenshots of the confusion matrix and discuss the results.
4. Provide screenshots of the ROC chart along with two paragraphs discussing the results. Include details on how the confusion matrix and ROC chart can assist with model evaluation.
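Steps 2 through 4 can be sketched end to end in base R. This is a minimal example, assuming the MASS birthwt option and logistic regression as the chosen algorithm; the seed, the 70/30 train/test split, and the 0.5 cutoff are illustrative choices rather than requirements of the assignment, and the helper names roc_curve() and auc_trapezoid() are hypothetical. Note that birthwt also contains bwt, the birth weight from which low is derived, so it has to be removed before fitting.

```r
library(MASS)  # recommended package shipped with R; provides the birthwt data

## Step 2: inspect the data set
data(birthwt)
bw <- birthwt
str(bw)

# 'low' is defined from bwt (< 2500 g), so keeping bwt would leak the answer
bw$bwt  <- NULL
bw$race <- factor(bw$race, labels = c("white", "black", "other"))
bw$low  <- factor(bw$low, levels = c(0, 1))

## Step 3: logistic regression on a 70/30 train/test split
set.seed(640)  # arbitrary seed so the split is reproducible
train_idx <- sample(nrow(bw), size = round(0.7 * nrow(bw)))
train <- bw[train_idx, ]
test  <- bw[-train_idx, ]

fit  <- glm(low ~ ., data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")  # estimated P(low = 1)

# Confusion matrix at the conventional 0.5 cutoff
pred <- factor(as.integer(prob >= 0.5), levels = c(0, 1))
cm   <- table(Predicted = pred, Actual = test$low)
print(cm)
cat("Accuracy:", round(sum(diag(cm)) / sum(cm), 3), "\n")

## Step 4: ROC curve by sweeping every distinct predicted probability as a cutoff
roc_curve <- function(scores, labels) {
  # labels: logical or numeric 0/1 (not a factor); TRUE/1 = positive class
  labels  <- as.logical(labels)
  cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(cutoffs, function(t) sum(scores >= t & labels) / sum(labels))
  fpr <- sapply(cutoffs, function(t) sum(scores >= t & !labels) / sum(!labels))
  data.frame(cutoff = cutoffs, fpr = fpr, tpr = tpr)
}

# Area under the curve by the trapezoidal rule; points run from (0, 0) to (1, 1)
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_curve(prob, test$low == "1")
plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = "ROC curve: logistic regression on birthwt")
abline(0, 1, lty = 2)  # diagonal = random guessing, AUC 0.5
cat("AUC:", round(auc_trapezoid(roc), 3), "\n")
```

The confusion matrix describes the model at a single cutoff, while the ROC curve shows the sensitivity/specificity trade-off across every cutoff at once; lowering the cutoff below 0.5 moves up and to the right along the curve. The pROC package's roc() and auc() functions are a common alternative to the hand-written helpers.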
The following websites may be of assistance in your submission:
A Beginner’s Guide to Learning R With the Titanic Dataset
Titanic Tutorial in R!
Confusion Matrix in R: A Complete Guide
How to Create a ROC Curve in R