DAT 640 Practical R Activity Eight
School: Southern New Hampshire University
Course: 640
Subject: Statistics
Date: Feb 20, 2024
Performance Evaluation With an R Confusion Matrix
Overview:
During this activity, you will construct a classification algorithm and then add a confusion matrix to assist with model evaluation.
Instructions: Complete the lab activities below. Provide responses to the questions and screenshots when prompted. Please note: This assignment will be submitted and graded in Brightspace.
Part 1: Complete uCertify Lab 11.9.1, Analyzing Cost-benefit using Data-driven Misclassification Costs. Take a screenshot illustrating successful execution of the lab R commands.
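The lab itself is not reproduced here, but the general idea behind misclassification costs can be sketched in a few lines of R. Everything in the block below is an assumption for illustration: the counts in conf_mat are made up (not output from any real model), and the cost values (a false negative costing four times a false positive) are hypothetical.

```r
# Illustrative counts only (NOT from a real model).
# Rows are predicted classes, columns are actual classes.
conf_mat <- matrix(c(50, 10,   # actual 0: 50 true negatives, 10 false positives
                     5, 35),   # actual 1:  5 false negatives, 35 true positives
                   nrow = 2,
                   dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

# Hypothetical cost matrix in the same layout: correct predictions cost 0,
# a false positive costs 1 unit, a false negative costs 4 units.
cost_mat <- matrix(c(0, 1,
                     4, 0),
                   nrow = 2,
                   dimnames = dimnames(conf_mat))

# Total cost is the element-wise product summed over all four cells.
total_cost <- sum(conf_mat * cost_mat)
avg_cost   <- total_cost / sum(conf_mat)

print(conf_mat)
cat("Total cost:", total_cost, "\n")
cat("Average cost per case:", avg_cost, "\n")
```

With unequal costs, a model with slightly lower accuracy can still be preferable if it makes fewer of the expensive errors.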
Part 2: You will continue to utilize the uCertify lab environment for the second part of this assignment. Visit the R data website and choose a data set from the provided list. Then, choose and construct a classification algorithm on this data set, such as decision trees, random forest, or logistic regression. Then, you will build a confusion matrix and a ROC chart to evaluate the model. To complete this activity, follow the steps below:
1. Visit R Data and select one of the following data sets:
   - MASS birthwt (Target: low [indicator of birth weight less than 2.5 kg])
   - boot urine (Target: r [indicator of the presence of calcium oxalate crystals])
   - carData TitanicSurvival (Target: survived [indicating they survived the incident])
   - Ecdat Car (Target: choice [vehicle of choice from six options])
   - Ecdat Fishing (Target: mode [recreation fishing mode choice: beach, pier, boat, and charter])
   - Ecdat Ketchup (Target: choice [choice of brand of ketchup {heinz, hunts, delmonte, and stb}])
2. Describe the data set you chose and the algorithm used to predict the derived classes.
3. Build a classification model using decision trees, logistic regression, or random forest. Provide screenshots of the confusion matrix and discuss the results.
4. Provide screenshots of the ROC chart along with two paragraphs discussing the results. Include details on how the confusion matrix and ROC chart can assist with model evaluation.
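As a rough illustration of steps 1 through 3, the sketch below fits a logistic regression to MASS birthwt and tabulates a confusion matrix with base R only. It is one possible approach, not a required solution. The 70/30 split, the seed, and the 0.5 cutoff are arbitrary assumptions, and the bwt column is dropped on the assumption that it would leak the target (low is defined from it).

```r
library(MASS)  # provides the birthwt data set

data(birthwt)
bw <- birthwt
bw$bwt <- NULL  # low is derived from bwt, so keeping it would leak the answer
bw$race <- factor(bw$race, labels = c("white", "black", "other"))

# Assumed 70/30 train/test split; seed chosen arbitrarily for repeatability.
set.seed(640)
train_idx <- sample(nrow(bw), size = floor(0.7 * nrow(bw)))
train <- bw[train_idx, ]
test  <- bw[-train_idx, ]

# Logistic regression on all remaining predictors.
fit <- glm(low ~ ., data = train, family = binomial)

# Predicted probabilities, converted to classes at an assumed 0.5 cutoff.
prob   <- predict(fit, newdata = test, type = "response")
pred   <- factor(as.integer(prob >= 0.5), levels = c(0, 1))
actual <- factor(test$low, levels = c(0, 1))

# Confusion matrix: rows are predictions, columns are observed classes.
conf_mat <- table(Predicted = pred, Actual = actual)
print(conf_mat)

# Summary measures read off the matrix.
accuracy    <- sum(diag(conf_mat)) / sum(conf_mat)
sensitivity <- conf_mat["1", "1"] / sum(conf_mat[, "1"])  # true positive rate
specificity <- conf_mat["0", "0"] / sum(conf_mat[, "0"])  # true negative rate
cat("Accuracy:", accuracy, "Sensitivity:", sensitivity,
    "Specificity:", specificity, "\n")
```

Reporting sensitivity and specificity alongside accuracy matters here because low birth weight is the minority class, so accuracy alone can look good even when few positive cases are caught.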
The following websites may be of assistance in your submission:
A Beginner’s Guide to Learning R With the Titanic Dataset
Titanic Tutorial in R!
Confusion Matrix in R: A Complete Guide
How to Create a ROC Curve in R
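For step 4, a ROC curve can also be computed by hand in base R, which makes it clearer what the chart represents. The sketch below is a minimal version under the same assumptions as before (MASS birthwt, logistic regression, an arbitrary 70/30 split and seed). The helper roc_points is a hypothetical name introduced here, not a function from any package; packages such as pROC or ROCR offer ready-made alternatives.

```r
library(MASS)  # provides the birthwt data set

data(birthwt)
bw <- birthwt
bw$bwt <- NULL  # low is derived from bwt, so it is excluded as a predictor
bw$race <- factor(bw$race, labels = c("white", "black", "other"))

set.seed(640)  # arbitrary seed for a repeatable split
train_idx <- sample(nrow(bw), size = floor(0.7 * nrow(bw)))
test <- bw[-train_idx, ]
fit  <- glm(low ~ ., data = bw[train_idx, ], family = binomial)
prob <- predict(fit, newdata = test, type = "response")

# Hypothetical helper: true and false positive rates at every distinct cutoff.
roc_points <- function(score, label) {
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE), -Inf)
  tpr <- sapply(thresholds, function(t) sum(score >= t & label == 1) / sum(label == 1))
  fpr <- sapply(thresholds, function(t) sum(score >= t & label == 0) / sum(label == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

roc <- roc_points(prob, test$low)

# Area under the curve by the trapezoid rule.
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (sketch)")
abline(0, 1, lty = 2)  # diagonal: performance of random guessing
cat("AUC:", auc, "\n")
```

Each point on the curve corresponds to one confusion matrix at a particular cutoff, which is how the two tools complement each other: the confusion matrix describes a single operating point, while the ROC curve and its AUC summarize performance across all cutoffs.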