Assignment-3_F2023.Rmd
School: Toronto Metropolitan University
Course: 830
Subject: Statistics
Date: Jan 9, 2024
Type: Rmd
Pages: 7
Uploaded by yusrafq
---
title: "CIND 123: Data Analytics Basic Methods: Assignment-3"
output: html_document
---
<center> <h1> Assignment 3 (10%) </h1> </center>
<center> <h2> Total 100 Marks </h2> </center>
<center> <h3> [Insert your full name] </h3> </center>
<center> <h3> [Insert course section & student number] </h3> </center>
---
## Instructions
This is an R Markdown document. Markdown is a simple formatting syntax for
authoring HTML, PDF, and MS Word documents. For more details on using R Markdown
see <http://rmarkdown.rstudio.com>.
Use RStudio for this assignment. Complete the assignment by inserting your R code
wherever you see the string "#INSERT YOUR ANSWER HERE".
When you click the **Knit** button a document will be generated that includes both
content as well as the output of any embedded R code chunks within the document.
You can embed an R code chunk like this:
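As a minimal illustration (the dataset is an assumption; `cars` ships with base R), a chunk opens with three backticks followed by `{r}` and closes with three backticks:

```{r}
# Summarise the built-in cars data frame (speed and stopping distance)
summary(cars)
```

When the document is knitted, both the code and its printed summary appear in the output.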
Submit **both** the Rmd and generated output files. Failing to submit both files
will be subject to mark deduction.
## Sample Question and Solution
Use `seq()` to create the vector $(2,4,6,\ldots,20)$.
```{r}
#INSERT YOUR ANSWER HERE.
seq(2,20,by = 2)
```
## Question 1 [15 Pts]
a) [5 Pts] First and second midterm grades of some students are given as
c(85,76,78,88,90,95,42,31,66) and c(55,66,48,58,80,75,32,22,39). Set R variables
`first` and `second` respectively. Then find the least-squares line relating the
second midterm to the first midterm.
Does the assumption of a linear relationship appear to be reasonable in this
case? Give reasons for your answer as a comment.
```{r}
#INSERT YOUR ANSWER HERE.
first <- c(85,76,78,88,90,95,42,31,66)
second <- c(55,66,48,58,80,75,32,22,39)
least_squares <- lm(second ~ first)
summary(least_squares)
plot(first, second, main = "Midterm Grades", xlab = "First Midterm",
     ylab = "Second Midterm")
abline(least_squares, col = "blue")
# The linear-relationship assumption can be examined using the scatterplot and the
# summary statistics provided by the linear regression model.
# The scatterplot shows a clear linear pattern, which suggests a linear
# relationship between the variables.
# For the linear regression model, the R-squared value in the summary should also
# be checked. The "Estimate" values are: intercept = -4.1516, first = 0.7870.
# The least-squares line is: second midterm = 0.7870 * (first midterm) - 4.1516,
# which is a fairly good fit for the data.
```
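One way to back the visual judgement with a number: in a simple linear regression, the squared correlation between the two variables equals the model's R-squared. The sketch below reuses the grades from the question; the names `fit`, `r`, and `r_squared` are introduced here only for illustration.

```r
first <- c(85, 76, 78, 88, 90, 95, 42, 31, 66)
second <- c(55, 66, 48, 58, 80, 75, 32, 22, 39)

# Fit the least-squares line of second on first
fit <- lm(second ~ first)

# Pearson correlation between the two midterms
r <- cor(first, second)

# R-squared as reported by summary()
r_squared <- summary(fit)$r.squared

# With a single predictor, r^2 and R-squared agree
all.equal(r^2, r_squared)
```

A value of R-squared close to 1 supports the linear assumption, but it does not replace looking at the scatterplot.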
b) [5 Pts] Plot the second midterm as a function of the first midterm using a
scatterplot and graph the least-squares line in red on the same plot.
```{r}
#INSERT YOUR ANSWER HERE.
first <- c(85,76,78,88,90,95,42,31,66)
second <- c(55,66,48,58,80,75,32,22,39)
least_squares <- lm(second ~ first)
plot(first, second, main = "Midterm Grades", xlab = "First Midterm",
     ylab = "Second Midterm")
# The question asks for the least-squares line in red
abline(least_squares, col = "red")
```
c) [5 Pts] Use the regression line to predict the second midterm grades when the
first midterm grades are 81 and 23.
```{r}
#INSERT YOUR ANSWER HERE.
first <- c(85,76,78,88,90,95,42,31,66)
second <- c(55,66,48,58,80,75,32,22,39)
least_squares <- lm(second ~ first)
first_grade <- c(81, 23)
prediction <- predict(least_squares, data.frame(first = first_grade))
prediction
```
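The prediction can be cross-checked by plugging the new grades into the fitted line by hand. This is a sketch under the same data as above; `fit`, `new_first`, `manual`, and `outside_range` are illustrative names, not part of the assignment.

```r
first <- c(85, 76, 78, 88, 90, 95, 42, 31, 66)
second <- c(55, 66, 48, 58, 80, 75, 32, 22, 39)
fit <- lm(second ~ first)

new_first <- c(81, 23)

# Manual prediction: intercept + slope * x
b <- coef(fit)
manual <- b[["(Intercept)"]] + b[["first"]] * new_first

# Same prediction through predict()
predicted <- predict(fit, data.frame(first = new_first))
all.equal(manual, unname(predicted))

# Flag new values that fall outside the observed range of first
outside_range <- new_first < min(first) | new_first > max(first)
outside_range
```

The grade 23 lies below the smallest observed first-midterm grade (31), so its prediction is an extrapolation and should be read with more caution than the one for 81.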
## Question 2 [45 Pts]
This question makes use of the "plm" package. Please load the `Crime` dataset as follows:
```{r load_packages}
#install.packages("plm")
library(plm)
data(Crime)
```
a) [5 Pts] Display the first 8 rows of the `Crime` data, the names of all the
variables, and the number of variables, then display a descriptive summary of each
variable.
```{r}
#INSERT YOUR ANSWER HERE.
library(plm)
data(Crime)
head(Crime, 8)
names(Crime)
length(names(Crime))
summary(Crime)
```
b) [5 Pts] Calculate the mean, variance, and standard deviation of the probability
of arrest (prbarr), omitting the missing values, if any.
```{r}
#INSERT YOUR ANSWER HERE.
mean_arrest <- mean(Crime$prbarr, na.rm=TRUE)
variance_arrest <- var(Crime$prbarr, na.rm=TRUE)
std_arrest <- sd(Crime$prbarr, na.rm=TRUE)
cat("Mean:", mean_arrest, "\n")
cat("Variance:", variance_arrest, "\n")
cat("Standard Deviation:", std_arrest, "\n")
```
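The effect of `na.rm = TRUE` is easiest to see on a small vector. The values in `x` below are made up for illustration and are not taken from `Crime`.

```r
# Hypothetical probabilities with one missing value
x <- c(0.20, 0.35, NA, 0.50)

# Without na.rm, a single NA makes the result NA
mean(x)

# With na.rm = TRUE, the NA is dropped before computing
m <- mean(x, na.rm = TRUE)
v <- var(x, na.rm = TRUE)
s <- sd(x, na.rm = TRUE)

# The standard deviation is the square root of the variance
all.equal(s, sqrt(v))
```

If a column has no missing values, `na.rm = TRUE` changes nothing, so it is safe to leave in.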
c) [5 Pts] Use the `lpolpc` (log-police per capita) and `smsa` variables to build a
linear regression model to predict the probability of arrest (prbarr). Then compare
it with another linear regression model that uses `polpc` (police per capita) and
`smsa`.
[5 Pts] How can you draw a conclusion from the results?
(Note: Full marks require comments on the predictors)
```{r}
#INSERT YOUR ANSWER HERE.
model_one <- lm(prbarr ~ lpolpc + smsa, data = Crime)
model_two <- lm(prbarr ~ polpc + smsa, data = Crime)
summary(model_one)
summary(model_two)
# Model One (prbarr ~ lpolpc + smsa): the lpolpc coefficient is the change in the
# probability of arrest per one-unit change in log police per capita. The smsa
# coefficient is the difference in the probability of arrest between areas inside a
# standard metropolitan statistical area and non-metropolitan areas.
# Model Two (prbarr ~ polpc + smsa): the polpc coefficient is the change in the
# probability of arrest per one-unit change in police per capita. The smsa
# coefficient has the same interpretation as in Model One.
# Model Two performs slightly better than Model One on every summary metric:
# lower residual standard error (model one = 0.1623, model two = 0.161), higher
# multiple R-squared (0.104 vs 0.1189), higher adjusted R-squared (0.1012 vs
# 0.1161), and higher F-statistic (36.4 vs 42.31).
# Comments on predictors:
# In Model Two, the coefficient for polpc is 18.34603, so a one-unit increase in
# police per capita is associated with a substantial increase in the probability of
# arrest. Using polpc rather than lpolpc appears to give a better fit to the data.
# The coefficient for smsa is negative in both models, so being in a standard
# metropolitan statistical area is associated with a lower probability of arrest
# than in non-metropolitan areas.
```
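Reading the metrics off two separate `summary()` printouts is error-prone, so it can help to collect them in one table. The sketch below assumes the plm package is installed; the helper `fit_stats()` and the object `comparison` are hypothetical names introduced for illustration.

```r
library(plm)   # assumption: plm is installed; Crime ships with it
data(Crime)

model_one <- lm(prbarr ~ lpolpc + smsa, data = Crime)
model_two <- lm(prbarr ~ polpc + smsa, data = Crime)

# Hypothetical helper: pull the headline fit metrics out of a fitted lm
fit_stats <- function(model) {
  s <- summary(model)
  c(sigma = s$sigma,
    r_squared = s$r.squared,
    adj_r_squared = s$adj.r.squared,
    f_statistic = unname(s$fstatistic["value"]),
    aic = AIC(model))
}

# One row per model, one column per metric
comparison <- rbind(model_one = fit_stats(model_one),
                    model_two = fit_stats(model_two))
round(comparison, 4)
```

Because both models have the same response and the same number of predictors, the rows are directly comparable: lower `sigma` and `aic` and higher R-squared values favour a model.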
d) [5 Pts] Based on the output of your model, write the equations using the
intercept and factors of `smsa` when `polpc` is set to 0.0015, and compare the
result with the `predict()` function.
Hint: Explore the `predict()` function
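The comparison asked for here can be sketched as follows. This is an illustration, not the assignment's official solution: it assumes the plm package is installed and that `smsa` is a two-level factor in `Crime`, and the names `fit`, `manual`, and `new_data` are introduced only for this example.

```r
library(plm)   # assumption: plm is installed; Crime ships with it
data(Crime)

fit <- lm(prbarr ~ polpc + smsa, data = Crime)
b <- coef(fit)
polpc_val <- 0.0015

# Assumption: smsa is a factor with two levels. The first level is the
# baseline; the second gets its own dummy coefficient named "smsa<level>".
smsa_levels <- levels(Crime$smsa)
smsa_coef <- b[[paste0("smsa", smsa_levels[2])]]

# Equations by hand: the baseline level uses the intercept alone, and the
# second level adds the smsa coefficient.
manual <- b[["(Intercept)"]] + b[["polpc"]] * polpc_val + c(0, smsa_coef)

# Same two cases through predict()
new_data <- data.frame(polpc = polpc_val,
                       smsa = factor(smsa_levels, levels = smsa_levels))
predicted <- predict(fit, newdata = new_data)

all.equal(unname(predicted), manual)
```

The two hand-written equations differ only by the `smsa` coefficient, which is why a factor predictor acts as a shift of the intercept.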
```{r}
#INSERT YOUR ANSWER HERE.
polpc_val <- 0.0015
model_two <- lm(prbarr ~ polpc + smsa, data=Crime)
```