final 5080_2024.docx
School: Austin Peay State University
Course: 5080
Subject: Computer Science
Date: Jan 9, 2024
Type: docx
Pages: 6
Uploaded by JusticeGalaxyHorse20
Q1. For the following values: 200, 400, 800, 1000, 2000
1. Calculate the mean and variance.
2. Normalize the above group using min-max normalization with new min = 0 and new max = 10.
3. In z-score normalization, what value should the first value, 200, be transformed to?
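1. Mean = 880, Variance = 492,000 (sample variance, dividing by n − 1 = 4)
2. New_Min = 0, New_Max = 10
Normalized value: $x' = \frac{x - \text{Min}}{\text{Max} - \text{Min}} \times (\text{New\_Max} - \text{New\_Min}) + \text{New\_Min}$
With Min = 200 and Max = 2000: $x' = \frac{x - 200}{2000 - 200} \times (10 - 0) + 0$
For example, for x = 400: $\frac{400 - 200}{2000 - 200} \times (10 - 0) + 0 = 1.11$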
Value    Min_Max
200      0
400      1.11
800      3.33
1000     4.44
2000     10
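The min-max table above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original answer; the function name `min_max_normalize` and its parameter names are illustrative choices.

```python
def min_max_normalize(values, new_min=0.0, new_max=10.0):
    """Linearly rescale values so that min(values) -> new_min and max(values) -> new_max."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:
        # All values identical: the formula would divide by zero.
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [(v - old_min) / span * (new_max - new_min) + new_min for v in values]


values = [200, 400, 800, 1000, 2000]
normalized = min_max_normalize(values)

# Print each original value next to its normalized value, rounded to 2 decimals.
for v, n in zip(values, normalized):
    print(f"{v:>5}  {n:.2f}")
```

Note that the smallest value always maps to the new minimum, so 200 becomes 0 and 1.11 belongs to 400.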
3. Z-score normalization
$z = \frac{x - \text{mean}}{\sigma}$, where $\sigma = \sqrt{492{,}000}$
For the value 200: $z = \frac{200 - 880}{\sqrt{492{,}000}} = -0.969452112$
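As a check on the Q1 arithmetic, the mean, variance, and z-score can be computed with Python's standard `statistics` module. This is a sketch under one assumption worth stating: the answer's variance of 492,000 corresponds to the sample variance (dividing by n − 1); the population variance (dividing by n) is shown alongside for comparison and would give a different z-score.

```python
import math
import statistics

values = [200, 400, 800, 1000, 2000]

mean = statistics.mean(values)            # arithmetic mean
sample_var = statistics.variance(values)  # sample variance, divides by n - 1
pop_var = statistics.pvariance(values)    # population variance, divides by n
sample_std = math.sqrt(sample_var)

# z-score of the first value, using the sample standard deviation
z_first = (values[0] - mean) / sample_std

print(f"mean = {mean}")
print(f"sample variance = {sample_var}, population variance = {pop_var}")
print(f"z-score of {values[0]} = {z_first:.6f}")
```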
Q2. A database has four transactions. Let min_sup = 60% and min_conf = 80%.
(a) At the granularity of item category (e.g., item_i could be "Milk"), for the following rule template,
$\forall X \in \text{transaction},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3)\ [s, c]$
list the frequent k-itemset for the largest k and all of the strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k.
Answer:
The largest k is 3, and the frequent 3-itemset is {Bread, Milk, Cheese}.
The strong association rules are:
Bread ^ Cheese => Milk, [75%, 100%]
Cheese ^ Milk => Bread, [75%, 100%]
Cheese => Milk ^ Bread, [75%, 100%]
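The support and confidence figures above can be checked mechanically. The sketch below is illustrative only: the original transaction table is not reproduced in this document, so the four `transactions` are HYPOTHETICAL, chosen so that the numbers agree with the answer given. The helper names `support` and `strong_rules` are also assumptions, not from the source.

```python
from itertools import combinations

# HYPOTHETICAL item-category transactions (the original table is not shown here).
transactions = [
    frozenset({"Crab", "Milk", "Cheese", "Bread"}),
    frozenset({"Cheese", "Milk", "Apple", "Pie", "Bread"}),
    frozenset({"Apple", "Milk", "Bread", "Pie"}),
    frozenset({"Bread", "Milk", "Cheese"}),
]
MIN_SUP, MIN_CONF = 0.60, 0.80


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def strong_rules(itemset, transactions, min_conf):
    """Yield (antecedent, consequent, support, confidence) for every rule
    antecedent => consequent built from itemset with confidence >= min_conf."""
    items = frozenset(itemset)
    s = support(items, transactions)
    if s == 0:
        return  # itemset never occurs, so no rule can be formed
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            a = frozenset(antecedent)
            conf = s / support(a, transactions)  # confidence = sup(A u C) / sup(A)
            if conf >= min_conf:
                yield a, items - a, s, conf


itemset = {"Bread", "Milk", "Cheese"}
assert support(itemset, transactions) >= MIN_SUP  # the 3-itemset is frequent
rules = list(strong_rules(itemset, transactions, MIN_CONF))
for a, c, s, conf in rules:
    print(f"{' ^ '.join(sorted(a))} => {' ^ '.join(sorted(c))}  [{s:.0%}, {conf:.0%}]")
```

On this assumed data, Bread ^ Milk => Cheese has only 75% confidence and is filtered out, which is consistent with its absence from the answer's list.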
(b) At the granularity of brand-item category (e.g., item_i could be "Sunset-Milk"), for the following rule template,
$\forall X \in \text{customer},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3)$
list the frequent k-itemset for the largest k.
The largest k is 3.
The frequent 3-itemsets are {Wonder-Bread, Dairyland-Milk, Tasty-Pie} and {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}.
Q3. Suppose you are asked to classify microarray data with 100 tissues and 10,000 genes. Which of the
following algorithms would you recommend, and why?
1. Decision tree induction
2. Piece-wise linear regression
3. SVM
4. Associative classification
5. Genetic algorithm
6. Bayesian belief network
Answer:
I recommend a Support Vector Machine (SVM). With 10,000 genes but only 100 tissue samples, the data
are very high-dimensional relative to the number of examples, and SVMs handle that setting well.
An SVM also penalizes misclassifications, which limits the adverse impact of individual genes on the
classification, and it can model non-linear relationships while still producing a distinct separation
between classes.
Q4. The Table below shows the Decision…..see text