COMP 6721 Applied Artificial Intelligence (Fall 2023)
Lab Exercise #5: Decision Trees & K-means
Solutions
Question 1
Given the training instances below, use information theory to find whether ‘Outlook’ or ‘Windy’ is the best feature to decide when to play a game of golf.

| Outlook  | Temperature | Humidity | Windy | Play / Don’t Play |
|----------|-------------|----------|-------|-------------------|
| sunny    | 85          | 85       | false | Don’t Play        |
| sunny    | 80          | 90       | true  | Don’t Play        |
| overcast | 83          | 78       | false | Play              |
| rain     | 70          | 96       | false | Play              |
| rain     | 68          | 80       | false | Play              |
| rain     | 65          | 70       | true  | Don’t Play        |
| overcast | 64          | 65       | true  | Play              |
| sunny    | 72          | 95       | false | Don’t Play        |
| sunny    | 69          | 70       | false | Play              |
| rain     | 75          | 80       | false | Play              |
| sunny    | 75          | 70       | true  | Play              |
| overcast | 72          | 90       | true  | Play              |
| overcast | 81          | 75       | false | Play              |
| rain     | 71          | 80       | true  | Don’t Play        |
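$$H(\text{Output}) = H\left(\tfrac{5}{14}, \tfrac{9}{14}\right) = -\left(\tfrac{5}{14}\log_2\tfrac{5}{14} + \tfrac{9}{14}\log_2\tfrac{9}{14}\right) = 0.94$$

$$H(\text{Output} \mid \text{sunny}) = H\left(\tfrac{3}{5}, \tfrac{2}{5}\right) = -\left(\tfrac{3}{5}\log_2\tfrac{3}{5} + \tfrac{2}{5}\log_2\tfrac{2}{5}\right) = 0.97$$

$$H(\text{Output} \mid \text{overcast}) = H(0, 1) = -\left(0\log_2 0 + 1\log_2 1\right) = 0$$

$$H(\text{Output} \mid \text{rain}) = H\left(\tfrac{2}{5}, \tfrac{3}{5}\right) = -\left(\tfrac{2}{5}\log_2\tfrac{2}{5} + \tfrac{3}{5}\log_2\tfrac{3}{5}\right) = 0.97$$

$$H(\text{Output} \mid \text{Outlook}) = \tfrac{5}{14}H(\text{Output} \mid \text{sunny}) + \tfrac{4}{14}H(\text{Output} \mid \text{overcast}) + \tfrac{5}{14}H(\text{Output} \mid \text{rain})$$

$$H(\text{Output} \mid \text{Outlook}) = \tfrac{5}{14} \cdot 0.97 + \tfrac{4}{14} \cdot 0 + \tfrac{5}{14} \cdot 0.97 = 0.69$$

$$\text{gain}(\text{Outlook}) = H(\text{Output}) - H(\text{Output} \mid \text{Outlook}) = 0.94 - 0.69 = 0.25$$

$$H(\text{Output} \mid \text{Windy} = \text{true}) = H\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1$$

$$H(\text{Output} \mid \text{Windy} = \text{false}) = H\left(\tfrac{1}{4}, \tfrac{3}{4}\right) = 0.81$$

$$H(\text{Output} \mid \text{Windy}) = \tfrac{6}{14} \cdot 1 + \tfrac{8}{14} \cdot 0.81 = 0.89$$

$$\text{gain}(\text{Windy}) = H(\text{Output}) - H(\text{Output} \mid \text{Windy}) = 0.94 - 0.89 = 0.05$$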
⇒ ‘Outlook’ is the better feature because it has the larger information gain (0.25 versus 0.05).
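The hand calculation for Question 1 can be cross-checked with a few lines of plain Python. This is only a minimal sketch: the helper names `entropy` and `information_gain` are made up for illustration, and the data list simply repeats the Outlook, Windy, and label columns of the training table.

```python
from collections import Counter
from math import log2

# (Outlook, Windy, label) for the 14 training instances of Question 1.
DATA = [
    ("sunny", False, "Don't Play"),
    ("sunny", True, "Don't Play"),
    ("overcast", False, "Play"),
    ("rain", False, "Play"),
    ("rain", False, "Play"),
    ("rain", True, "Don't Play"),
    ("overcast", True, "Play"),
    ("sunny", False, "Don't Play"),
    ("sunny", False, "Play"),
    ("rain", False, "Play"),
    ("sunny", True, "Play"),
    ("overcast", True, "Play"),
    ("overcast", False, "Play"),
    ("rain", True, "Don't Play"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature_index):
    """H(Output) minus the weighted entropy after splitting on one feature."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[feature_index], []).append(row[-1])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


gain_outlook = information_gain(DATA, 0)
gain_windy = information_gain(DATA, 1)
print(f"gain(Outlook) = {gain_outlook:.2f}")  # 0.25
print(f"gain(Windy)   = {gain_windy:.2f}")    # 0.05
```

Because the script works with unrounded intermediate values, its results can differ from the hand calculation in the third decimal place, but they agree after rounding to two decimals.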
Question 2
It’s time to leave the calculations to your computer: Write a Python program that uses scikit-learn’s Decision Tree Classifier [1]:

    import numpy as np
    from sklearn import tree
    from sklearn import preprocessing

Here is the training data from the first question:

    dataset = np.array([
        ['sunny', 85, 85, 0, 'Don\'t Play'],
        ['sunny', 80, 90, 1, 'Don\'t Play'],
        ['overcast', 83, 78, 0, 'Play'],
        ['rain', 70, 96, 0, 'Play'],
        ['rain', 68, 80, 0, 'Play'],
        ['rain', 65, 70, 1, 'Don\'t Play'],
        ['overcast', 64, 65, 1, 'Play'],
        ['sunny', 72, 95, 0, 'Don\'t Play'],
        ['sunny', 69, 70, 0, 'Play'],
        ['rain', 75, 80, 0, 'Play'],
        ['sunny', 75, 70, 1, 'Play'],
        ['overcast', 72, 90, 1, 'Play'],
        ['overcast', 81, 75, 0, 'Play'],
        ['rain', 71, 80, 1, 'Don\'t Play'],
    ])

Note that we changed True and False into 1 and 0.
For our feature vectors, we need the first four columns:

    X = dataset[:, 0:4]

and for the training labels, we use the last column from the dataset:

    y = dataset[:, 4]

However, you will not be able to use the data as-is: All features must be numerical for training the classifier, so you have to transform the strings into numbers. Fortunately, scikit-learn has a preprocessing class for label encoding that we can use [2]:

    le = preprocessing.LabelEncoder()
    X[:, 0] = le.fit_transform(X[:, 0])

(Note: you will need to transform y as well.)
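To make the encoding step less of a black box, here is a small pure-Python sketch of what label encoding does. The helper names `fit_labels`, `transform`, and `inverse_transform` are hypothetical stand-ins, not scikit-learn's implementation; the sketch only mirrors the fact that LabelEncoder numbers the distinct values in sorted order, which is why ‘overcast’, ‘rain’, and ‘sunny’ end up as 0, 1, and 2.

```python
def fit_labels(values):
    """Return the sorted distinct values; a value's position is its integer code."""
    return sorted(set(values))


def transform(classes, values):
    """Replace each value by its integer code."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[v] for v in values]


def inverse_transform(classes, codes):
    """Map integer codes back to the original strings."""
    return [classes[c] for c in codes]


# The Outlook column of the training data, in row order.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]

classes = fit_labels(outlook)
codes = transform(classes, outlook)
print(classes)    # ['overcast', 'rain', 'sunny']
print(codes[:4])  # [2, 2, 0, 1]
```

Note that the integer codes impose an artificial order on the categories; a tree can still separate them with threshold tests, but the numbers themselves carry no meaning.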
[1] See https://scikit-learn.org/stable/modules/tree.html#classification
[2] See https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets
Now you can create a classifier object:

    dtc = tree.DecisionTreeClassifier(criterion="entropy")

Note that we are using the entropy option for building the tree, which is the method we’ve studied in class and used on the exercise sheet. Train the classifier to build the tree:

    dtc.fit(X, y)

Now you can predict a new value using dtc.predict(test), just like you did for the Naïve Bayes classifier last week. Note: if you want the string output that you transformed above, you can use the inverse_transform(predict) function of a label encoder to change the predicted result back into a string.
Visualizing the decision tree can help you understand how the model makes predictions based on the features. It provides insights into the decision-making process of the classifier. A simple way to draw the tree (this uses matplotlib) is:

    tree.plot_tree(dtc)

but this can be a bit hard to read; to get a prettier version you can use the Graphviz [3] visualization software. Graphviz is a powerful open source software tool for creating visual representations of graphs and networks. Here, we’ll be using a Python package called graphviz to interface with Graphviz and generate visually informative diagrams, like decision trees, directly from our Python code. You can call it from Python like this [4]:

    # print a nicer tree using graphviz
    import graphviz
    dot_data = tree.export_graphviz(dtc, out_file=None,
                                    feature_names=['Outlook', 'Temperature', 'Humidity', 'Windy'],
                                    class_names=['Don\'t Play', 'Play'],
                                    filled=True, rounded=True)
    graph = graphviz.Source(dot_data)
    graph.render("mytree")

The result will be stored in a file mytree.pdf and should look like Figure 1.
Most of the code is provided above. For transforming the labels, you can use the same label encoder:

    y = le.fit_transform(dataset[:, 4])

To predict a label for a new feature vector:

    y_pred = dtc.predict([[2, 81, 95, 1]])
    print("Predicted output: ", le.inverse_transform(y_pred))

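Putting the pieces together, the following is one possible end-to-end version of the program, offered as a sketch rather than the official solution. It departs from the snippets above in three ways that are assumptions of this sketch: it uses two separate encoders (the names `outlook_enc` and `label_enc` are made up here) so that refitting on the labels does not overwrite the Outlook mapping, it casts the numeric columns to float explicitly because the mixed-type NumPy array stores every value as a string, and it sets `random_state=0` so that repeated runs build the same tree.

```python
import numpy as np
from sklearn import preprocessing, tree

dataset = np.array([
    ['sunny', 85, 85, 0, "Don't Play"],
    ['sunny', 80, 90, 1, "Don't Play"],
    ['overcast', 83, 78, 0, 'Play'],
    ['rain', 70, 96, 0, 'Play'],
    ['rain', 68, 80, 0, 'Play'],
    ['rain', 65, 70, 1, "Don't Play"],
    ['overcast', 64, 65, 1, 'Play'],
    ['sunny', 72, 95, 0, "Don't Play"],
    ['sunny', 69, 70, 0, 'Play'],
    ['rain', 75, 80, 0, 'Play'],
    ['sunny', 75, 70, 1, 'Play'],
    ['overcast', 72, 90, 1, 'Play'],
    ['overcast', 81, 75, 0, 'Play'],
    ['rain', 71, 80, 1, "Don't Play"],
])

# One encoder per string column, so each mapping stays available later.
outlook_enc = preprocessing.LabelEncoder()
label_enc = preprocessing.LabelEncoder()

# Encoded Outlook plus Temperature, Humidity, Windy as floats.
X = np.column_stack([
    outlook_enc.fit_transform(dataset[:, 0]),
    dataset[:, 1:4].astype(float),
])
y = label_enc.fit_transform(dataset[:, 4])

dtc = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
dtc.fit(X, y)

# Encode the new instance with the same Outlook mapping used for training.
sample = [[outlook_enc.transform(["sunny"])[0], 81, 95, 1]]
prediction = label_enc.inverse_transform(dtc.predict(sample))
print("Predicted output:", prediction[0])
```

Keeping the encoders separate also means a new instance can be written with its original string value (‘sunny’) instead of a hard-coded integer.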
[3] See https://www.graphviz.org/
[4] If it is not yet installed, you can use conda install graphviz python-graphviz to install it (note that you need both the Graphviz tool and its Python bindings).
Outlook <= 0.5 (entropy = 0.94, samples = 14, value = [5, 9], class = Play)
├── True: leaf (entropy = 0.0, samples = 4, value = [0, 4], class = Play)
└── False: Temperature <= 77.5 (entropy = 1.0, samples = 10, value = [5, 5], class = Don't Play)
    ├── True: Temperature <= 73.5 (entropy = 0.954, samples = 8, value = [3, 5], class = Play)
    │   ├── True: Temperature <= 70.5 (entropy = 1.0, samples = 6, value = [3, 3], class = Don't Play)
    │   │   ├── True: Temperature <= 66.5 (entropy = 0.811, samples = 4, value = [1, 3], class = Play)
    │   │   │   ├── True: leaf (entropy = 0.0, samples = 1, value = [1, 0], class = Don't Play)
    │   │   │   └── False: leaf (entropy = 0.0, samples = 3, value = [0, 3], class = Play)
    │   │   └── False: leaf (entropy = 0.0, samples = 2, value = [2, 0], class = Don't Play)
    │   └── False: leaf (entropy = 0.0, samples = 2, value = [0, 2], class = Play)
    └── False: leaf (entropy = 0.0, samples = 2, value = [2, 0], class = Don't Play)

Figure 1: Generated Decision Tree using scikit-learn: Note that the string values for Outlook have been changed into numerical values (‘overcast’ = 0, ‘rain’ = 1, ‘sunny’ = 2)
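To see how the tree in Figure 1 turns into predictions, its branches can be written out by hand as nested threshold tests. The sketch below is an illustration only: the function name `predict` and the `TRAINING` list are made up here, and the thresholds are copied from the figure. The figure's tree only tests Outlook and Temperature, so Humidity and Windy do not appear.

```python
def predict(outlook, temperature):
    """Follow the branches of the Figure 1 tree; outlook is the encoded value."""
    if outlook <= 0.5:        # overcast (encoded as 0)
        return "Play"
    if temperature > 77.5:
        return "Don't Play"
    if temperature > 73.5:
        return "Play"
    if temperature > 70.5:
        return "Don't Play"
    if temperature <= 66.5:
        return "Don't Play"
    return "Play"


# (encoded Outlook, Temperature, label) for the 14 training instances,
# with overcast = 0, rain = 1, sunny = 2.
TRAINING = [
    (2, 85, "Don't Play"), (2, 80, "Don't Play"), (0, 83, "Play"),
    (1, 70, "Play"), (1, 68, "Play"), (1, 65, "Don't Play"),
    (0, 64, "Play"), (2, 72, "Don't Play"), (2, 69, "Play"),
    (1, 75, "Play"), (2, 75, "Play"), (0, 72, "Play"),
    (0, 81, "Play"), (1, 71, "Don't Play"),
]

correct = sum(predict(o, t) == label for o, t, label in TRAINING)
print(f"{correct} of {len(TRAINING)} training instances classified correctly")
```

The repeated Temperature thresholds fit every training instance exactly, which is a hint that a fully grown tree on 14 samples is likely to overfit.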