COMP 6721 Applied Artificial Intelligence (Fall 2023)
Lab Exercise #5: Decision Trees & K-means Solutions

Question 1

Given the training instances below, use information theory to find whether 'Outlook' or 'Windy' is the best feature to decide when to play a game of golf.

Outlook    Temperature  Humidity  Windy  Play / Don't Play
sunny      85           85       false  Don't Play
sunny      80           90       true   Don't Play
overcast   83           78       false  Play
rain       70           96       false  Play
rain       68           80       false  Play
rain       65           70       true   Don't Play
overcast   64           65       true   Play
sunny      72           95       false  Don't Play
sunny      69           70       false  Play
rain       75           80       false  Play
sunny      75           70       true   Play
overcast   72           90       true   Play
overcast   81           75       false  Play
rain       71           80       true   Don't Play
Using the convention that 0 log_2 0 = 0:

\[
\begin{aligned}
H(\mathit{Output}) &= H\!\left(\tfrac{5}{14}, \tfrac{9}{14}\right) = -\left(\tfrac{5}{14}\log_2\tfrac{5}{14} + \tfrac{9}{14}\log_2\tfrac{9}{14}\right) = 0.94 \\
H(\mathit{Output} \mid \text{sunny}) &= H\!\left(\tfrac{3}{5}, \tfrac{2}{5}\right) = -\left(\tfrac{3}{5}\log_2\tfrac{3}{5} + \tfrac{2}{5}\log_2\tfrac{2}{5}\right) = 0.97 \\
H(\mathit{Output} \mid \text{overcast}) &= H(0, 1) = -\left(0\log_2 0 + 1\log_2 1\right) = 0 \\
H(\mathit{Output} \mid \text{rain}) &= H\!\left(\tfrac{2}{5}, \tfrac{3}{5}\right) = -\left(\tfrac{2}{5}\log_2\tfrac{2}{5} + \tfrac{3}{5}\log_2\tfrac{3}{5}\right) = 0.97 \\
H(\mathit{Output} \mid \mathit{Outlook}) &= \tfrac{5}{14} H(\mathit{Output} \mid \text{sunny}) + \tfrac{4}{14} H(\mathit{Output} \mid \text{overcast}) + \tfrac{5}{14} H(\mathit{Output} \mid \text{rain}) \\
&= \tfrac{5}{14} \cdot 0.97 + \tfrac{4}{14} \cdot 0 + \tfrac{5}{14} \cdot 0.97 = 0.69 \\
\mathrm{gain}(\mathit{Outlook}) &= H(\mathit{Output}) - H(\mathit{Output} \mid \mathit{Outlook}) = 0.94 - 0.69 = 0.25 \\
H(\mathit{Output} \mid \mathit{Windy}=\text{true}) &= H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1 \\
H(\mathit{Output} \mid \mathit{Windy}=\text{false}) &= H\!\left(\tfrac{1}{4}, \tfrac{3}{4}\right) = 0.81 \\
H(\mathit{Output} \mid \mathit{Windy}) &= \tfrac{6}{14} \cdot 1 + \tfrac{8}{14} \cdot 0.81 = 0.89 \\
\mathrm{gain}(\mathit{Windy}) &= H(\mathit{Output}) - H(\mathit{Output} \mid \mathit{Windy}) = 0.94 - 0.89 = 0.05
\end{aligned}
\]

'Outlook' is the better feature because it has the larger information gain.
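The hand calculation above can be checked numerically. The following sketch recomputes both information gains from the class counts in the table (the `entropy` helper and the count lists are taken directly from the derivation, not from scikit-learn):

```python
# Verify the Question 1 information-gain calculations numerically.
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts; 0-count terms are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Overall class counts: [Don't Play, Play]
h_output = entropy([5, 9])

# Outlook partitions: sunny -> [3, 2], overcast -> [0, 4], rain -> [2, 3]
h_outlook = (5/14) * entropy([3, 2]) + (4/14) * entropy([0, 4]) + (5/14) * entropy([2, 3])
gain_outlook = h_output - h_outlook

# Windy partitions: true -> [3, 3], false -> [2, 6]
h_windy = (6/14) * entropy([3, 3]) + (8/14) * entropy([2, 6])
gain_windy = h_output - h_windy

print(round(gain_outlook, 2))  # 0.25
print(round(gain_windy, 2))    # 0.05
```

This reproduces gain(Outlook) = 0.25 and gain(Windy) = 0.05 to two decimal places, confirming that 'Outlook' wins.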
Question 2

It's time to leave the calculations to your computer: write a Python program that uses scikit-learn's Decision Tree Classifier [1]:

```python
import numpy as np
from sklearn import tree
from sklearn import preprocessing
```

Here is the training data from the first question:

```python
dataset = np.array([
    ['sunny',    85, 85, 0, 'Don\'t Play'],
    ['sunny',    80, 90, 1, 'Don\'t Play'],
    ['overcast', 83, 78, 0, 'Play'],
    ['rain',     70, 96, 0, 'Play'],
    ['rain',     68, 80, 0, 'Play'],
    ['rain',     65, 70, 1, 'Don\'t Play'],
    ['overcast', 64, 65, 1, 'Play'],
    ['sunny',    72, 95, 0, 'Don\'t Play'],
    ['sunny',    69, 70, 0, 'Play'],
    ['rain',     75, 80, 0, 'Play'],
    ['sunny',    75, 70, 1, 'Play'],
    ['overcast', 72, 90, 1, 'Play'],
    ['overcast', 81, 75, 0, 'Play'],
    ['rain',     71, 80, 1, 'Don\'t Play'],
])
```

Note that we changed True and False into 1 and 0.

For our feature vectors, we need the first four columns:

```python
X = dataset[:, 0:4]
```

and for the training labels, we use the last column from the dataset:

```python
y = dataset[:, 4]
```

However, you will not be able to use the data as-is: all features must be numerical for training the classifier, so you have to transform the strings into numbers. Fortunately, scikit-learn has a preprocessing class for label encoding that we can use [2]:

```python
le = preprocessing.LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])
```

(Note: you will need to transform y as well.)

[1] See https://scikit-learn.org/stable/modules/tree.html#classification
[2] See https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets
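To see exactly what `LabelEncoder` does to the Outlook column, here is a small standalone check. The encoder assigns integer codes in sorted (alphabetical) order of the class strings, which is why 'overcast' becomes 0, 'rain' becomes 1, and 'sunny' becomes 2 (the example input values below are chosen for illustration):

```python
# Demonstrate how LabelEncoder maps the Outlook strings to integers.
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
codes = le.fit_transform(['sunny', 'sunny', 'overcast', 'rain'])

print(list(codes))        # [2, 2, 0, 1]
print(list(le.classes_))  # ['overcast', 'rain', 'sunny']

# inverse_transform recovers the original strings from the codes:
print(list(le.inverse_transform([0, 2])))  # ['overcast', 'sunny']
```

Keep a reference to the fitted encoder around: you will need its `inverse_transform` later to turn a numeric prediction back into 'Play' or 'Don't Play'.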
Now you can create a classifier object:

```python
dtc = tree.DecisionTreeClassifier(criterion="entropy")
```

Note that we are using the entropy option for building the tree, which is the method we've studied in class and used on the exercise sheet.

Train the classifier to build the tree:

```python
dtc.fit(X, y)
```

Now you can predict a new value using dtc.predict(test), just like you did for the Naïve Bayes classifier last week. Note: if you want the string output that you transformed above, you can use the inverse_transform method of a label encoder to change the predicted result back into a string.

Visualizing the decision tree can help you understand how the model makes predictions based on the features. It provides insights into the decision-making process of the classifier. A simple way to print the tree is:

```python
tree.plot_tree(dtc)
```

but this can be a bit hard to read; to get a prettier version you can use the Graphviz [3] visualization software. Graphviz is a powerful open-source tool for creating visual representations of graphs and networks. Here, we'll be using a Python package called graphviz to interface with Graphviz and generate visually informative diagrams, like decision trees, directly from our Python code. You can call it from Python like this [4]:

```python
# print a nicer tree using graphviz
import graphviz

dot_data = tree.export_graphviz(dtc, out_file=None,
                                feature_names=['Outlook', 'Temperature', 'Humidity', 'Windy'],
                                class_names=['Don\'t Play', 'Play'],
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph.render("mytree")
```

The result will be stored in a file mytree.pdf and should look like Figure 1. Most of the code is provided above.
For transforming the labels, you can use the same label encoder (calling fit_transform again refits it on the label column):

```python
y = le.fit_transform(dataset[:, 4])
```

To predict a label for a new feature vector:

```python
y_pred = dtc.predict([[2, 81, 95, 1]])
print("Predicted output:", le.inverse_transform(y_pred))
```

[3] See https://www.graphviz.org/
[4] If it is not yet installed, you can use conda install graphviz python-graphviz to install it (note that you need both the Graphviz tool and its Python bindings).
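Putting the pieces together, here is one possible self-contained version of the whole Question 2 solution. It differs from the snippets above in one detail: it uses two separate encoders (names `outlook_le` and `label_le` are my own) so that the Outlook encoder stays fitted and can still decode features after the labels have been transformed:

```python
# End-to-end sketch of the Question 2 solution.
import numpy as np
from sklearn import preprocessing, tree

dataset = np.array([
    ['sunny',    85, 85, 0, 'Don\'t Play'],
    ['sunny',    80, 90, 1, 'Don\'t Play'],
    ['overcast', 83, 78, 0, 'Play'],
    ['rain',     70, 96, 0, 'Play'],
    ['rain',     68, 80, 0, 'Play'],
    ['rain',     65, 70, 1, 'Don\'t Play'],
    ['overcast', 64, 65, 1, 'Play'],
    ['sunny',    72, 95, 0, 'Don\'t Play'],
    ['sunny',    69, 70, 0, 'Play'],
    ['rain',     75, 80, 0, 'Play'],
    ['sunny',    75, 70, 1, 'Play'],
    ['overcast', 72, 90, 1, 'Play'],
    ['overcast', 81, 75, 0, 'Play'],
    ['rain',     71, 80, 1, 'Don\'t Play'],
])

# One encoder for the Outlook feature, one for the class labels.
outlook_le = preprocessing.LabelEncoder()
label_le = preprocessing.LabelEncoder()

X = dataset[:, 0:4].copy()
X[:, 0] = outlook_le.fit_transform(X[:, 0])  # 'overcast'=0, 'rain'=1, 'sunny'=2
X = X.astype(float)
y = label_le.fit_transform(dataset[:, 4])

dtc = tree.DecisionTreeClassifier(criterion="entropy")
dtc.fit(X, y)

# Predict for: Outlook=sunny (2), Temperature=81, Humidity=95, Windy=true
y_pred = dtc.predict([[2, 81, 95, 1]])
print("Predicted output:", label_le.inverse_transform(y_pred))
```

Since the fully grown tree separates all 14 training instances, it classifies the training set perfectly; the exact shape of the tree (and hence the prediction for new points) can vary slightly across scikit-learn versions when splits tie.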
[Figure 1: rendering of the generated tree. The root node splits on Outlook <= 0.5 (entropy = 0.94, samples = 14, value = [5, 9]); all remaining internal nodes split on Temperature, ending in pure leaves.]

Figure 1: Generated decision tree using scikit-learn. Note that the string values for Outlook have been changed into numerical values ('overcast' = 0, 'rain' = 1, 'sunny' = 2).