
What is a decision tree?

A decision tree is a special kind of probability tree and a popular, powerful tool for prediction and classification. Its structure resembles a flowchart laid out as a tree: each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf (terminal) node holds a class label.

A decision tree is an illustration of the decision-making process. In artificial intelligence (AI), decision trees are used to reach conclusions based on data available from past decisions. These conclusions are stored values that are then used to forecast the action to be taken in the near future.

Supervised learning

A decision tree is an algorithmic and statistical machine learning model that learns from examples of problems and their known consequences. From the available data, the tree learns the decision-making rules that apply in a given context. Because the training data include the correct outcomes, which act as feedback to improve what is learned, this type of learning is known as supervised learning, and decision tree models are supervised learning models.

Terminologies of decision tree

The important terminologies used in the decision tree are,

Root node represents the entire sample or population, which is further divided into two or more homogeneous sets.
Decision node is represented by a square. A decision node splits into further sub-nodes.
Chance node is represented by a circle and shows the probabilities of certain results.
End node is represented by a triangle and shows the final output of the decision path.
Splitting is the process of dividing a node into two or more sub-nodes.
Pruning is the process of removing sub-nodes from the decision tree.
Branch tree: A subsection of the entire tree is called a sub-tree or branch tree.

The image represents decision tree classification.

Symbols in decision tree

The image represents the symbols used in a decision tree.

Types of decision tree

The decision tree is broadly classified into two kinds,

Classification tree

Classification tree analysis is used when the predicted outcome is the class to which the data belong. In other words, each record in the dataset is assigned to one of the available classes. Example: Examining a Facebook comment and classifying the text as either positive or negative.

Regression tree

Regression tree analysis is used when the predicted outcome is a real number. In other words, a numeric value is predicted from one or more predictors. Example: Length of a patient’s stay at the hospital.

Classification and regression tree (CART) analysis is an umbrella term that refers to both of the above procedures. It was introduced in 1984. The two kinds of tree have certain similarities, but the major difference is the procedure used to determine the split.
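The difference between the two kinds of tree shows up most plainly at the leaves. The sketch below assumes a common convention that is not stated in the text above: a classification leaf predicts the majority class of the training records that reach it, and a regression leaf predicts their mean. The function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most frequent class among the records in a leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the mean of the numeric targets in a leaf."""
    return mean(values)


# Hypothetical leaf contents: comment sentiments and hospital stays in days.
print(classification_leaf(["positive", "negative", "positive"]))  # positive
print(regression_leaf([2.0, 4.0, 6.0]))  # 4.0
```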

Metrics

The algorithm that builds a decision tree works in a top-down manner. At each step, it chooses the variable that best splits the set of items. Different algorithms use different metrics to measure the best split, and some of these metrics are,

Gini impurity

Gini impurity measures how often a randomly selected item from a set would be labeled incorrectly if it were labeled randomly according to the distribution of labels in the subset. The CART algorithm uses Gini impurity for classification trees. The Gini impurity of a data set is defined as,

$Gini = 1 - \sum_{i = 1}^{n} {(p_{i})}^{2}$

Where $n$ is the number of classes and $p_{i}$ is the proportion of items that belong to class $i$.
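As a minimal sketch of the Gini formula, the function below computes the impurity of a list of class labels. The function name and the choice to treat an empty node as pure are assumptions made for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2) over the class proportions p_i in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_impurity(["yes", "yes", "no", "no"]))  # 0.5
print(gini_impurity(["yes", "yes", "yes"]))  # 0.0
```

An evenly mixed two-class node gives 0.5, the highest value possible with two classes, while a node that holds a single class gives 0.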

Entropy

Entropy is a measure of the randomness in the data being processed. The higher the entropy, the harder it is to draw a conclusion from the data. Mathematically, the entropy for a single attribute is,

$E(S) = \sum_{i = 1}^{c} - p_{i} \log_{2} p_{i}$

Where $S$ is the current state and $p_{i}$ is the probability of event $i$ in state $S$, or the percentage of class $i$ in a node of state $S$.
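The entropy formula can be sketched in a few lines. The function name is hypothetical, and returning 0 for an empty list is an assumption, not something the text specifies.

```python
import math
from collections import Counter


def entropy(labels):
    """Return the sum of -p_i * log2(p_i) over the classes in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node carries no uncertainty
    result = 0.0
    for count in Counter(labels).values():
        p = count / total
        result -= p * math.log2(p)
    return result


print(entropy(["yes", "no"]))  # 1.0
print(entropy(["yes", "yes"]))  # 0.0
```

Classes that do not occur never appear in the counter, so the undefined term $0 \log_{2} 0$ is never evaluated.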

Mathematically, the entropy for multiple attributes is,

$E(T, X) = \sum_{c \in X} P(c) E(c)$

Where $T$ is the current state and $X$ is the selected attribute.

Information gain

Tree algorithms such as ID3, C4.5, and C5.0 use information gain (IG). It is based on the concepts of information content and entropy from information theory. It is used to decide which feature to split on at each step of tree construction. The mathematical representation of IG is,

$\text{Information Gain} = \text{Entropy}(before) - \sum_{j = 1}^{k} \frac{n_{j}}{n} \text{Entropy}(j, after)$

Where $before$ is the dataset before the split, $k$ is the number of subsets generated by the split, $(j, after)$ is subset $j$ after the split, and $\frac{n_{j}}{n}$ is the fraction of the records that fall into subset $j$, so that larger subsets count for more.
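A minimal sketch of information gain follows. It assumes the usual ID3-style convention of weighting each subset's entropy by its share of the records. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a non-empty list of class labels, in bits."""
    total = len(labels)
    result = 0.0
    for count in Counter(labels).values():
        p = count / total
        result -= p * math.log2(p)
    return result


def information_gain(before, subsets):
    """Entropy before the split minus the size-weighted entropy after it."""
    total = len(before)
    weighted_after = 0.0
    for subset in subsets:
        if subset:  # skip empty subsets, which contribute nothing
            weighted_after += (len(subset) / total) * entropy(subset)
    return entropy(before) - weighted_after


parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

A split that separates the classes perfectly recovers the full entropy of the parent, while a split that leaves every subset as mixed as the parent gains nothing.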

Variance reduction

Variance reduction is used when the decision tree performs regression, that is, when the output is continuous. The algorithm uses the variance formula to decide how to split the population,

$Variance = \frac{\sum (X - \bar{X})^{2}}{n}$

Where $\bar{X}$ is the mean of the values,

$X$ is the actual value,

$n$ is the number of values.
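The variance formula can be turned into a split score as sketched below. The text gives only the variance itself, so the step of subtracting the size-weighted variance of the two child nodes from the parent's variance is an assumption based on common practice. The function names and sample values are hypothetical.

```python
def variance(values):
    """Population variance: the mean squared deviation from the mean."""
    n = len(values)
    if n == 0:
        return 0.0  # assumption: an empty node has zero variance
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted


parent = [1.0, 1.0, 3.0, 3.0]
print(variance(parent))  # 1.0
print(variance_reduction(parent, [1.0, 1.0], [3.0, 3.0]))  # 1.0
```

Under this scoring, the candidate split with the largest reduction would be the one chosen.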

Decision tree algorithm

The decision tree algorithm is a supervised learning algorithm. It can solve both regression and classification problems. The aim is to build a training model that predicts the value or class of the target variable by learning decision rules from the training data (prior data). To predict the class label for a record, start at the root of the tree and compare the value of the root attribute with the corresponding attribute of the record. Based on that comparison, follow the branch that matches the value and jump to the next node. The following are the algorithms used in creating a decision tree,

Iterative Dichotomiser 3 (ID3).
C4.5 (ID3 successor).
CART.
Multivariate Adaptive Regression Splines (MARS).
Chi-square Automatic Interaction Detector (CHAID).
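Whichever algorithm builds the tree, prediction follows the root-to-leaf walk described above. The sketch below assumes a hypothetical representation in which an internal node is a dictionary with an "attribute" key and a "branches" key, and a leaf is a plain class label. The tree and the record are invented for illustration.

```python
def predict(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    # Internal nodes are dicts; anything else is treated as a leaf label.
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node


# Hypothetical hand-built tree with two attribute tests.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}

print(predict(tree, {"outlook": "sunny", "windy": False}))  # play
```

This version raises a KeyError when a record has an attribute value with no matching branch. A fuller implementation would need a policy for such cases.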

Example

The following example explains the options for mobile phone production. Each option can lead to a high or a low profit margin. Each path ends in a terminal node that shows its result. On that basis, Technology A is chosen and Technology B is rejected.

Context and Applications

This topic is important for postgraduate and undergraduate courses, particularly for,

Bachelors in computer science engineering.
Associate of science in computer science.

Practice Problems

Question 1: ____ is used to predict and classify data.

a) Flowchart

b) Decision tree

c) B+ tree

d) Regression tree

Answer: Option b is correct.

Explanation: A decision tree is a unique kind of probability tree, which is both a popular and powerful tool used for prediction and classification. The internal node refers to the attribute test, the branch refers to the test outcome, and the leaf node contains the class label.

Question 2: How many types of decision trees are there?

a) 5

b) 3

c) 2

d) 4

Answer: Option c is correct.

Explanation: A decision tree is a unique kind of probability tree, which is categorized into two kinds namely, regression and classification tree. Both trees have certain similarities, but the major difference is the procedure to discover the split.

Question 3: The subsection of a whole tree is called ___.

a) Branch tree

b) Internal node

c) Training data

d) Regression tree

Answer: Option a is correct.

Explanation: A subtree is a subsection of a whole tree; branch tree is another name for a subtree. This is one of the important terms used with decision trees, and the others are root node, decision node, splitting, end node, chance node, and pruning.

Question 4: Select the metric used in the construction of the decision tree.

a) AdaBoost

b) Gini impurity

c) Linear regression

d) None of the above

Answer: Option b is correct.

Explanation: Gini impurity is a metric of the decision tree. The decision tree supports several metrics like Information gain, chi-square, variance reduction, gain ratio, entropy, and Gini impurity. To measure the best, various metrics are used by various algorithms.

Question 5: The end node is represented in _____ shape.

a) Triangle

b) Circle

c) Rectangle

d) None of the above

Answer: Option a is correct.

Explanation: The decision tree uses different shapes to represent different nodes. The end node is represented in a triangle shape and shows the final output of the decision path.
