Assignment 3: Learning Naïve Bayes and Neural Networks
CS486/686 – Winter 2024
Out: March 7, 2024
Due: March 22, 2024 at 11:59pm Waterloo Time
Submit your assignment via LEARN (CS486 site) in the Assignment 3 Dropbox folder.
No late assignments will be accepted
PART A [45pts]: NAÏVE BAYES LEARNING
In Assignment 2, you learned a decision tree to classify text documents into two sets given a labeled training set. Here you will learn a Naïve Bayes classifier for the same data. The data is a subset of Reddit posts sourced from https://files.pushshift.io/reddit/ and processed using Google BigQuery. The dataset includes the first 1500 comments of August 2019 from each of the r/books and r/atheism subreddits, cleaned by removing punctuation and some offensive language, and limiting the words to only those used more than 3 times among all posts. These 3000 comments are split evenly into training and testing sets (with 1500 documents in each).
To simplify your implementation, these posts have been pre-processed and converted to the bag-of-words model. More precisely, each post is converted to a vector of binary values such that each entry indicates whether the document contains a specific word or not. Each line of the files trainData.txt and testData.txt is formatted "docId wordId", which indicates that word wordId is present in document docId. The files trainLabel.txt and testLabel.txt indicate the label/category (1 = atheism or 2 = books) for each document (docId = line#). The file words.txt indicates which word corresponds to each wordId (denoted by the line#). If you are using Matlab, the file loadScript.m provides a simple script to load the files into appropriate matrices. At the Matlab prompt, just type "loadScript" to execute the script. Feel free to use any other language and to build your own loading script for the data if you prefer.
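For readers not using Matlab, the file format described above could be loaded along the following lines. This is only a minimal sketch, not the course's loadScript.m: the function names `load_documents` and `load_from_files` are hypothetical, and it assumes whitespace-separated 1-based ids and one label line per document.

```python
def load_documents(data_lines, label_lines):
    """Build one set of word ids per document from "docId wordId" lines.

    Assumes docId and wordId are 1-based and that the label of document d
    is the d-th line of the label file (1 = atheism, 2 = books).
    """
    labels = [int(line) for line in label_lines if line.strip()]
    docs = [set() for _ in labels]
    for line in data_lines:
        if not line.strip():
            continue  # skip blank lines
        doc_id, word_id = map(int, line.split())
        docs[doc_id - 1].add(word_id)  # shift docId to a 0-based index
    return docs, labels


def load_from_files(data_path, label_path):
    """Convenience wrapper, e.g. load_from_files("trainData.txt", "trainLabel.txt")."""
    with open(data_path) as data_file, open(label_path) as label_file:
        return load_documents(data_file, label_file)
```

Storing each document as a set of word ids keeps the binary "word present or absent" view without allocating a full documents-by-words matrix; a dense matrix would work equally well.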
Implement code to learn a naïve Bayes model by maximum likelihood [1]. More precisely, learn a Bayesian network where the root node is the label/category variable with one child variable per word feature. The word variables should be binary and represent whether that word is present or absent in the document. Learn the parameters of the model by maximizing the likelihood of the training set only. This will set the class probability to the fraction of documents in the training set from each category, and the probability of a word given a document category as the fraction of documents in that category that contain that word. You should use a Laplace correction by adding 1 to the numerator and 2 to the denominator, in order to avoid situations where both classes have probability of 0. Classify documents by computing the label/category with the highest posterior probability $\Pr(\text{label} \mid \text{words in document})$. Report the training and testing accuracy (i.e., percentage of correctly classified articles).
[1] For the precise equations for this, see the note on the course webpage: https://cs.uwaterloo.ca/~jhoey/teaching/cs486/naivebayesml.pdf
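The estimation and classification steps described above might be sketched as follows. This is an illustrative outline under stated assumptions, not the reference solution: the names `train_naive_bayes`, `predict`, and `accuracy` are hypothetical, each document is assumed to be a set of 1-based word ids, and every class is assumed to occur at least once in the training set (so that the log of the class prior is defined).

```python
import math


def train_naive_bayes(docs, labels, n_words, classes=(1, 2)):
    """Estimate class priors and Pr(word present | class) with a Laplace correction."""
    n_docs = len(docs)
    prior = {}
    word_prob = {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = len(class_docs) / n_docs  # fraction of documents in class c
        counts = [0] * (n_words + 1)  # index 0 unused; word ids are 1-based
        for d in class_docs:
            for w in d:
                counts[w] += 1
        # Laplace correction: add 1 to the numerator and 2 to the denominator.
        word_prob[c] = [(k + 1) / (len(class_docs) + 2) for k in counts]
    return prior, word_prob


def predict(doc, prior, word_prob, n_words):
    """Return the class with the highest log-posterior for one document."""
    best_class, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        for w in range(1, n_words + 1):
            p = word_prob[c][w]
            # Present words contribute log p, absent words log(1 - p).
            score += math.log(p) if w in doc else math.log(1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(docs, labels, prior, word_prob, n_words):
    """Percentage of documents whose predicted label matches the true label."""
    hits = sum(predict(d, prior, word_prob, n_words) == y for d, y in zip(docs, labels))
    return 100.0 * hits / len(docs)
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow when the vocabulary is large; the argmax is unchanged because the logarithm is monotonic.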
What to hand in:
• [10 pts] A printout of your code.
• [10 pts] A printout listing the 10 most discriminative word features, measured by $\max_{\text{word}} \left| \log \Pr(\text{word} \mid \text{label}_1) - \log \Pr(\text{word} \mid \text{label}_2) \right|$. Since the posterior of each label is formulated by multiplying by the conditional probability $\Pr(\text{word} \mid \text{label}_i)$, a word feature should be more discriminative when the ratio $\Pr(\text{word} \mid \text{label}_1) / \Pr(\text{word} \mid \text{label}_2)$ is large or small, and therefore when the absolute difference between $\log \Pr(\text{word} \mid \text{label}_1)$ and $\log \Pr(\text{word} \mid \text{label}_2)$ is large. In your opinion, are these good word features?
• [10 pts] Training and testing accuracy (i.e., two numbers indicating the percentage of correctly classified articles for the training and testing set).
• [5 pts] The naïve Bayes model assumes that all word features are independent. Is this a reasonable assumption? Explain briefly.
• [5 pts] What could you do to extend the Naïve Bayes model to take into account dependencies between words?
• [5 pts] What if, instead of using ML learning, you were to use MAP learning? Explain what you would need to add and how it would work.
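The discriminative-word score from the hand-in list above can be computed in a few lines. The sketch below is only one possible way to do it: the function name `most_discriminative` is hypothetical, and it assumes the smoothed probabilities for the two labels are available as two lists aligned with the lines of words.txt, with no entry equal to 0 (which the Laplace correction ensures).

```python
import math


def most_discriminative(word_prob_1, word_prob_2, vocabulary, top_k=10):
    """Rank words by |log Pr(word|label 1) - log Pr(word|label 2)|, largest first."""
    scored = []
    for word, p1, p2 in zip(vocabulary, word_prob_1, word_prob_2):
        scored.append((abs(math.log(p1) - math.log(p2)), word))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A word that is equally likely under both labels scores exactly 0, so very common words that appear everywhere fall to the bottom of the ranking.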
PART B [80pts]: Neural Networks for Classification and Regression
In this part of the assignment, you will implement a feedforward neural network from scratch. Additionally, you will implement activation functions, a loss function, and a performance metric. Lastly, you will train a neural network model to solve a regression problem.
Red Wine Quality - A Regression Problem
The task is to predict the quality of red wine from northern Portugal, given some physical characteristics of the wine. The target $y \in [0, 10]$ is a continuous variable, where 10 is the best possible wine, according to human tasters. This dataset was downloaded from the UCI Machine Learning Repository. The features are all real-valued. They are listed below:
• Fixed acidity
• Volatile acidity
• Citric acid
• Residual sugar
• Chlorides
• Free sulfur dioxide
• Total sulfur dioxide
• Density
• pH
• Sulphates
• Alcohol
Training a Neural Network
In Lecture 9b, you learned how to train a neural network using the backpropagation algorithm. In this assignment, you will apply the forward and backward pass to the entire dataset simultaneously (i.e., batch gradient descent). As a result, your forward and backward passes will manipulate tensors, where the first dimension is the number of examples in the training set, $n$. When updating an individual weight $W^{(l)}_{i,j}$, you will need to find the average gradient $\frac{\partial L}{\partial W^{(l)}_{i,j}}$ (where $L$ is the error) across all examples in the training set to apply the update. Algorithm 1 gives the training algorithm in terms of functions that you will implement in this assignment. Further details can be found in the documentation for each function in the provided source code.
Algorithm 1 Training
Require: $\eta > 0$  ▷ Learning rate
Require: $n_{\text{epochs}} \in \mathbb{N}^{+}$  ▷ Number of epochs
Require: $X \in \mathbb{R}^{n \times f}$  ▷ Training examples with $n$ examples and $f$ features
Require: $y \in \mathbb{R}^{n}$  ▷ Targets for training examples
Initialize weight matrices $W^{(\ell)}$ randomly for each layer.  ▷ Initialize net
for $i \in \{1, 2, \ldots, n_{\text{epochs}}\}$ do  ▷ Conduct $n_{\text{epochs}}$ epochs
    A_vals, Z_vals ← net.forward_pass($X$)  ▷ Forward pass
    $\hat{Y}$ ← Z_vals[-1]  ▷ Predictions
    $L \leftarrow \mathcal{L}(\hat{Y}, Y)$
    Compute $\frac{\partial}{\partial \hat{Y}} \mathcal{L}(\hat{Y}, Y)$  ▷ Derivative of error with respect to predictions
    deltas ← backward_pass(A_vals, $\frac{\partial}{\partial \hat{Y}} \mathcal{L}(\hat{Y}, Y)$)  ▷ Backward pass
    update_gradients()  ▷ $W^{(\ell)}_{i,j} \leftarrow W^{(\ell)}_{i,j} - \eta \sum_{n} \frac{\partial L}{\partial W^{(\ell)}_{i,j}}$ for each weight
end for
return trained weight matrices $W^{(\ell)}$
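To make the shape of the training loop concrete, here is a deliberately simplified sketch of batch gradient descent. It is not the assignment's network class: it assumes a single linear layer with a bias and a mean squared error loss, and the function name `train_linear_batch_gd` and its default hyperparameters are hypothetical. It does follow the same sequence of steps: forward pass over all $n$ examples, derivative of the error with respect to the predictions, gradients averaged over the training set, then a weight update scaled by the learning rate.

```python
import random


def train_linear_batch_gd(X, y, eta=0.1, n_epochs=500, seed=0):
    """Fit y ~ X w + b by batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    n, f = len(X), len(X[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(f)]  # random initial weights
    b = 0.0
    for _ in range(n_epochs):
        # Forward pass over the whole training set at once.
        preds = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
        # Derivative of the squared error with respect to each prediction.
        d_preds = [2.0 * (p - t) for p, t in zip(preds, y)]
        # Backward pass: average each weight's gradient over all n examples.
        grad_w = [sum(d * row[j] for d, row in zip(d_preds, X)) / n for j in range(f)]
        grad_b = sum(d_preds) / n
        # Gradient step with learning rate eta.
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b
```

Averaging the gradient over the examples keeps the effective step size independent of the training-set size, so the same learning rate behaves similarly on small and large datasets. A multi-layer network would replace the single weight vector with one weight matrix per layer and propagate the deltas backward through each activation function.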
Activation and Loss Functions
You will implement the following activation functions and their derivatives: