Homework 5
School: University of California, Berkeley
Course: MISC
Subject: Mathematics
Date: Jan 9, 2024
Pages: 8
Homework 5: Cluster Analysis and Anomaly Detection

Question 1: Assume we have a simple dataset with 10 two-dimensional points (x, y).

Dataset: (2, 3), (3, 3), (3, 4), (4, 4), (7, 5), (9, 4), (6, 8), (8, 8), (9, 9), (8, 10)

Use the KMeans algorithm to group the data points into two clusters. The initial centroids are Centroid 1: (3, 3) and Centroid 2: (8, 8).

Solution:

Iteration 1: Euclidean distance from each point to the initial centroids. Each point joins the cluster of the nearer centroid.

Point      Centroid 1: (3, 3)    Centroid 2: (8, 8)    Cluster
(2, 3)     1.00                  7.81                  1
(3, 3)     0.00                  7.07                  1
(3, 4)     1.00                  6.40                  1
(4, 4)     1.41                  5.66                  1
(7, 5)     4.47                  3.16                  2
(9, 4)     6.08                  4.12                  2
(6, 8)     5.83                  2.00                  2
(8, 8)     7.07                  0.00                  2
(9, 9)     8.49                  1.41                  2
(8, 10)    8.60                  2.00                  2

Next centroids:
New Centroid 1 = ((2+3+3+4)/4, (3+3+4+4)/4) = (3, 3.5)
New Centroid 2 = ((7+9+6+8+9+8)/6, (5+4+8+8+9+10)/6) = (7.83, 7.33), rounded to (7.8, 7.3)

Iteration 2: distances to the updated centroids.

Point      Centroid 1: (3, 3.5)  Centroid 2: (7.8, 7.3)  New cluster
(2, 3)     1.12                  7.22                    1
(3, 3)     0.50                  6.44                    1
(3, 4)     0.50                  5.82                    1
(4, 4)     1.12                  5.03                    1
(7, 5)     4.27                  2.44                    2
(9, 4)     6.02                  3.51                    2
(6, 8)     5.41                  1.93                    2
(8, 8)     6.73                  0.73                    2
(9, 9)     8.14                  2.08                    2
(8, 10)    8.20                  2.71                    2

No point changes cluster between iteration 1 and iteration 2, so the algorithm has converged.

Final clusters:
Cluster 1: (2, 3), (3, 3), (3, 4), (4, 4)
Cluster 2: (7, 5), (9, 4), (6, 8), (8, 8), (9, 9), (8, 10)
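The hand computation in Question 1 can be cross-checked with a short script. This is only a minimal sketch of the standard assign-and-update loop, written with the Python standard library; the helper names (assign, update, kmeans) are illustrative and not part of the assignment, and the sketch assumes no cluster ever becomes empty.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Dataset and initial centroids taken from Question 1.
points = [(2, 3), (3, 3), (3, 4), (4, 4), (7, 5),
          (9, 4), (6, 8), (8, 8), (9, 9), (8, 10)]
initial_centroids = [(3, 3), (8, 8)]


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]


def update(points, labels, k):
    """Return the mean of the points assigned to each of the k clusters.

    Assumption: every cluster has at least one member.
    """
    new_centroids = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        new_centroids.append(
            tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged: no point switched cluster
            break
        labels = new_labels
        centroids = update(points, labels, len(centroids))
    return labels, centroids


labels, centroids = kmeans(points, initial_centroids)
for p, lab in zip(points, labels):
    print(p, "-> cluster", lab + 1)
print("centroids:", centroids)
```

The script keeps the centroids unrounded, so individual distances may differ from a hand-computed table in the second decimal place; the cluster assignments are what should be compared.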
Question 2: A data scientist plans to use DBSCAN with the minimum number of points set to 5. Identify each labeled point in the scatter plot as a border point, core point, or outlier. (Discuss it.)

Solution: A is a core point because at least 5 points lie within its ε range. B and C are outliers: neither has the minimum number of points within ε, so neither is a core point, and neither lies within the ε range of a core point, so neither qualifies as a border point.

Question 3: Determine whether each labeled point in the figure below is a core point, a boundary point, or an outlier, given ε = 2 and a minimum of 4 points for a core point. (Discuss it.)
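The core / border / outlier rule used in these DBSCAN questions can be sketched in a few lines. The figures are not reproduced here, so the coordinates below are hypothetical and chosen only for illustration. The sketch also assumes that a point counts itself among its ε-neighbors, which is the convention scikit-learn uses for min_samples; a course may count neighbors differently.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'outlier' under the DBSCAN rules.

    Assumptions: the eps-neighborhood includes the point itself, and all
    points are distinct (they are used as dictionary keys).
    """
    neighbours = {p: [q for q in points if dist(p, q) <= eps] for p in points}
    core = {p for p in points if len(neighbours[p]) >= min_pts}

    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbours[p]):
            # Not dense enough itself, but within eps of a core point.
            labels[p] = "border"
        else:
            labels[p] = "outlier"
    return labels


# Hypothetical points, not taken from the homework figures.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2.5, 1), (8, 8)]
labels = classify_points(pts, eps=2, min_pts=4)
for p, kind in labels.items():
    print(p, kind)
```

In this toy data the four points of the unit square are core points, (2.5, 1) has too few neighbors to be core but lies within ε of a core point, and (8, 8) is isolated.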
Solution: A point is a core point if at least min_points samples lie within its ε range. A has 4 samples within ε = 2, so it is a core point. B has only 2 points within ε, fewer than the required 4, so it is not a core point; it is classified as a boundary point, which also requires that it lie within ε of a core point. Point C doesn’t have any points within the ε
limit, so C is an outlier.

Question 4: What is the distance between the two clusters using centroid linkage?

Solution:
Centroid of cluster A = ((1+2+0)/3, (6+4+2)/3) = (1, 4)
Centroid of cluster B = ((6+8)/2, (8+4)/2) = (7, 6)
Distance between the two clusters:
$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} = \sqrt{(1 - 7)^2 + (4 - 6)^2} = \sqrt{40} \approx 6.32$

Question 5: Create a dendrogram from the following figure. (The height of the dendrogram in your solution is not important.)
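The centroid-linkage calculation in Question 4 can be checked numerically. This is a minimal sketch: the figure is not available, so pairing the listed x and y values in order to form the points of clusters A and B is an assumption (the centroids themselves do not depend on that pairing), and the helper names are illustrative.

```python
from math import dist


def centroid(cluster):
    """Return the coordinate-wise mean of a list of 2-D points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def centroid_linkage(cluster_a, cluster_b):
    """Centroid linkage: Euclidean distance between the two cluster centroids."""
    return dist(centroid(cluster_a), centroid(cluster_b))


# Coordinates read off the sums in the Question 4 solution (pairing assumed).
cluster_a = [(1, 6), (2, 4), (0, 2)]
cluster_b = [(6, 8), (8, 4)]

print("centroid A:", centroid(cluster_a))
print("centroid B:", centroid(cluster_b))
print("centroid-linkage distance:", round(centroid_linkage(cluster_a, cluster_b), 2))
```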