Assignment 16 Clustering Algorithms

School

Arizona State University

Course

511

Subject

Computer Science

Date

Apr 3, 2024


Prasad Srinivas
IFT 511: Analyzing Big Data
Professor: Asmaa Elbadrawy
Tuesday and Thursday (12:00 PM – 1:15 PM)
November 10th, 2023

1. Define the Clustering problem. Explain what is meant by it being an "unsupervised-learning" activity.

Clustering is a data-analysis method that looks for groupings or patterns within a dataset. The aim is to uncover structures or relationships among data points without any predefined categories. It is considered an unsupervised learning process, which means that the algorithm operates on unlabeled data. Unlike supervised learning, where algorithms learn from labeled examples, clustering explores the dataset without any prior knowledge of group memberships. This also affects evaluation: in supervised learning, predicted labels can be matched against actual labels, whereas clustering has no such comparison framework.

2. What is the difference between Hierarchical vs Partitional Clustering?

Hierarchical Clustering | Partitional Clustering
No need for a predefined number of clusters. | Requires specifying the number of clusters (K) beforehand.
Produces a hierarchical tree of nested clusters, so clusters can overlap by containment. | Divides data into non-overlapping subsets.
Includes agglomerative and divisive algorithms. | Includes K-means, Fuzzy c-means, and QT clustering.

3. What is the difference between Complete vs Partial Clustering?

Complete Clustering | Partial Clustering
Allocates each object to a cluster. | Does not allocate each object to a cluster.
Every data point belongs to at least one cluster. | Some objects in the dataset may not fit into distinct groups.
Goal is for every data point to be part of some cluster. | Goal is to cluster only the objects that belong to a well-defined group and leave out those that don't.

4. Using your own words, explain the main idea behind K-Means and how it works.

K-Means is an algorithm that divides data points into K clusters based on their similarity. The main idea is to alternately improve the cluster assignments and the cluster centers.

   1. Begin with K initial centers.
   2. Assign each data point to its nearest center.
   3. Update each center by calculating the mean of the points in its cluster.
   4. Repeat steps 2 and 3 until the centers stabilize or a predetermined number of iterations is reached.

5. Using your own words, explain how Agglomerative (Hierarchical) clustering works.

Agglomerative clustering begins with every point as its own cluster and, at every step, merges the nearest pair of clusters until one cluster (or K clusters) remains.
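The two procedures described in answers 4 and 5 can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not a reference implementation: the function names (`k_means`, `agglomerative`), the optional `init` parameter, the Euclidean distance, the handling of empty clusters, and the single-linkage rule for "nearest pair of clusters" are all choices made here for illustration and are not specified in the answers above.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def k_means(points, k, init=None, max_iter=100, seed=0):
    """K-Means: alternate nearest-center assignment and center updates."""
    rng = random.Random(seed)
    # Step 1: begin with K centers (random data points unless `init` is given).
    centers = list(init) if init is not None else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: move each center to the mean of its cluster.
        # Assumption of this sketch: an empty cluster keeps its old center.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 4: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def agglomerative(points, k=1):
    """Agglomerative clustering using single linkage (an assumed choice)."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the nearest pair
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical toy data: two visibly separated groups of three points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, groups = k_means(data, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
print(centers, [len(g) for g in groups])
print(agglomerative(data, k=2))
```

Note the difference the sketch makes visible: `k_means` needs K up front and keeps revising assignments, while `agglomerative` only ever merges, so stopping it at a different K just cuts the same merge sequence earlier or later.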
