This is a Python problem. Please follow the instructions; hints are included where needed. I am also attaching a picture of the txt file.

The preliminary dataset contains the geographical (2D) locations of habitual pizza eaters. Based on this information, identify the clusters of potential customers and their location centers by implementing the k-means clustering algorithm. Follow the steps below.

Modules needed for this code:

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

1. Load and visualize the dataset

Use the np.loadtxt() function to read the customters.txt file into a NumPy array (use customers as the variable name for the array). This array contains the 2D coordinates of 250 (n) potential customers. Make sure you understand the shape and meaning of this array. Use the plt.scatter() function to create a 2D plot of the locations of the customers. Note: the plot functions expect the x and y coordinates as separate arrays. For example, you can select the x coordinates of all customers by using customers[:, 0].

2. Determine the optimal number of clusters
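As a minimal sketch of the loading step, the snippet below reads three rows in the same whitespace-separated format as the file excerpt shown in this post. Reading from an in-memory io.StringIO object instead of the real file is an assumption made only so the example runs on its own; the name sample_text is hypothetical.

```python
import io
import numpy as np

# Stand-in for the real file: three rows copied from the excerpt in the post.
sample_text = """\
-3.633271721169406110e+00 -2.749418531032250979e+00
-2.976795656232069209e+00 -3.130130619095286004e+00
-4.046241035806734665e+00 -2.413595797540917687e+00
"""

# np.loadtxt accepts a file name or a file-like object.
customers = np.loadtxt(io.StringIO(sample_text))

# One row per customer, one column per coordinate.
print(customers.shape)

# Separate coordinate arrays, as the plotting functions expect.
x = customers[:, 0]
y = customers[:, 1]
```

With the real file, passing its name to np.loadtxt() should give an array of shape (250, 2), and plt.scatter(x, y) would then draw the customer locations.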

Use your human intuition based on the scatter plot above to decide how many clusters of customers you want to identify. This is the only step which relies on human brainpower (in practice, this can be automated as well, but that is beyond the scope of the assignment). Set variable k to the number of desired clusters.

3. Initialize the cluster centers

The algorithm will keep track of (and update) the center of each cluster and will assign each customer to exactly one cluster based on the distances between the customer and the centers. For the initial locations of the cluster centers, pick k random customer locations and store this array in a new variable called centroids. Verify that the shape of this array is (k, 2).

4. Initialize the cluster assignment

Compute a new 1-dimensional integer array of n elements, called assignment, which describes the customer-to-cluster assignment. The assignment is based on the geographical distances. For each customer, store the integer index of the cluster whose current location center (see centroids) is closest to the customer. Do not use explicit for loops.
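One possible way to pick the initial centers is sketched below. The synthetic customers array, the value k = 3, and the use of np.random.default_rng() are all assumptions for illustration; the assignment itself only mentions np.random.seed(), so treat this as one option rather than the required approach.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the pick is repeatable
customers = rng.normal(size=(250, 2))    # synthetic stand-in for the real data
k = 3                                    # hypothetical number of clusters

# Draw k distinct row indices, then select those rows as the starting centers.
picked = rng.choice(len(customers), size=k, replace=False)
centroids = customers[picked].copy()

print(centroids.shape)
```

Sampling without replacement avoids starting two centers at the same customer, which would leave one cluster empty from the first step.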

Hint: first, compute all the pairwise distances between the customers and all cluster centers using a 3D array and automatic broadcasting. Try to understand the shape and meaning of the following expression: customers - centroids[:, np.newaxis, :]. Use this expression with squaring and the NumPy sum function (on the proper axis) to compute the pairwise distances. Finally, use argmin (on the proper axis) to find the indexes of the closest cluster centers.
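The hint can be traced on a tiny hand-made example. The four customers and two centers below are made up so the shapes and the result are easy to check by hand; squared distances are used directly, on the assumption that skipping the square root is acceptable because it does not change which center is closest.

```python
import numpy as np

# Made-up data: two customers near (0, 0), two near (10, 10).
customers = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [9.0, 10.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# (n, 2) minus (k, 1, 2) broadcasts to (k, n, 2):
# diff[j, i] is the offset of customer i from center j.
diff = customers - centroids[:, np.newaxis, :]

# Sum the squared offsets over the coordinate axis -> shape (k, n).
sq_dist = (diff ** 2).sum(axis=2)

# For each customer (column), take the index of the closest center -> shape (n,).
assignment = sq_dist.argmin(axis=0)

print(diff.shape, sq_dist.shape, assignment)  # (2, 4, 2) (2, 4) [0 0 1 1]
```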

5. Verify the cluster assignment

Execute the code below to verify that the customers, centroids, and assignment arrays are properly initialized. You should see the k cluster centers in red and the assignments in different colors. Note: this is (obviously) not the final/optimal clustering, yet.

Code to verify your work:

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Initial cluster assignment");

6. Update the cluster centers

Compute the updated location of the cluster centers. Based on the cluster assignment, compute the mean location of the customers in each cluster. This is going to be the new/updated location of the cluster center (centroids).

Hint: This time you may use a (short) loop on the number of clusters. Try to use boolean indexing (masking) with the assignment to select all the customers belonging to the current cluster. You can also use the NumPy mean() function (with the proper axis) to compute the mean location of the cluster.
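A small sketch of the update step with boolean masking follows. The data and the assignment array are made up, and leaving the center of an empty cluster unchanged is an assumption; the assignment text does not say how that case should be handled.

```python
import numpy as np

# Made-up data with a known assignment.
customers = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [8.0, 10.0]])
assignment = np.array([0, 0, 1, 1])
k = 2
centroids = np.zeros((k, 2))

for j in range(k):
    # Boolean mask selects the rows of the customers assigned to cluster j.
    members = customers[assignment == j]
    if len(members) > 0:  # assumption: keep the old center if a cluster is empty
        # Mean over the rows gives the (x, y) center of the cluster.
        centroids[j] = members.mean(axis=0)

print(centroids)  # rows: [0. 1.] and [ 9. 10.]
```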

7. Verify the updated cluster centers

Execute the code below to verify that centroids is properly updated. While this is still not the final clustering, each center should be at the center of its assigned customers.

Code to verify this:

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Updated cluster centers");

8. Iterative optimization

Based on the code from steps 4 & 6 (you can use copy & paste), implement a loop which iteratively updates the cluster assignment and the cluster centers as long as there is some change in the cluster assignment.

9. Verify the final clusters

Execute the code below to see the final clusters. Note: while the results should look reasonable, it is not guaranteed that the algorithm finds the optimal clusters and centers (because of the initial random picks of the centers). Try to run the notebook multiple times (you may want to use np.random.seed() with different parameters at the start) to see if you can get the desired/optimal result.

Code to verify that:
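The loop could be put together roughly as follows. This is a sketch under stated assumptions: the helper names assign and update are hypothetical, the two-blob data is synthetic, and an empty cluster simply keeps its previous center.

```python
import numpy as np

def assign(customers, centroids):
    """Index of the closest center for every customer (no explicit loops)."""
    diff = customers - centroids[:, np.newaxis, :]   # shape (k, n, 2)
    return (diff ** 2).sum(axis=2).argmin(axis=0)    # shape (n,)

def update(customers, assignment, centroids):
    """Mean location of each cluster; an empty cluster keeps its old center."""
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = customers[assignment == j]
        if len(members) > 0:
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

rng = np.random.default_rng(1)
# Synthetic stand-in for the real data: two blobs of 50 points each.
customers = np.vstack([
    rng.normal(-3.0, 0.5, size=(50, 2)),
    rng.normal(3.0, 0.5, size=(50, 2)),
])
k = 2
centroids = customers[rng.choice(len(customers), size=k, replace=False)].copy()
assignment = assign(customers, centroids)

# Alternate the two steps until the assignment stops changing.
while True:
    centroids = update(customers, assignment, centroids)
    new_assignment = assign(customers, centroids)
    if np.array_equal(new_assignment, assignment):
        break
    assignment = new_assignment

print(centroids)
```

Comparing the whole assignment array with np.array_equal() matches the stopping rule in the text: the loop ends on the first pass in which no customer changes cluster.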
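The effect of np.random.seed() on the initial pick can be checked in isolation. The sketch below assumes the centers are chosen with np.random.choice() over n = 250 customer indices with k = 3; both values are placeholders.

```python
import numpy as np

n, k = 250, 3  # placeholder sizes

# Same seed, same sequence of random numbers, so the same starting customers.
np.random.seed(0)
first_pick = np.random.choice(n, size=k, replace=False)

np.random.seed(0)
second_pick = np.random.choice(n, size=k, replace=False)

print(np.array_equal(first_pick, second_pick))  # True
```

Changing the seed value changes which customers are picked as starting centers, which is why different seeds can lead to different final clusters.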

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Final clusters");


Excerpt from the attached txt file:
-3.633271721169406110e+00 -2.749418531032250979e+00
-2.976795656232069209e+00 -3.130130619095286004e+00
-4.046241035806734665e+00 -2.413595797540917687e+00

Accepted Answer

Given: Dataset contains the geographical (2D) locations of habitual pizza eaters-…