This is a Python problem. Please follow the instructions; hints are included where needed. I am also attaching a picture of the txt file.

The preliminary dataset contains the geographical (2D) locations of habitual pizza eaters. Based on this information, identify the clusters of potential customers and their location centers by implementing the k-means clustering algorithm. To do that, follow the steps below.

Some modules for this code:

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

1. Load and visualize the dataset

Use the np.loadtxt() function to read the customters.txt file into a NumPy array (use customers as the variable name for the array). This array contains the 2D coordinates of 250 (n) potential customers. Make sure you understand the shape and meaning of this array. Use the plt.scatter() function to create a 2D plot of the locations of the customers. Note: the plot functions expect the x and y coordinates as separate arrays. For example, you can select the x coordinates of all customers by using customers[:, 0].
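A minimal sketch of the loading step, under one assumption: since the real file is not available here, the three rows visible in the attached picture are fed to np.loadtxt() through an in-memory text buffer. With the real file you would pass the file name instead.

```python
import io
import numpy as np

# Stand-in for the real file: only the three rows visible in the attached picture.
sample_text = """\
-3.633271721169406110e+00 -2.749418531032250979e+00
-2.976795656232069209e+00 -3.130130619095286004e+00
-4.046241035806734665e+00 -2.413595797540917687e+00
"""

# np.loadtxt accepts a file name or a file-like object.
customers = np.loadtxt(io.StringIO(sample_text))
print(customers.shape)  # (3, 2): one row per customer, columns are x and y

# Separate coordinate arrays, in the form plt.scatter(x, y) expects.
x = customers[:, 0]
y = customers[:, 1]
```

With the full dataset the shape should be (250, 2), and plt.scatter(x, y) draws the map of customers.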

2. Determine the optimal number of clusters

Use your human intuition based on the scatter plot above to decide how many clusters of customers you want to identify. This is the only step which relies on human brainpower (in practice, this can be automated as well, but that is beyond the scope of the assignment). Set variable k to the number of desired clusters.

3. Initialize the cluster centers

The algorithm will keep track of (and update) the center of each cluster and will assign each customer to exactly one cluster based on the distances between the customer and the centers. For the initial locations of the cluster centers, pick k random customer locations and store this array in a new variable called centroids. Verify that the shape of this array is (k, 2).
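One possible way to pick the initial centers is sketched below. The customers array here is made-up random data standing in for the real file, and the seed and k = 3 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)                      # arbitrary seed, only for repeatability
customers = np.random.randn(250, 2)    # made-up stand-in for the real data
k = 3                                  # assumed value; choose it from your own plot

# Draw k distinct row indices, then take those rows as the starting centers.
chosen = np.random.choice(len(customers), size=k, replace=False)
centroids = customers[chosen]          # integer-array indexing returns a copy

print(centroids.shape)  # (3, 2)
```

Using replace=False avoids picking the same customer twice, which would start two clusters at the same location.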

4. Initialize the cluster assignment

Compute a new 1-dimensional integer array of n elements, called assignment, which describes the customer-to-cluster assignment. The assignment is decided by geographical distance. For each customer, store the integer index of the cluster whose current center (see centroids) is closest to the customer. Do not use explicit for loops.

Hint: first, compute all the pairwise distances between the customers and all cluster centers using a 3D array and automatic broadcasting. Try to understand the shape and meaning of the following expression: customers - centroids[:, np.newaxis, :]. Use this expression with squaring and the NumPy sum function (on the proper axis) to compute the pairwise distances. Finally, use argmin (on the proper axis) to find the indexes of the closest cluster centers.
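The hint can be traced on a tiny example. The four customers and two centers below are made up so that the result is easy to check by hand; the variable names diff and sq_dist are my own.

```python
import numpy as np

# Made-up toy data: two customers near (0, 0), two near (10, 10).
customers = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# centroids[:, np.newaxis, :] has shape (k, 1, 2); broadcasting against
# customers with shape (n, 2) gives every customer-minus-center difference.
diff = customers - centroids[:, np.newaxis, :]   # shape (k, n, 2)

# Sum the squared x and y differences: squared distance per (center, customer).
sq_dist = (diff ** 2).sum(axis=2)                # shape (k, n)

# For each customer (column), the index of the center with the smallest distance.
assignment = sq_dist.argmin(axis=0)              # shape (n,)

print(diff.shape, sq_dist.shape, assignment)  # (2, 4, 2) (2, 4) [0 0 1 1]
```

Taking the square root is not needed here, because the closest center by squared distance is also the closest by distance.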

5. Verify the cluster assignment

Execute the code below to verify that the customers, centroids, and assignment arrays are properly initialized. You should see the k cluster centers in red and the assignments in different colors. Note: this is (obviously) not the final/optimal clustering, yet.

This is the code to verify your code:

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Initial cluster assignment");

6. Update the cluster centers

Compute the updated location of the cluster centers. Based on the cluster assignment, compute the mean location of the customers in each cluster. This is going to be the new/updated location of the cluster center (centroids).

Hint: This time you may use a (short) loop on the number of clusters. Try to use boolean indexing (masking) with the assignment to select all the customers belonging to the current cluster. You can also use the NumPy mean() function (with the proper axis) to compute the mean location of the cluster.
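A small sketch of the update step on made-up toy data, where the assignment is given directly so the means can be checked by hand.

```python
import numpy as np

# Made-up toy data and a given assignment (two customers per cluster).
customers = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
assignment = np.array([0, 0, 1, 1])
k = 2

centroids = np.empty((k, 2))
for j in range(k):
    mask = assignment == j                       # True for members of cluster j
    centroids[j] = customers[mask].mean(axis=0)  # mean over rows: mean x, mean y

print(centroids.tolist())  # [[0.0, 0.5], [10.0, 10.5]]
```

This sketch assumes every cluster has at least one customer; the mean of an empty selection would be NaN.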

7. Verify the updated cluster centers

Execute the code below to verify that centroids is properly updated. While this is still not the final clustering, each center should be at the center of its assigned customers.

This is the code to verify this

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Initial cluster centers");

8. Iterative optimization

Based on steps 4 & 6 (you can use copy & paste), implement a loop that iteratively updates the cluster assignment and the cluster centers for as long as the cluster assignment keeps changing.
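The loop could be organized as in the sketch below. The six customers and the deliberately poor starting centers are made up, and the helper names assign and update are hypothetical, chosen only for this illustration.

```python
import numpy as np

def assign(customers, centroids):
    # Index of the closest center for each customer (broadcasting, no loops).
    diff = customers - centroids[:, np.newaxis, :]
    return (diff ** 2).sum(axis=2).argmin(axis=0)

def update(customers, assignment, k):
    # Mean location of each cluster; assumes no cluster is empty.
    return np.array([customers[assignment == j].mean(axis=0) for j in range(k)])

# Made-up toy data: three customers near (0, 0), three near (10, 10).
customers = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                      [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
k = 2
centroids = customers[[0, 1]].copy()   # poor start: both centers in one group

assignment = assign(customers, centroids)
while True:
    centroids = update(customers, assignment, k)
    new_assignment = assign(customers, centroids)
    if np.array_equal(new_assignment, assignment):
        break                          # nothing changed, so the loop is done
    assignment = new_assignment

print(assignment)  # [0 0 0 1 1 1]
```

Because the centers are recomputed before the comparison, the final centroids always match the final assignment when the loop exits.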

9. Verify the final clusters

Execute the code below to see the final clusters. Note: while the results should look reasonable, it is not guaranteed that the algorithm finds the optimal clusters and centers (because of the initial random picks of the centers). Try to run the notebook multiple times (you may want to use np.random.seed() with different parameters at the start) to see if you can get the desired/optimal result.
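Instead of rerunning the notebook by hand, the seeds can also be compared in code. This goes slightly beyond the assignment: the sketch below scores each run by the total squared distance of customers to their assigned center (a common criterion, but my own choice here), and uses made-up data with three blobs. The function name kmeans and its parameters are hypothetical.

```python
import numpy as np

def assign(points, centers):
    diff = points - centers[:, np.newaxis, :]
    return (diff ** 2).sum(axis=2).argmin(axis=0)

def kmeans(points, k, seed, max_iter=100):
    np.random.seed(seed)
    centers = points[np.random.choice(len(points), size=k, replace=False)]
    labels = assign(points, centers)
    for _ in range(max_iter):              # safety cap on the number of rounds
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:           # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
        new_labels = assign(points, centers)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Score: total squared distance from each point to its assigned center.
    score = ((points - centers[labels]) ** 2).sum()
    return centers, labels, score

# Made-up data: three blobs of 50 points each.
np.random.seed(42)
blob_centers = np.array([[-3.0, -3.0], [0.0, 3.0], [4.0, -1.0]])
customers = np.concatenate([c + 0.5 * np.random.randn(50, 2) for c in blob_centers])

results = [kmeans(customers, 3, seed) for seed in range(5)]
scores = [r[2] for r in results]
best_centroids, best_assignment, best_score = results[int(np.argmin(scores))]
```

A lower score means tighter clusters, so the run with the smallest score is a reasonable candidate for the desired result.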

This is the code to verify that

plt.scatter(customers[:, 0], customers[:, 1], c=assignment)

plt.scatter(centroids[:, 0], centroids[:, 1], c="red")

plt.title("Final clusters");

 

Attached txt file (first rows):
-3.633271721169406110e+00 -2.749418531032250979e+00
-2.976795656232069209e+00 -3.130130619095286004e+00
-4.046241035806734665e+00 -2.413595797540917687e+00