Question

Implement an iterative k-means algorithm in Spark, written in Python, to cluster
a set of points that are stored in a file. Do not use the K-means implementation in Spark's MLlib to solve the problem. Set the number of center points to k = 5.

Follow this pattern:

1. Randomly assign a centroid to each of the k clusters (k = 5).
2. Calculate the distance from every observation to each of the k centroids.
3. Assign each observation to the closest centroid.
4. Find the new location of each centroid by taking the mean of all the observations in its cluster.
5. Repeat steps 2-4 until the centroids no longer change position.
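
The pattern above can be sketched with plain RDD operations. This is a minimal sketch, not a definitive solution. The function names (`closest_centroid`, `add_points`, `kmeans`), the `seed` and `max_iter` parameters, and the choice to sum the centroid movements for the stopping test are all assumptions made for illustration. `points_rdd` is assumed to be an RDD of `(x, y)` tuples, and `math.dist` requires Python 3.8 or later.

```python
import math

def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def add_points(a, b):
    """Merge two (coordinate_sum, count) accumulators belonging to one cluster."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return (tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b)

def kmeans(points_rdd, k=5, threshold=0.1, seed=42, max_iter=100):
    """Run k-means on an RDD of (x, y) tuples and return the final centroids.

    `seed` and `max_iter` are illustrative extras, not part of the assignment.
    """
    points_rdd = points_rdd.cache()  # the same data is scanned on every iteration
    # Initial centroids: k observations sampled at random without replacement.
    centroids = [tuple(map(float, p)) for p in points_rdd.takeSample(False, k, seed)]
    for _ in range(max_iter):
        current = centroids  # value captured by the lambda sent to the executors
        # Distance to every centroid, keeping only the index of the closest one.
        assigned = points_rdd.map(lambda p: (closest_centroid(p, current), (p, 1)))
        # Per-cluster coordinate sums and counts, brought back to the driver.
        totals = assigned.reduceByKey(add_points).collectAsMap()
        # New centroid = mean of its cluster; an empty cluster keeps its old centroid.
        new_centroids = list(centroids)
        for idx, (coord_sum, count) in totals.items():
            new_centroids[idx] = tuple(s / count for s in coord_sum)
        # Assumed stopping measure: total distance moved by all centroids.
        shift = sum(math.dist(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            break
    return centroids
```

With an existing SparkContext `sc`, a call might look like `kmeans(sc.parallelize(points), k=5, threshold=0.1)`. Only the per-cluster sums and counts travel back to the driver on each iteration, so the points themselves stay distributed.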

Note: You need a threshold variable to decide when the k-means calculation is done: stop when
the amount by which the locations of the means change between iterations is less than this variable. Set
the variable to 0.1.
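
The note does not say how to measure the "amount" of change, so the sketch below makes an assumption: it sums the Euclidean distance each centroid moved between two iterations and compares that total with the threshold. The names `centroid_shift` and `has_converged` are hypothetical. Taking the maximum movement instead of the sum would be an equally reasonable reading.

```python
import math

THRESHOLD = 0.1  # value given in the assignment

def centroid_shift(old_centroids, new_centroids):
    """Total Euclidean distance moved by all centroids between two iterations."""
    return sum(math.dist(o, n) for o, n in zip(old_centroids, new_centroids))

def has_converged(old_centroids, new_centroids, threshold=THRESHOLD):
    """True once the total movement falls below the threshold."""
    return centroid_shift(old_centroids, new_centroids) < threshold

# One centroid moved by roughly 0.05, the other did not move at all.
old = [(0.0, 0.0), (10.0, 10.0)]
new = [(0.03, 0.04), (10.0, 10.0)]
print(has_converged(old, new))  # total shift is about 0.05, below 0.1
```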

Example of input file (an RDD):

[(7869, 8696), (8676, -4746), (9484, 112526), (-1827, 5958), (987, 900087), (18127, 9383), (298, 272), (91716, 2827), (12625, 92827) ........]
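
The question does not state how the points are laid out in the file, so the following is only a sketch under an assumed layout: one point per line, written either as `7869,8696` or as `(7869, 8696)`. The helper names `parse_point` and `load_points` and the `path` argument are hypothetical. `sc` is assumed to be an existing SparkContext.

```python
def parse_point(line):
    """Turn a line such as '(7869, 8696)' or '7869,8696' into a tuple of floats."""
    # Drop surrounding whitespace, then any parentheses or trailing comma.
    x, y = line.strip().strip("(),").split(",")
    return (float(x), float(y))

def load_points(sc, path):
    """Read the text file at `path` into an RDD of (x, y) tuples.

    Assumes one point per line; blank lines are skipped.
    """
    return (sc.textFile(path)
              .filter(lambda line: line.strip())
              .map(parse_point))
```

If the file instead holds the whole list on a single line, as in the example above, the parsing step would need to change accordingly.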