Question

Implement the iterative k-means algorithm in Spark, using Python, to cluster a set of points stored in a file. Do not use the K-means implementation in Spark's MLlib to solve the problem. Set the number of centers to k = 5.

Follow this pattern:

  1. Randomly assign a centroid to each of the k clusters (k = 5).
  2. Calculate the distance of every observation to each of the k centroids.
  3. Assign each observation to its closest centroid.
  4. Find the new location of each centroid by taking the mean of all the observations in its cluster.
  5. Repeat steps 2-4 until the centroids stop changing position.
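A minimal PySpark sketch of steps 1-5, assuming the points are already loaded as an RDD of `(x, y)` tuples. It relies only on the RDD methods `takeSample`, `map`, `reduceByKey`, and `collectAsMap`, so no MLlib code is involved. The names `closest_centroid` and `kmeans`, the `max_iter` safety cap, using k randomly sampled observations as starting centroids, and letting an empty cluster keep its old centroid are our own assumptions, not part of the assignment. Convergence is judged by the largest single centroid move compared against the threshold.

```python
import math


def closest_centroid(point, centroids):
    """Steps 2-3: index of the nearest centroid.

    Compares squared Euclidean distances, which picks the same centroid
    as comparing true distances while skipping the square root.
    """
    x, y = point
    return min(range(len(centroids)),
               key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)


def kmeans(points, k=5, threshold=0.1, max_iter=100, seed=None):
    """Iterative k-means on an RDD of (x, y) tuples, without MLlib.

    Returns the final centroids as a list of (x, y) float tuples.
    """
    # Step 1 (assumed interpretation): k random observations become the initial centroids.
    centroids = [tuple(map(float, p)) for p in points.takeSample(False, k, seed)]

    for _ in range(max_iter):
        # Snapshot bound into the lambda below, so each job uses this iteration's centroids.
        current = list(centroids)

        # Steps 2-3: key each point by its nearest centroid, carrying (x, y, 1).
        # Step 4 (first half): sum coordinates and counts per cluster on the executors.
        totals = (points
                  .map(lambda p, c=current: (closest_centroid(p, c), (p[0], p[1], 1)))
                  .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]))
                  .collectAsMap())

        # Step 4 (second half): mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = [
            (totals[i][0] / totals[i][2], totals[i][1] / totals[i][2]) if i in totals else current[i]
            for i in range(len(current))
        ]

        # Step 5: stop once every centroid moved less than the threshold.
        shift = max(math.dist(old, new) for old, new in zip(current, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            break

    return centroids
```

In a driver, something like `sc = SparkContext(appName="kmeans")` followed by `kmeans(sc.parallelize(points).cache(), k=5, threshold=0.1)` would run it. Caching matters because every iteration rescans the RDD, and summing `(x, y, 1)` triples with `reduceByKey` means only k small totals travel back to the driver each round.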

Note: You need a variable to decide when the k-means calculation is done: stop when the amount by which the centroid locations change between iterations is less than this variable. Set the variable to 0.1.
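The note does not say whether "the amount the means change" is the largest move of any single centroid or the combined move of all of them. A small sketch, assuming Euclidean distance, that supports both readings (the names `centroid_shift`, `has_converged`, and the `how` parameter are hypothetical):

```python
import math


def centroid_shift(old_centroids, new_centroids, how="max"):
    """Movement of the centroids between two iterations.

    how="max"   -> largest distance moved by any single centroid
    how="total" -> sum of the distances moved by all centroids
    """
    moves = [math.dist(a, b) for a, b in zip(old_centroids, new_centroids)]
    if how == "max":
        return max(moves)
    if how == "total":
        return sum(moves)
    raise ValueError("how must be 'max' or 'total'")


def has_converged(old_centroids, new_centroids, threshold=0.1, how="max"):
    """True when the measured movement is below the threshold (0.1 in the assignment)."""
    return centroid_shift(old_centroids, new_centroids, how) < threshold


old = [(0.0, 0.0), (10.0, 10.0)]
new = [(0.0, 0.06), (10.08, 10.0)]
print(has_converged(old, new))                # True
print(has_converged(old, new, how="total"))   # False
```

Here the two centroids move about 0.06 and 0.08: the "max" reading stops (0.08 < 0.1) while the "total" reading keeps iterating (0.14 ≥ 0.1), so the choice can change how many iterations run.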

Example of input file (an RDD):

[(7869, 8696), (8676, -4746), (9484, 112526), (-1827, 5958), (987, 900087), (18127, 9383), (298, 272), (91716, 2827), (12625, 92827) ........]
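The assignment shows the contents of the RDD but not the layout of the file itself. Assuming one point per line, written either as `(7869, 8696)` or `7869, 8696` (possibly with a trailing comma), a hypothetical `parse_point` helper could turn each line into a numeric tuple:

```python
def parse_point(line):
    """Turn a line such as "(7869, 8696)" or "7869,8696" into a tuple of floats.

    The one-point-per-line layout is an assumption; adjust if the file differs.
    """
    # Drop surrounding whitespace, a trailing comma, then the parentheses.
    text = line.strip().rstrip(",").strip().strip("()")
    x, y = text.split(",")
    return float(x), float(y)


lines = ["(7869, 8696)", "8676, -4746", " (298, 272), "]
print([parse_point(l) for l in lines])
# [(7869.0, 8696.0), (8676.0, -4746.0), (298.0, 272.0)]
```

With Spark this would be applied as `sc.textFile("points.txt").filter(lambda s: s.strip()).map(parse_point)`, where `"points.txt"` is a placeholder path and the `filter` skips blank lines.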
