Python Language: Number Grouping Program Example: separate (([10, 12, 45, 47, 91, 98, 99]), 3) It should return [[10, 12], [45, 47], [91, 98, 99]].

Computer Networking: A Top-Down Approach (7th Edition)
7th Edition
ISBN:9780133594140
Author:James Kurose, Keith Ross
Publisher:James Kurose, Keith Ross
Chapter1: Computer Networks And The Internet
Section: Chapter Questions
Problem R1RQ: What is the difference between a host and an end system? List several different types of end...
icon
Related questions
Question

Python Language:

Number Grouping Program

Example: separate (([10, 12, 45, 47, 91, 98, 99]), 3)

It should return [[10, 12], [45, 47], [91, 98, 99]].

How can we automatically split seemingly-random data points into meaningful groups? For example,
consider the heights of 20 people (in cm) presented in the ascending order:
71.4, 72.73, 74.36, 75. 38, 76.15, 76.96, 79.51, 86.82, 87.81, 87.87, 146.38, 150.89,
151.16, 152.18, 152.36, 153.27, 155.7, 160.99, 161.36, 164.55
Say we happen to know that there are two age groups in this population-kids and adults. How can we
tell them apart? It may not be clear at first glance, but plotting this reveals an interesting trend:
With the plot, it is visually trivial to split this dataset into two groups: there is a big gap in the mid-
dle, separating the heights into two logical sides. The more involved question-and the basis for this
problem-is, how can we discover this
gap automatically?
Answering this and similar questions has been a topic of extensive research, fueled by real-world prob-
lems ranging from deciding letter grades to giving suggestions on Facebook (e.g., who are your close
friends?).
Transcribed Image Text:How can we automatically split seemingly-random data points into meaningful groups? For example, consider the heights of 20 people (in cm) presented in the ascending order: 71.4, 72.73, 74.36, 75. 38, 76.15, 76.96, 79.51, 86.82, 87.81, 87.87, 146.38, 150.89, 151.16, 152.18, 152.36, 153.27, 155.7, 160.99, 161.36, 164.55 Say we happen to know that there are two age groups in this population-kids and adults. How can we tell them apart? It may not be clear at first glance, but plotting this reveals an interesting trend: With the plot, it is visually trivial to split this dataset into two groups: there is a big gap in the mid- dle, separating the heights into two logical sides. The more involved question-and the basis for this problem-is, how can we discover this gap automatically? Answering this and similar questions has been a topic of extensive research, fueled by real-world prob- lems ranging from deciding letter grades to giving suggestions on Facebook (e.g., who are your close friends?).
What we did in our brain was essentially locating the widest gap visually. This makes intuitive and
mathematical sense because the widest gap is where our data points are the most dissimilar.
Generalizing this idea gives us a simple heuristic that often works okay on real data. Say we want to
identify k groups. We can proceed as follows:
Step 1: Find the largest gap in the data.
Step 2: Split the data at that point.
This gives us 2 groups. To further split the data, we find the largest gap that hasn't been split and split
it there. This gives us one more group. We can repeat this process until we get k groups.
Let us illustrate this idea with a simple example. Suppose k = 3 and our input is the following numbers:
10, 12,45, 47,91,98,99.
They look as follows in plot:
As before, we assume the input is sorted from small to large. And we wish to split it into k = 3
To begin, we'll treat the input as one big group:
groups.
[10,12,45,47,91,98,99]
Then, we look for the largest gap between two consecutive elements. In this example, the largest gap is
in between 47 and 91 (you can check that). That is where we want to split. After that, the list becomes:
[10,12,45, 47] [91,98,99]
Next, we find the largest gap between two successive elements in all groups. This time, the largest gap
is in the first group (to know this, we'll have to check the other group, to0). The overall largest gap is
between 12 and 45. Hence, we further split it there, leading to the following groups (illustration on
right):
[10,12] [45,47) [91,98,99]
Group 1
Group 2
Group 3
We now have 3 groups, so we can stop and output them.
REMARKS: Mathematically, this process partitions a list into k sublists so that the total gap-the sum
of all gaps-between sublists is maximized. It makes sense intuitively and aligns with out intuition that
different groups should be dissimilar, so the process attempts to maximize how different they are.
YOUR TASK:
You will write a function
separate(data: list[int], k: int) -> list[list[int]]
that partitions data into k sublists that maximize dissimilarity using the process described above.
Importantly: for this task, your implementation must find the largest gap and split the dataset at the
gap, repeating this process until you have k groups. If there are ties, pick the left-most gap. To keep
things simple, we will assume that the input list is always ordered from small to large.
Hence, for example,
separate([10, 12, 45, 47, 91, 98, 99], 3)
should return
[10, 12], [45, 47], [91, 98, 99]].
Transcribed Image Text:What we did in our brain was essentially locating the widest gap visually. This makes intuitive and mathematical sense because the widest gap is where our data points are the most dissimilar. Generalizing this idea gives us a simple heuristic that often works okay on real data. Say we want to identify k groups. We can proceed as follows: Step 1: Find the largest gap in the data. Step 2: Split the data at that point. This gives us 2 groups. To further split the data, we find the largest gap that hasn't been split and split it there. This gives us one more group. We can repeat this process until we get k groups. Let us illustrate this idea with a simple example. Suppose k = 3 and our input is the following numbers: 10, 12,45, 47,91,98,99. They look as follows in plot: As before, we assume the input is sorted from small to large. And we wish to split it into k = 3 To begin, we'll treat the input as one big group: groups. [10,12,45,47,91,98,99] Then, we look for the largest gap between two consecutive elements. In this example, the largest gap is in between 47 and 91 (you can check that). That is where we want to split. After that, the list becomes: [10,12,45, 47] [91,98,99] Next, we find the largest gap between two successive elements in all groups. This time, the largest gap is in the first group (to know this, we'll have to check the other group, to0). The overall largest gap is between 12 and 45. Hence, we further split it there, leading to the following groups (illustration on right): [10,12] [45,47) [91,98,99] Group 1 Group 2 Group 3 We now have 3 groups, so we can stop and output them. REMARKS: Mathematically, this process partitions a list into k sublists so that the total gap-the sum of all gaps-between sublists is maximized. It makes sense intuitively and aligns with out intuition that different groups should be dissimilar, so the process attempts to maximize how different they are. YOUR TASK: You will write a function separate(data: list[int], k: int) -> list[list[int]] that partitions data into k sublists that maximize dissimilarity using the process described above. Importantly: for this task, your implementation must find the largest gap and split the dataset at the gap, repeating this process until you have k groups. If there are ties, pick the left-most gap. To keep things simple, we will assume that the input list is always ordered from small to large. Hence, for example, separate([10, 12, 45, 47, 91, 98, 99], 3) should return [10, 12], [45, 47], [91, 98, 99]].
Expert Solution
steps

Step by step

Solved in 3 steps with 2 images

Blurred answer
Recommended textbooks for you
Computer Networking: A Top-Down Approach (7th Edi…
Computer Networking: A Top-Down Approach (7th Edi…
Computer Engineering
ISBN:
9780133594140
Author:
James Kurose, Keith Ross
Publisher:
PEARSON
Computer Organization and Design MIPS Edition, Fi…
Computer Organization and Design MIPS Edition, Fi…
Computer Engineering
ISBN:
9780124077263
Author:
David A. Patterson, John L. Hennessy
Publisher:
Elsevier Science
Network+ Guide to Networks (MindTap Course List)
Network+ Guide to Networks (MindTap Course List)
Computer Engineering
ISBN:
9781337569330
Author:
Jill West, Tamara Dean, Jean Andrews
Publisher:
Cengage Learning
Concepts of Database Management
Concepts of Database Management
Computer Engineering
ISBN:
9781337093422
Author:
Joy L. Starks, Philip J. Pratt, Mary Z. Last
Publisher:
Cengage Learning
Prelude to Programming
Prelude to Programming
Computer Engineering
ISBN:
9780133750423
Author:
VENIT, Stewart
Publisher:
Pearson Education
Sc Business Data Communications and Networking, T…
Sc Business Data Communications and Networking, T…
Computer Engineering
ISBN:
9781119368830
Author:
FITZGERALD
Publisher:
WILEY