A corpus is a technical term for a collection of texts used to analyze a language and verify its linguistic properties. The first modern, computer-readable corpus was the Brown Corpus of Standard American English, compiled by Henry Kucera and W. Nelson Francis of Brown University. The Brown Corpus draws from American English texts printed in 1961 and was for many years a widely cited resource in computational linguistics.

The five most frequently occurring words in the Brown Corpus are the, of, and, to, and a. Consider a data set consisting of all occurrences of these words in the Corpus. The values of the variable named Word are and, to, of, the, and a, so Word is a nominal variable with five categories.

Frequency and relative frequency distributions are constructed to summarize the data. They are shown in the table that follows, but the table is incomplete. Use the dropdown menus to complete the table.

Table 1
Word     Frequency (thousands of occurrences)     Relative frequency
and      28.9                                     0.1566
to       26.1                                     0.1415
of       36.4                                     0.1973
the      70.0                                     0.3794
a        23.1                                     0.1252
Total    184.5                                    1.0000

The Brown Corpus contains about 1 million words. The frequency of the word and in the entire corpus is about 28,900 occurrences. The relative frequency of the word and in the entire corpus is about 0.0289.

A census is an enumeration of a population. The U.S. Census Bureau conducts a census every 10 years, but in addition, the Population Estimates Program of the bureau publishes population estimates for incorporated places every year. According to 2007 estimates, the five largest U.S. cities (by population) are New York City, Los Angeles, Chicago, Houston, and Phoenix.

Consider a data set consisting of all the residents of these five cities. The values of the variable named City are Los Angeles, Chicago, Houston, Phoenix, and New York City, so City is a nominal variable with five categories. Frequency and relative frequency distributions are provided in the table below, but the table is incomplete. Use the dropdown menus to complete the table.

Table 1
City             Frequency (millions of people)     Relative frequency
Los Angeles      3.83                               0.2048
Chicago          2.84                               0.1519
Houston          2.21                               0.1182
Phoenix          1.55                               0.0829
New York City    8.27                               0.4422
Total            18.70                              1.00

The U.S. population is about 300 million. The frequency of Los Angeles residents in the U.S. population is about 3.83 million people. The relative frequency of Los Angeles residents in the U.S. population is about [blank].

In 1935, Harvard linguist George Zipf pointed out that the frequency of the kth most frequent word in a language is roughly proportional to 1/k. This implies that the second most frequent word in a language has a frequency one-half that of the most frequent word, the third most frequent word has a frequency one-third that of the most frequent word, and so on. A distribution that follows this rule is said to obey Zipf's Law.

Zipf's Law has been observed not only in word distributions, but in other phenomena as well, such as the populations of cities.

The frequency of the second most frequent word in the Brown Corpus is [blank] that of the most frequent word. The population of the second largest city in the United States is [blank] that of the largest city.

The frequency of the fourth most frequent word in the Brown Corpus is [blank] that of the most frequent word. The population of the fourth largest city in the United States is [blank] that of the largest city.

Question

Accepted Answer

Zipf's law
The frequency of the kth most frequent word ∝ 1/k
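
The arithmetic behind the tables and the Zipf comparison can be checked with a short script. This is only a sketch: the frequencies are copied from the two tables in the question, the totals of 1 million words and 300 million people are the rounded figures the question gives, and the helper names (relative_frequencies, ratio_to_top) are made up for illustration.

```python
# Frequencies copied from the two tables in the question.
word_freq = {  # thousands of occurrences in the Brown Corpus
    "the": 70.0, "of": 36.4, "and": 28.9, "to": 26.1, "a": 23.1,
}
city_pop = {  # millions of people, 2007 estimates
    "New York City": 8.27, "Los Angeles": 3.83, "Chicago": 2.84,
    "Houston": 2.21, "Phoenix": 1.55,
}


def relative_frequencies(freq):
    """Divide each frequency by the total of the data set."""
    total = sum(freq.values())
    return {key: value / total for key, value in freq.items()}


def ratio_to_top(freq, rank):
    """Frequency of the rank-th largest category divided by the largest."""
    ordered = sorted(freq.values(), reverse=True)
    return ordered[rank - 1] / ordered[0]


# Relative frequencies within each five-category data set.
word_rel = relative_frequencies(word_freq)
city_rel = relative_frequencies(city_pop)

# Relative frequencies within the larger populations
# (assumed totals: about 1,000 thousand words and 300 million people).
and_share = word_freq["and"] / 1000
la_share = city_pop["Los Angeles"] / 300

print(f"'and' within the five words: {word_rel['and']:.4f}")
print(f"'and' within the whole corpus: {and_share:.4f}")
print(f"Los Angeles within the five cities: {city_rel['Los Angeles']:.4f}")
print(f"Los Angeles within the U.S.: {la_share:.4f}")

# Zipf's law predicts a ratio of 1/k for the kth most frequent category.
for rank in (2, 4):
    print(
        f"rank {rank}: predicted {1 / rank:.2f}, "
        f"words {ratio_to_top(word_freq, rank):.2f}, "
        f"cities {ratio_to_top(city_pop, rank):.2f}"
    )
```

With these numbers the rank-2 ratios come out near one-half for both the words and the cities, and the rank-4 ratios land in the neighborhood of one-fourth, which is the pattern Zipf's law describes. The fit is approximate, not exact.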