
Questions On Analyzing Quantitative Data

Assignment Three: Analyzing Quantitative Data

List of Figures
Figure 1 Frequency per rank for the top 100 words
Figure 2 A histogram showing the distribution of token size across the "Doctored Dataset" Corpus
Figure 3 A boxplot showing the distribution of token size across the "Doctored Dataset" Corpus
Figure 4 A Q-Q plot of the distribution of token size across the "Doctored Dataset" Corpus
Figure 5 A boxplot showing the median length of research articles in each disciplinary domain
Figure 6 A boxplot showing the descriptive statistics in five academic domains
Figure 7 A scatter plot showing the correlation between the numbers of tokens and types across the five domains

List of Tables
Table 1 Descriptive
…
This simply means that the majority of words occur only a few times. Further evidence for this conclusion is the median frequency, which is 2.00.

Question 3
Figure 1 Frequency per rank for the top 100 words
Figure 1 is a line graph that shows the relationship between the frequency (y) and the rank (x) of the 100 most frequent words in the aforementioned academic corpus. As Zipf’s Law predicts, the frequency of a word in a corpus of natural language is inversely proportional to its rank in the frequency table. That is, the most frequent word occurs approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.
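The inverse relationship described above can be illustrated with a minimal sketch. The function name `zipf_expected` and the top frequency of 60,000 are hypothetical and are not taken from the corpus; the sketch only shows the idealized curve that the observed frequencies in Figure 1 would be compared against.

```python
def zipf_expected(top_frequency, n_ranks, exponent=1.0):
    """Return idealized Zipfian frequencies for ranks 1..n_ranks.

    Under Zipf's Law with exponent 1, the word at rank r is expected to
    occur top_frequency / r times.
    """
    return [top_frequency / (rank ** exponent) for rank in range(1, n_ranks + 1)]


# Hypothetical frequency of the most frequent word (an assumption for illustration).
expected = zipf_expected(60000, 5)
for rank, freq in enumerate(expected, start=1):
    print(f"rank {rank}: expected frequency {freq:.0f}")
```

Plotting real frequencies against these values on log-log axes is a common way to judge how closely a corpus follows the law; a roughly straight line suggests a good fit.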

Question 5
The table below shows a Chi-square test of the use of the word ‘lovely’ in both the COCA: spoken and BYU-BNC: spoken corpora.

                  Lovely   All other words   Total words
COCA: spoken       1,836        90,063,928    90,065,764
BYU-BNC: spoken    2,397         9,961,266     9,963,663
Total              4,233       100,025,194   100,029,427
Table 2 The use of
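As a rough check on the counts in the table, the Pearson chi-square statistic can be computed directly from the observed frequencies. This is a sketch only: the function name `chi_square` is hypothetical, no continuity correction is applied, and the source does not say which variant of the test or which software the author used.

```python
def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if word choice were independent of corpus.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Observed counts from the table: [lovely, all other words]
observed = [
    [1836, 90063928],  # COCA: spoken
    [2397, 9961266],   # BYU-BNC: spoken
]

statistic = chi_square(observed)
CRITICAL_05_DF1 = 3.841  # critical value at p = 0.05 with 1 degree of freedom

# Normalized frequency of 'lovely' per million words in each corpus.
per_million = [row[0] / sum(row) * 1_000_000 for row in observed]

print(f"chi-square = {statistic:.2f}")
print(f"significant at 0.05: {statistic > CRITICAL_05_DF1}")
print(f"per million words: COCA {per_million[0]:.1f}, BNC {per_million[1]:.1f}")
```

The per-million figures are included because the two corpora differ greatly in size, so raw counts alone are not directly comparable.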