BIOL206 - Lab 11: ANOVA and population dynamics
Fall 2023
Objectives
The learning objectives of this lab are to:
1. Be introduced to population dynamics, an important element in ecology and conservation.
2. Compare the means of more than two groups using a statistical test called an ANOVA.
Population dynamics
Species extinctions are necessarily preceded by declines in abundance of plant and animal
populations. Thus, many ecologists are interested in tracking population trends (i.e., whether
abundance is increasing or decreasing), which can be done by repeatedly counting populations
over time. The resulting sequence of counts is known as “time-series data”.
Population growth or decline can be described mathematically using the equation:
$$N(t+1) = N(t)\,e^{r}$$
where $N(t)$ is the population size at time $t$, $N(t+1)$ is the population size at time $t+1$, and $r$ is the growth rate. The units of $r$ are $t^{-1}$. Obviously, this is a simplified model of population dynamics, as it does not include processes such as random population fluctuations and density dependence. Nonetheless, our main interest is in the coefficient $r$, which determines whether the population is growing or declining over the long term.
Estimating population growth from time series data
We can rearrange the equation above to see how to estimate the growth rate, $r$, from time-series data (here log is the natural logarithm):
$$
\begin{aligned}
N(t+1) &= N(t)\,e^{r} \\
\frac{N(t+1)}{N(t)} &= e^{r} \\
\log\!\left(\frac{N(t+1)}{N(t)}\right) &= r \\
\log(N(t+1)) - \log(N(t)) &= r
\end{aligned}
$$
Therefore, the difference in the log population size from one year to the next is an estimate of
the population growth rate. This value is known as the log-difference.
For example, suppose that a population has the following sizes over five years:

| Year t | Population size N(t) | Log of population size log(N(t)) | Log-difference log(N(t+1)) - log(N(t)) |
|---|---|---|---|
| 2000 | 39 | 3.664 | |
| 2001 | 61 | 4.111 | 4.111 - 3.664 = 0.447 |
| 2002 | 48 | 3.871 | 3.871 - 4.111 = -0.240 |
| 2003 | 31 | 3.434 | 3.434 - 3.871 = -0.437 |
| 2004 | 32 | 3.466 | 3.466 - 3.434 = 0.032 |
Notice that:
• If the population increases, the log-difference is positive.
• If the population stays almost the same, the log-difference is close to 0.
• If the population decreases, the log-difference is negative.
The mean log-difference over the five-year time period was -0.05 year$^{-1}$. This value is an estimate of the growth rate, $r$, of the population. In this lab and next week’s lab, we will try to predict population growth rate.
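The log-difference calculation from the example table can be reproduced in a few lines of R. This is a minimal sketch using the five counts from the example; the variable names N, log_diff and r_hat are our own choices for illustration and are not part of the lab data.

```r
# Population counts for 2000-2004, copied from the example table
N = c(39, 61, 48, 31, 32)

# log() is the natural logarithm; diff() subtracts each value from the next,
# giving log(N(t+1)) - log(N(t)) for each pair of consecutive years
log_diff = diff(log(N))
round(log_diff, 3)

# The mean log-difference is the estimate of the growth rate r (per year)
r_hat = mean(log_diff)
round(r_hat, 2)
```

Five counts give only four log-differences. Because the intermediate terms cancel when the differences are summed, the mean log-difference depends only on the first and last counts: it equals (log(32) - log(39)) / 4.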
The Living Planet Index
In this lab, we use data sampled from The Living Planet Index database. From the LPI website in 2022:
"The LPI tracks almost 21,000 populations of mammals, birds, fish, reptiles and
amphibians around the world.
[...]
The data is gathered from almost 4,000 sources, using increasingly sophisticated
technology such as audio devices to monitor insect sounds; drones and satellite
tagging to track populations on the move; and even block-chain technology to track
the impact of harvesting on wild populations."
Map of terrestrial & freshwater populations in the Living Planet Database.
Analysis of Variance
Today, we will use a statistical test called an ANOVA to try to predict population growth rate of
animal populations in the Living Planet Database.
So far, you have learned two types of hypothesis tests: t-tests and linear regressions. Today, you
will learn a third type of hypothesis test: analysis of variance (ANOVA). To review:
• One-sample t-tests are used to determine whether a population mean is equal to a hypothesized value.
• Two-sample t-tests are used to determine whether two populations have the same mean.
• Linear regressions are used to determine whether two continuous variables are related.
ANOVAs will add a new ability to your repertoire. Similar to a two-sample t-test but more flexible, ANOVAs allow you to determine whether two or more populations have the same mean. The statistical hypotheses associated with an ANOVA for k groups are:
• Null hypothesis ($H_0$): The population mean is the same for all groups. $\mu_1 = \mu_2 = \dots = \mu_k$
• Alternative hypothesis ($H_A$): The population mean varies between the groups.
• Note: We do not generally specify the alternative hypothesis in mathematical terms for an ANOVA with more than two groups. However, if we want to write it out, we need to list all possible group differences. For example, for an ANOVA with three groups: $H_A$: $\mu_1 \neq \mu_2$ or $\mu_1 \neq \mu_3$ or $\mu_2 \neq \mu_3$.
ANOVAs can test whether any number of groups have different population means. Therefore, they can test whether two groups have different population means, just like two-sample t-tests. In fact, two-sample t-tests and ANOVAs with two groups are mathematically related (for the equal-variance t-test, F equals t squared) and can be used interchangeably in many situations.
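The link between the two tests can be checked numerically. The sketch below uses made-up data (two groups of 20 values drawn with rnorm(); the names y and g are hypothetical) and assumes the equal-variance form of the t-test (var.equal = TRUE), which is the version that corresponds to a two-group ANOVA.

```r
# Hypothetical data: two groups of 20 simulated observations
set.seed(1)
y = c(rnorm(20, mean = 0), rnorm(20, mean = 0.5))
g = factor(rep(c("A", "B"), each = 20))

# Two-sample t-test assuming equal variances
t_res = t.test(y ~ g, var.equal = TRUE)
t_val = unname(t_res$statistic)

# One-way ANOVA on the same data
anova_tab = summary(aov(y ~ g))[[1]]
F_val = anova_tab[1, "F value"]

# With two groups, F equals t squared and the p-values agree
t_val^2
F_val
t_res$p.value
anova_tab[1, "Pr(>F)"]
```

R's t.test() uses the unequal-variance (Welch) test by default, so without var.equal = TRUE the two results would generally differ slightly.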
The F-statistic
ANOVAs use F as a test statistic. Although the calculation and distribution of F are different from those of t, the hypothesis testing framework is the same: you calculate the test statistic for your observations and compare it to a critical value to know whether to reject the null hypothesis. The formula for F is:
$$F = \frac{MSG}{MSE}$$
where MSG is the mean square between groups, a measure of variation between the groups, and MSE is the mean square within groups, a measure of variation within groups.
Sources of variation in ANOVA (modified from https://www.datanovia.com/en/lessons/anova-in-r/)
Therefore, F is a ratio of variation between and within groups. The more variation there is between
groups (relative to within groups), the larger the F. If F is large enough (i.e., greater than the
critical value), we reject the null hypothesis that the groups have the same mean.
Procedure
This week you will be using ANOVAs to predict growth rates of populations in the Living Planet
Index database. We have already calculated the estimated growth rates for the populations
for you, by taking the mean log-difference of the time series data. We will refer to the mean
log-difference as “population growth rate”, although “estimated mean population growth rate”
would be a more accurate (but cumbersome) name. There is one population growth rate value
for each population.
Start by downloading the Living Planet data, “LPI.csv”, from MyCourses. Open an R Script, set
your working directory, and load the csv as a dataframe. Call it “LPI”. Each row is a different
population. The final column of the dataframe is Pop.growth. This is the mean log-difference of
each population. The other columns give other information about the populations, such as their
class and biome.
    # Load the data (assumes LPI.csv is in your working directory)
    LPI = read.csv("LPI.csv")
    # Look at the LPI data
    View(LPI)
Scientific question
An interesting observation is that the average population growth of the populations is close to
zero. In fact, a one-sample t-test shows that the mean population growth rate is not significantly
different from zero.
    # One-sample t-test of population growth
    # H0: mu = 0
    t.test(LPI$Pop.growth, mu = 0, alternative = "two.sided")
This lack of change in average population size across all populations is the result of some populations increasing and offsetting declines in other populations. Looking at a histogram of the population growth rates, we can see that the mean of the distribution is approximately zero, but there is symmetrical variation around the mean.
    # Install the ggplot2 package (only needed once)
    install.packages("ggplot2")
    # Load ggplot2
    library(ggplot2)
    # Histogram of population growth
    ggplot(data = LPI) +
      geom_histogram(mapping = aes(x = Pop.growth), bins = 10) +
      labs(title = "Histogram of population growth rate",
           x = "Population growth rate (1/year)",
           y = "Frequency")
What explains this variation? In other words, what predicts whether a population declines, increases, or stays the same over time? Populations do not grow or decline in isolation, but instead are affected by myriad potential factors that may vary across space and time. A better understanding of what factors are associated with population decline could help inform conservation priorities and strategies. Today, we will test whether taxonomic class and trophic level predict population growth rate. We will start with the scientific question: Are some taxonomic classes of vertebrates experiencing more population decline than other classes?
Biological hypothesis
We hypothesize that some classes of vertebrates are experiencing more population decline than
other classes because they are less able to adapt to rapid anthropogenic environmental change.
Exploratory data analysis
Just as with t-tests and linear regression, it’s vital to conduct an EDA before you begin a hypothesis
test. The EDA for an ANOVA is similar to the EDA for a two-sample t-test.
• Calculate summary statistics of each group separately, as well as all the groups together.
• Visualize the data: Make a histogram for each group and make a boxplot of all the distributions together. Note whether the distributions are approximately normal.
    # Summary statistics of the full population growth distribution
    mean(LPI$Pop.growth)
    median(LPI$Pop.growth)
    sd(LPI$Pop.growth)
    min(LPI$Pop.growth)
    max(LPI$Pop.growth)
    # Summary statistics for each group separately
    tapply(X = LPI$Pop.growth, INDEX = LPI$Class, FUN = mean)
    tapply(X = LPI$Pop.growth, INDEX = LPI$Class, FUN = median)
    tapply(X = LPI$Pop.growth, INDEX = LPI$Class, FUN = sd)
    tapply(X = LPI$Pop.growth, INDEX = LPI$Class, FUN = var)
    tapply(X = LPI$Pop.growth, INDEX = LPI$Class, FUN = min)
    tapply(X = LPI$Pop.growth, INDEX = LPI$Class, FUN = max)
    # Histograms of each class
    # Amphibians
    ggplot(data = LPI[LPI$Class == "Amphibia", ]) +
      geom_histogram(mapping = aes(x = Pop.growth), bins = 10) +
      labs(title = "Histogram of population growth rate",
           x = "Population growth rate (1/year)",
           y = "Frequency")
    # YOUR CODE HERE! (Make histograms for each class)
    # Boxplot
    ggplot(LPI, aes(x = Class, y = Pop.growth)) +
      geom_boxplot() +
      labs(title = "Population growth rate by animal class",
           x = "Taxonomic class",
           y = "Population growth rate (1/year)")
What are your impressions from this EDA? Do you think the ANOVA will find a significant difference
in mean population growth rate between the taxonomic classes?
Hypothesis test
State the biological hypothesis as statistical hypotheses:
• Statistical null hypothesis ($H_0$): The population mean of population growth rate is the same for all taxonomic classes. $\mu_1 = \mu_2 = \mu_3 = \mu_4$
• Statistical alternative hypothesis ($H_A$): The population mean of population growth rate varies between the taxonomic classes.
• Note: The wording for these hypotheses is a little awkward because they mention two different populations. The population in “population growth rate” refers to ecological populations (groups of animals of the same species in the same geographic area). The population in “population mean” refers to the statistical populations, which in this case are the populations of all ecological populations within each class.
Choose an appropriate statistical test that will allow you to reject $H_0$ if it is false: You want to determine whether different groups (taxonomic classes) have different population means (mean population growth rate). Therefore, an ANOVA is probably appropriate.
Before we can say for certain whether an ANOVA is appropriate, we need to check the assumptions. ANOVAs have the same assumptions as two-sample t-tests:
1. The observations are independent of one another: For the purposes of this lab, we will assume this assumption is met. In reality, there might be non-independence caused by phylogeny, as we have seen in previous labs. There might also be non-independence due to spatial structure: populations in the same geographic region might be more similar.
2. The variable is approximately normally distributed within each sample: You saw from your histograms in the EDA that the samples are approximately normally distributed. The sample size of each group is relatively small, so we do not expect the distributions to follow normal distributions closely.
3. The samples have approximately equal variance (homoscedasticity): You saw from your summary statistics in the EDA that the samples have approximately equal variance. Their variance is roughly the same order of magnitude, and although mammal variance is somewhat higher than the other classes, it is not enough to cause problems.
Therefore, the data meet the assumptions and an ANOVA is appropriate.
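The variance and normality checks described in the assumptions list can also be summarized in code. The following is only a sketch: it uses a simulated stand-in data frame called demo (hypothetical, with the same column names as LPI but invented values), so the numbers it produces say nothing about the real data. To apply it to the lab data, replace demo with LPI.

```r
# Hypothetical stand-in for LPI: 4 classes, 15 simulated populations each
set.seed(42)
demo = data.frame(
  Pop.growth = rnorm(60, mean = 0, sd = 0.09),
  Class = rep(c("Amphibia", "Aves", "Mammalia", "Reptilia"), each = 15)
)

# Equal variance: compare the group variances directly
group_var = tapply(X = demo$Pop.growth, INDEX = demo$Class, FUN = var)
group_var
# Ratio of the largest to the smallest group variance (1 = identical)
var_ratio = max(group_var) / min(group_var)
var_ratio

# Normality: a normal Q-Q plot of the residuals (observation minus group mean);
# points close to the line suggest approximate normality
res = residuals(aov(Pop.growth ~ Class, data = demo))
qqnorm(res)
qqline(res)
```

The Q-Q plot is an addition to the histograms used in the EDA, not a replacement for them; both are visual checks that require judgment.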
Choose a significance level ($\alpha$): As usual, $\alpha$ = 0.05.
Determine the critical value that the test statistic must exceed to be significant: Let’s explore what the data should look like if the null hypothesis is true and the population means of all classes are the same. What does a typical sample from this null population look like? What range of F-values do the samples typically have? Like we did with the t-distribution, we can simulate the null population and take samples to find out.
    # Simulate a null population where all taxonomic classes have the same
    # mean and standard deviation for population growth. Repeatedly sample
    # from the population.
    null_samp_F = NULL
    for (i in 1:10000) {
      Pop.growth_null = rnorm(n = 135,
                              mean = mean(LPI$Pop.growth),
                              sd = sd(LPI$Pop.growth))
      Class_null = sample(x = c("Aves", "Mammalia", "Amphibia", "Reptilia"),
                          size = 135, replace = TRUE)
      null_samp_F[i] = summary(aov(Pop.growth_null ~ Class_null))[[1]][1, 4]
    }
    # (This may take a few seconds to run)
    # Make a histogram of the null sample Fs (sampling distribution)
    ggplot(data = data.frame(null_samp_F)) +
      geom_histogram(mapping = aes(x = null_samp_F), bins = 50) +
      labs(title = "F of samples from the null population",
           x = "F",
           y = "Frequency")
Since we chose a significance level of 0.05, we want to reject the null hypothesis only if the F value is unusual enough (when the null hypothesis is true) that it, or a more extreme value, occurs only 5% of the time. We can use the simulated F-distribution to find the value that the sample Fs will fall below 95% of the time when the null hypothesis is true:
    # Put the numbers in increasing order
    ordered_null_samp_F = null_samp_F[order(null_samp_F)]
    # Select the 9,500th value (of 10,000)
    # Only 5% of values in the sample are larger than this value
    ordered_null_samp_F[9500]
This value, which should be roughly 2.7, is the critical F. This means that F is equal to or greater than approximately this value only 5% of the time when the null hypothesis is true.
As with the t-distribution, it is educational to use a simulation to generate the F-distribution, but
using the theoretical F-distribution is easier and more accurate. In R, we can use the qf() function
with the following arguments:
• p = $\alpha$
• df1: ANOVAs have two degrees of freedom. df1 is the degrees of freedom for the F numerator (MSG). df1 = k-1, where k is the number of groups. Here, df1 = 4-1 = 3.
• df2: df2 is the degrees of freedom for the F denominator (MSE). df2 = n-k, where n is the sample size of all the groups put together. Here, df2 = 135-4 = 131.
• lower.tail = FALSE
    # Obtain the critical F
    # (write FALSE in full: the shorthand F is overwritten later in this lab)
    F_crit = qf(p = 0.05, df1 = 3, df2 = 131, lower.tail = FALSE)
    F_crit
    # Add the critical F to the histogram of the F-distribution
    Fdist = rf(n = 10000, df1 = 3, df2 = 131)
    ggplot(data = data.frame(Fdist)) +
      geom_histogram(mapping = aes(x = Fdist), bins = 50) +
      labs(title = "F of samples from the null population",
           x = "F",
           y = "Frequency") +
      geom_vline(xintercept = F_crit, col = "red")
From the theoretical F-distribution, the critical value is 2.674 (which should be pretty close to
the value from your simulated F-distribution). This is the cut-off value for the ANOVA. If the F we
calculate from the sample is greater than this critical value, we will reject the null hypothesis.
• Note: Notice that there is no one-sided test. ANOVAs are always non-directional.
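As a quick sanity check on the critical value, qf() and pf() are inverses of each other: qf() turns a tail probability into an F value, and pf() turns an F value back into a tail probability. The sketch below assumes the same degrees of freedom used in this lab (3 and 131); the variable names alpha and tail_area are our own.

```r
alpha = 0.05
df1 = 3     # k - 1, with k = 4 classes
df2 = 131   # n - k, with n = 135 populations

# Critical F: the value with area alpha to its right
F_crit = qf(p = alpha, df1 = df1, df2 = df2, lower.tail = FALSE)
F_crit

# pf() recovers the area to the right of F_crit, which should be alpha
tail_area = pf(q = F_crit, df1 = df1, df2 = df2, lower.tail = FALSE)
tail_area

# Equivalent formulation: the value with area 1 - alpha to its left
qf(p = 1 - alpha, df1 = df1, df2 = df2)
```

Because only large values of F count as evidence against the null hypothesis, the whole 5% rejection region sits in the upper tail, which is why lower.tail = FALSE is used.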
Perform the statistical test: To calculate F, we create an ANOVA table:

| | df | SS | MSS | F |
|---|---|---|---|---|
| Group | | | | |
| Residuals | | | | |
We fill in each column from left to right, starting with df. This column is simply the degrees of
freedom we calculated earlier, with df1 in the Group row and df2 in the Residuals row.
    # Degrees of freedom
    df1 = 3
    df2 = 131

| | df | SS | MSS | F |
|---|---|---|---|---|
| Group | 3 | | | |
| Residuals | 131 | | | |
The group sums of squares, SSG, goes in the Group row of the SS column, and the sums of squares
of the residuals (SSE) goes in the Residuals row of the SS column.
SSG is the sum of squared differences between each observation’s group mean and the overall
mean. In other words, it is the sum of squared differences between the predicted values and the
mean.
    # SSG:
    # Predicted values of population growth
    pred = predict(aov(Pop.growth ~ Class, data = LPI), newdata = LPI)
    # SSG
    SSG = sum((pred - mean(LPI$Pop.growth))^2)
    SSG
SSE is the sum of the squared residuals. For an ANOVA, the residuals are the differences between
the observations and their group means.
    # SSE:
    # Calculate the residuals by running the model
    residuals = aov(Pop.growth ~ Class, data = LPI)$residuals
    # Sum of squared residuals
    SSE = sum(residuals^2)
    SSE

| | df | SS | MSS | F |
|---|---|---|---|---|
| Group | 3 | 0.0091 | | |
| Residuals | 131 | 1.0092 | | |
We calculate the mean squares (MS) by dividing the SS column by the df column. The mean
square between groups, MSG, goes in the Group row. This is a measure of how much the group
means vary. The mean square within groups, MSE, goes in the Residuals row. This is a measure
of how much the observations vary within the groups.
    # MSG
    MSG = SSG / df1
    MSG
    # MSE
    MSE = SSE / df2
    MSE

| | df | SS | MSS | F |
|---|---|---|---|---|
| Group | 3 | 0.0091 | 0.003039 | |
| Residuals | 131 | 1.0092 | 0.007704 | |
Notice that MSG is small compared to MSE, which suggests the group means do not vary much,
relative to the variation within the groups. We formalize this comparison between MSG and MSE
by calculating F. Recall that F = MSG/MSE.
    # F
    F = MSG / MSE
    F

| | df | SS | MSS | F |
|---|---|---|---|---|
| Group | 3 | 0.0091 | 0.003039 | 0.394 |
| Residuals | 131 | 1.0092 | 0.007704 | |
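The whole table can also be built in one pass without predict() or aov(), which may help when checking your own calculations. The sketch below uses a small invented data set (the data frame toy and its values are hypothetical, not taken from LPI.csv) and the base R function ave(), which returns each observation's group mean. It then compares the result with aov().

```r
# Hypothetical toy data: 3 groups of 4 observations (not from LPI.csv)
toy = data.frame(
  growth = c(0.10, 0.05, 0.12, 0.08,
             -0.02, 0.01, 0.03, -0.04,
             0.00, 0.06, -0.03, 0.02),
  group = rep(c("A", "B", "C"), each = 4)
)
n = nrow(toy)                      # total sample size
k = length(unique(toy$group))      # number of groups

grand_mean = mean(toy$growth)
group_mean = ave(toy$growth, toy$group)   # group mean repeated per observation

# Sums of squares: between groups, within groups, and total
SSG = sum((group_mean - grand_mean)^2)
SSE = sum((toy$growth - group_mean)^2)
SST = sum((toy$growth - grand_mean)^2)

# Mean squares and F
MSG = SSG / (k - 1)
MSE = SSE / (n - k)
F_toy = MSG / MSE

# Assemble the ANOVA table in the same layout as above
anova_by_hand = data.frame(
  df = c(k - 1, n - k),
  SS = c(SSG, SSE),
  MSS = c(MSG, MSE),
  F = c(F_toy, NA),
  row.names = c("Group", "Residuals")
)
anova_by_hand

# Compare with R's built-in ANOVA
aov_tab = summary(aov(growth ~ group, data = toy))[[1]]
aov_tab
```

A useful check on any hand calculation is that SSG and SSE add up to SST, the total sum of squared differences between the observations and the overall mean.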
The calculated F is 0.394. From the previous step of the hypothesis testing framework, we found
that the critical value of F is 2.674.
    # Add the calculated (blue) and critical (red) F
    # to the histogram of the F-distribution
    ggplot(data = data.frame(Fdist)) +
      geom_histogram(mapping = aes(x = Fdist), bins = 50) +
      labs(title = "F of samples from the null population",
           x = "F",
           y = "Frequency") +
      geom_vline(xintercept = F_crit, col = "red") +
      geom_vline(xintercept = 0.394, col = "blue")
The calculated F is smaller than the critical value; therefore, we fail to reject the null hypothesis that the population mean of population growth rate is the same for all taxonomic classes. We conclude that the classes do not differ significantly in their mean population growth rate. Our biological hypothesis that some taxonomic classes of vertebrates are experiencing more population decline than other classes is not supported by these data.
We can verify our calculations using the aov() function in R:
    # Run the ANOVA predicting population growth rate from class
    anova1 = aov(Pop.growth ~ Class, data = LPI)
    summary(anova1)
You should see that the aov() function creates the same ANOVA table with the same values as we calculated. The aov() ANOVA table has an additional column, Pr(>F). This is the p-value, which provides another method of deciding whether to reject the null hypothesis. Here p = 0.757, which is greater than 0.05, so we fail to reject the null hypothesis. When the null hypothesis is true, a sample F of 0.394 or greater occurs 75.7% of the time (i.e., very often).
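The p-value in the Pr(>F) column is the area of the theoretical F-distribution to the right of the calculated F, so it can be obtained directly with pf(). This is a minimal sketch using the rounded F from the table above (0.394) and the degrees of freedom from this lab; the names F_obs and p_value are our own.

```r
# Calculated F from the ANOVA table (rounded) and its degrees of freedom
F_obs = 0.394
df1 = 3
df2 = 131

# p-value: probability of an F at least this large when H0 is true
p_value = pf(q = F_obs, df1 = df1, df2 = df2, lower.tail = FALSE)
round(p_value, 3)

# The two decision rules always agree:
# F exceeds the critical value exactly when p is below alpha
F_crit = qf(p = 0.05, df1 = df1, df2 = df2, lower.tail = FALSE)
F_obs > F_crit
p_value < 0.05
```

Because the rounded F is used here, the result can differ from the aov() output in the last decimal place.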
Assignment
Submit your completed assignment on MyCourses before your next lab session. Write your
assignment in Word, then upload it as a PDF.
None of your answers should be longer than four sentences.
Perform a statistical hypothesis test that addresses this scientific question: Does the trophic level of a population affect its population growth rate?
1. Below is the time-series data for an example population. Estimate the population growth rate (mean log-difference) for the population. Show your calculations. [0.5 pt]

| Year | Population size |
|---|---|
| 2019 | 102 |
| 2020 | 91 |
| 2021 | 84 |
| 2022 | 65 |

2. Make a biological hypothesis. Support your hypothesis with a rationale and at least one reference to a peer-reviewed study. [1 pt]
3. Perform exploratory data analysis. Present your results, including a table of summary statistics, appropriate data visualization, and a short paragraph summarizing your impressions. [1 pt]
4. Perform the statistical hypothesis test.
a. What are your null ($H_0$) and alternative ($H_A$) hypotheses? Provide both hypotheses in words and provide the null hypothesis in mathematical format. [0.25 pt]
b. Is an ANOVA appropriate to test these hypotheses? Justify your answer. [0.25 pt]
c. What significance level did you choose? What are the degrees of freedom? What is the critical F? [0.25 pt]
d. Provide the completed ANOVA table. Show your calculations underneath the table. (You can use R to calculate the values in the equations, such as the mean and residuals. Do not use aov() except to double-check your answer.) [0.5 pt]
e. Do you reject the null hypothesis? Why or why not? What do you infer from the results of this test? [0.25 pt]
Total points = 4 pts