## Upcoming Assignments/Quizzes

Assignments | Open Time | Due Time |
---|---|---|
ANOVA Article Analysis Activity | October 22nd (1:00 am EST) | October 28th (11:55 pm EST) |
Module 8 Data Quiz | October 26th (1:00 am EST) | October 28th (11:55 pm EST) |
Module 9 Conceptual Quiz | October 26th (1:00 am EST) | October 28th (11:55 pm EST) |

## Notes from Discussion Board/Office Hours

### Relationship between the \(F\)-statistic, p-value, and null hypothesis

In sub-module 9.3, Dr. Baiser covers how to test hypotheses using ANOVA. To do this, we calculate our observed \(F\)-statistic, \(F = MS_{among}/MS_{within}\), from the mean square among groups and the mean square within groups of our observed data. We then compare it to the distribution of possible \(F\)-statistics (i.e., the \(F\)-distribution), which is defined by the degrees of freedom (df) in the numerator and denominator of our \(F\)-statistic, to determine the significance of our observed value.
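To make the ratio of mean squares concrete, here is a minimal sketch that computes the \(F\)-statistic by hand and cross-checks it against R's built-in `anova(lm(...))`. The data frame `dat` and its `treatment`/`response` values are invented for illustration (3 treatments with 4 replicates each); they are not the lecture's data.

```r
# Hypothetical data (invented for illustration, NOT the lecture example):
# a = 3 treatments, n = 4 replicates per treatment
dat <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 4)),
  response  = c(10, 12, 11, 13,
                14, 15, 13, 16,
                11, 10, 12, 11)
)

a <- nlevels(dat$treatment)   # number of treatments
N <- nrow(dat)                # total number of observations

grand_mean  <- mean(dat$response)
group_means <- tapply(dat$response, dat$treatment, mean)
group_n     <- tapply(dat$response, dat$treatment, length)

# Among groups: spread of the group means around the grand mean
ss_among <- sum(group_n * (group_means - grand_mean)^2)
# Within groups: spread of each observation around its own group mean
ss_within <- sum((dat$response - ave(dat$response, dat$treatment))^2)

ms_among  <- ss_among / (a - 1)    # mean square among groups
ms_within <- ss_within / (N - a)   # mean square within groups
f_manual  <- ms_among / ms_within
p_manual  <- pf(f_manual, df1 = a - 1, df2 = N - a, lower.tail = FALSE)

# Cross-check against R's built-in one-way ANOVA table
anova_tab <- anova(lm(response ~ treatment, data = dat))
anova_tab
```

The `F value` and `Pr(>F)` entries in the treatment row of `anova_tab` should match `f_manual` and `p_manual`.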

Let’s make some plots to visualize this comparison step by step, using the same example from the sub-module 9.3 lecture. We’ll start from the point where we calculate our observed \(F\)-statistic (pg. 15 of the 9.3 notes), which I’ll call `f_obs`. Based on our calculations of the mean squares, we determined that \(F_{obs} = 5.11\).

Now let’s draw our \(F\)-distribution. Recall that it is determined by the dfs in the numerator (\(df_{num}\)) and the denominator (\(df_{den}\)) of our \(F\)-statistic. If we have \(a\) treatments and \(n\) replicates per treatment, then \(df_{num} = a - 1\) and \(df_{den} = a(n-1)\). In our example, \(a=3\) and \(n=4\) (pg. 8), therefore \(df_{num} = 2\) and \(df_{den} = 9\). With this information we can draw our \(F\)-distribution by creating a vector of possible values of \(F\) and passing those into the `df()` function in R.
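As a quick check of the degrees-of-freedom arithmetic, the snippet below plugs in the example's values of \(a\) and \(n\). For a balanced one-way design the within-groups df is \(a(n-1)\), which is the same as \(N - a\) with \(N = an\) total observations; this gives \(df_{num} = 2\) and \(df_{den} = 9\).

```r
a <- 3                  # number of treatments in the example
n <- 4                  # replicates per treatment in the example
N <- a * n              # total number of observations
df_num <- a - 1         # among-groups df
df_den <- a * (n - 1)   # within-groups df (same as N - a)
c(df_num = df_num, df_den = df_den)
```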

```
library(tidyverse)
library(ggpubr)
# Possible values of the F-statistic:
x = seq(from = 0, to = 10, by = 0.01)
# Density of the F-distribution (df1 = 2, df2 = 9) at each value
y = df(x = x, df1 = 2, df2 = 9)
ggplot() +
  geom_line(aes(x, y)) +
  labs(x = "F-Statistic", y = "Density") +
  theme_pubclean()
```

This curve shows the possible values of the \(F\)-statistic (x-axis) and their probability density (y-axis) *if the null hypothesis were true*, given the dfs we specified. We can use it to decide whether to reject or fail to reject the null hypothesis by comparing `f_obs` to a critical \(F\)-value determined by our significance level \(\alpha\), which you’ll recall is often set to \(\alpha = 0.05\). This critical value, which we will call `f_crit`, is the \(F\)-statistic that would have a p-value of exactly 0.05.

It is important to note that we are working with a density function, which means that probabilities correspond to **areas under the curve**, not to the height of the curve. We *cannot* simply draw a horizontal line at y = 0.05 to find `f_crit`. Instead, we need the quantile that leaves 5% (0.05) of the total area in the upper tail. Luckily, the `qf()` function calculates quantiles of the \(F\)-distribution:

`f_crit <- qf(p = 0.05, df1 = 2, df2 = 9, lower.tail = FALSE)`

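This gives `f_crit` of approximately 4.26. Note that we set `lower.tail = FALSE` because the ANOVA \(F\)-test is one-tailed: we only care about the upper (right-hand) tail, where large values of \(F\) indicate that the variation among groups is large relative to the variation within groups. Now we can shade the area under the curve that represents the “rejection region”: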

```
ggplot(data.frame(x, y)) +
  geom_line(aes(x, y)) +
  # Shade the upper tail beyond f_crit (the rejection region)
  stat_function(fun = df,
                args = list(df1 = 2, df2 = 9),
                xlim = c(f_crit, 10),
                geom = "area",
                fill = "red",
                alpha = 0.6) +
  labs(x = "F-Statistic", y = "Density") +
  theme_pubclean()
```
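Because the shaded region is defined by area rather than height, a quick numerical check can make the distinction concrete. This sketch uses base R's `integrate()` to measure the area to the right of `f_crit` (it should come out at 0.05), shows that `qf(0.05, ..., lower.tail = FALSE)` is the same as the 95th percentile from the lower tail, and evaluates the curve's height at `f_crit`, which is only about 0.026 rather than 0.05.

```r
# Degrees of freedom from the example
df1 <- 2
df2 <- 9

# Critical value leaving 5% of the area in the upper tail
f_crit <- qf(p = 0.05, df1 = df1, df2 = df2, lower.tail = FALSE)

# Equivalent: the 95th percentile counted from the lower tail
f_crit_lower <- qf(p = 0.95, df1 = df1, df2 = df2)

# Area under the density curve to the right of f_crit (the shaded region)
area_right <- integrate(df, lower = f_crit, upper = Inf,
                        df1 = df1, df2 = df2)$value

# Height of the curve at f_crit: a density value, not a tail probability
height_at_crit <- df(f_crit, df1 = df1, df2 = df2)

c(f_crit = f_crit, area_right = area_right, height_at_crit = height_at_crit)
```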

Finally, let’s add `f_obs` to our plot:

```
f_obs = 5.11
ggplot(data.frame(x, y)) +
  geom_line(aes(x, y)) +
  # Shade the upper tail beyond f_crit (the rejection region)
  stat_function(fun = df,
                args = list(df1 = 2, df2 = 9),
                xlim = c(f_crit, 10),
                geom = "area",
                fill = "red",
                alpha = 0.6) +
  # A single dashed line at the observed F-statistic
  geom_vline(xintercept = f_obs, color = "darkblue", linetype = 2) +
  labs(x = "F-Statistic", y = "Density") +
  theme_pubclean()
```

As you can see, `f_obs` falls in the rejection region, and therefore we reject the null hypothesis that there is no difference among our treatment means. As a final note, we can also calculate the p-value associated with `f_obs` using the `pf()` function:

```
# Upper-tail area beyond f_obs, i.e. the p-value
p_value <- pf(f_obs, df1 = 2, df2 = 9, lower.tail = FALSE)
round(p_value, 3)
```

`## [1] 0.033`
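The \(F\)-distribution we plotted describes what happens *if the null hypothesis were true*, and a small simulation can make that idea tangible. The sketch below assumes the standard ANOVA setup (normally distributed errors, equal variances) and repeatedly simulates the example's design, 3 treatments by 4 replicates, with all treatment means equal. Because the \(F\)-statistic does not depend on the common mean or variance, standard normal draws from `rnorm()` are used; the helper `f_stat()` and the number of simulations are choices made for this illustration. The share of simulated \(F\)-statistics at or above `f_obs` should land close to the exact p-value from `pf()`, and the two decision rules (`f_obs > f_crit` and p-value < 0.05) agree.

```r
set.seed(1)        # make the simulation reproducible
f_obs <- 5.11      # observed F-statistic from the example
a <- 3             # treatments
n <- 4             # replicates per treatment
n_sim <- 20000     # number of simulated experiments

# One-way ANOVA F-statistic for an n x a matrix (one column per treatment)
f_stat <- function(y) {
  n <- nrow(y)
  a <- ncol(y)
  group_means <- colMeans(y)
  ms_among  <- n * sum((group_means - mean(y))^2) / (a - 1)
  ms_within <- sum(sweep(y, 2, group_means)^2) / (a * (n - 1))
  ms_among / ms_within
}

# Simulate experiments in which the null is true: all treatments share one mean
f_null <- replicate(n_sim, f_stat(matrix(rnorm(a * n), nrow = n, ncol = a)))

# Share of null F-statistics at least as large as f_obs vs. the exact p-value
p_sim   <- mean(f_null >= f_obs)
p_exact <- pf(f_obs, df1 = a - 1, df2 = a * (n - 1), lower.tail = FALSE)

# Both decision rules lead to the same conclusion
f_crit <- qf(0.05, df1 = a - 1, df2 = a * (n - 1), lower.tail = FALSE)
c(p_sim = p_sim, p_exact = p_exact)
c(f_obs_beyond_crit = f_obs > f_crit, p_below_alpha = p_exact < 0.05)
```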