```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
eval = FALSE,
cache = FALSE,
warning = FALSE,
message = FALSE
)
options(width = 80, digits = 3)
```
# `r emo::ji("target")` Objectives
In this tutorial, you will learn
- load in and manipulate COVID-19 data
- write a model to estimate the growth rate of PCR-detected cases
- compare growth rates in LGAs with vaccination coverage
# `r emo::ji("wrench")` Preparation
Data links:
- [Case data](https://gist.githubusercontent.com/MikeLydeamore/e911c95a792c1aeb72368a044de5ed27/raw/c8c9fe8842b25353d0ae60d447d8cb0a157604af/tutorial_10_vic_cases_by_lga.csv)
- [Vaccination data](https://gist.githubusercontent.com/MikeLydeamore/d445691db1922b355569e45824d46f42/raw/3bc674501c37f412fa2f32711d60f658b01638ca/tutorial_10_vic_vaccinations_by_lga.csv)
To load in the data, we can read it directly from the URLs above.
```{r, message=F}
case_data <- readr::read_csv("https://gist.githubusercontent.com/MikeLydeamore/e911c95a792c1aeb72368a044de5ed27/raw/c8c9fe8842b25353d0ae60d447d8cb0a157604af/tutorial_10_vic_cases_by_lga.csv")
vaccination_data <- readr::read_csv("https://gist.githubusercontent.com/MikeLydeamore/d445691db1922b355569e45824d46f42/raw/3bc674501c37f412fa2f32711d60f658b01638ca/tutorial_10_vic_vaccinations_by_lga.csv")
```
We'll also need the `tidyverse` and `lubridate`:
```{r load-packages}
library(tidyverse)
library(lubridate)
```
# `r emo::ji("gear")` Tutorial
## 1. Manipulating the data
a. Convert the long `case_data` data frame into a table with three columns:
i) `date`: The date of diagnosis
ii) `local_government_area`: The LGA of interest
iii) `num_cases`: The number of PCR positive cases on a given day.
```{r aggregate-cases, message=F}
case_data_aggregated <- case_data %>%
group_by(diagnosis_date, Localgovernmentarea) %>%
summarise(num_cases = n()) %>%
rename(
date = diagnosis_date,
local_government_area = Localgovernmentarea
)
head(case_data_aggregated)
```
b. Draw a plot with the _total_ number of cases on the y axis, and the LGAs (ordered from least to most cases) on the x axis.
You can add `theme(axis.text.x = element_text(angle = 90, hjust=1, vjust=0.5))` to your `ggplot` to rotate and align the text.
c. Convert the dose numbers for vaccination to numeric variables instead of characters. For LGAs with >95% coverage, set them to 0.95.
(hint: `substr(x, 1, nchar(x)-1)` will remove the last character of a string)
```{r convert-vax-to-numeric, message=F}
vaccination_data_converted <- vaccination_data %>%
mutate(
FIRST = substr(FIRST, 1, nchar(FIRST) - 1),
SECOND = substr(SECOND, 1, nchar(SECOND) - 1),
THIRD = substr(THIRD, 1, nchar(THIRD) - 1),
) %>%
mutate(
FIRST = case_when(
FIRST == ">95" ~ 0.95,
TRUE ~ as.numeric(FIRST) / 100
),
SECOND = case_when(
SECOND == ">95" ~ 0.95,
TRUE ~ as.numeric(SECOND) / 100
),
THIRD = case_when(
THIRD == ">95" ~ 0.95,
TRUE ~ as.numeric(THIRD) / 100
)
) %>%
rename(
local_government_area = LGA
)
head(vaccination_data_converted)
```
d. Draw a plot with the third dose coverage on the y axis, and LGAs (ordered from lowest to highest coverage) on the x axis.
```{r}
vaccination_data_converted %>%
ggplot(aes(x = forcats::fct_reorder(local_government_area, THIRD), y = THIRD)) +
geom_col()
```
You could also use a dotplot here.
## 2. Calculating the growth rate
The growth rate over a time period, $t_1$, is defined as
$$C(t_1) = C(t_0) \exp(rt),$$ where $r$ is the rate of growth.
Rearranging this expression gives,
$$r = \frac{\log\left(\frac{C(t_1)}{C(t_0)}\right)}{t}.$$
Below is a function called `calculate_growth_rate` that takes three inputs (`Ct0`, `Ct1`, and `time`) and returns the growth rate.
Remember, it's always a good idea to test your code and make sure it returns the numbers you expect.
```{r growth-rate-function}
calculate_growth_rate <- function(Ct0, Ct1, time) {
log(Ct1 / Ct0) / time
}
```
Calculate the 7-day growth rate for each Victorian LGA since 1 January 2022. For this tutorial, we will assume detection date (which is what is present in the data) is the infection date to calculate the growth rate on.
```{r calculate-growth-rate, message=F}
growth_rate <- case_data_aggregated %>%
filter(year(date) >= 2022) %>%
group_by(local_government_area) %>%
mutate(
growth_rate = slider::slide_dbl(
.x = num_cases,
.f = function(v) {
calculate_growth_rate(Ct0 = v[1], Ct1 = v[7], time = 7)
},
.before = 6,
.after = 0
)
)
head(growth_rate)
```
## 3. Correlating growth rates with vaccination coverage
You've been asked to explore whether vaccination coverage and growth rate are correlated. You are on a tight deadline and so only have time to produce one plot.
Thankfully, your colleague has given you some code to join together the vaccination coverage and growth rate data:
```{r remove-suffixes}
codes_to_remove <- c(
" \\(C\\)",
" \\(S\\)",
" \\(RC\\)",
" \\(B\\)"
) %>%
paste0(collapse = "|")
summarised_growths <- summarised_growths %>%
mutate(local_government_area = gsub(
pattern = codes_to_remove,
replacement = "",
x = local_government_area
))
```
```{r third-dose-join}
terminal_coverage <- vaccination_data_converted %>%
left_join(summarised_growths, on = "local_government_area")
```
Create a plot that shows evidence _for_ or _against_ the hypothesis that vaccination coverage and growth rate are correlated. Write a maximum four sentence executive summary of your findings, and compare with a partner in the class.
There are lots of ways to do this. The below is an advanced solution, which calculates the growth rate every day and then looks at the average.
```{r, message=F}
growth_rate <- case_data_aggregated %>%
filter(year(date) >= 2022) %>%
group_by(local_government_area) %>%
mutate(
growth_rate = slider::slide_dbl(
.x = num_cases,
.f = function(v) {
calculate_growth_rate(Ct0 = v[1], Ct1 = v[7], time = 7)
},
.before = 6,
.after = 0
)
)
codes_to_remove <- c(
" \\(C\\)",
" \\(S\\)",
" \\(RC\\)",
" \\(B\\)"
) %>%
paste0(collapse = "|")
growth_rate %>%
summarise(mean_growth_rate = mean(growth_rate, na.rm = TRUE)) %>%
mutate(local_government_area = gsub(
pattern = codes_to_remove,
replacement = "",
x = local_government_area
)) %>%
left_join(vaccination_data_converted, by = "local_government_area") %>%
ggplot(aes(x = mean_growth_rate, y = THIRD)) +
geom_point() +
labs(x = "Mean annual growth rate", y = "Third dose coverage") +
theme_bw()
```
From this plot, there is clearly one outlier on the left. Otherwise, there is **no** clear trend between third dose coverage and growth rate. Perhaps this is our methodology, perhaps it is a real observation, more analysis would be required to tell.
An example executive summary could be:
Correlation between third dose coverage in a local government area and mean annual growth rate have been compared. There is a single LGA with an unusually low growth rate and third dose coverage. Otherwise, there is no clear association between these two quantities. This suggests that increasing third dose coverage may not further reduce growth rate.
#### © Copyright 2022 Monash University