In this tutorial, you will learn
Data links:
To load in the data, we can read it directly from the URLs above.
case_data <- readr::read_csv("https://gist.githubusercontent.com/MikeLydeamore/e911c95a792c1aeb72368a044de5ed27/raw/c8c9fe8842b25353d0ae60d447d8cb0a157604af/tutorial_10_vic_cases_by_lga.csv")
vaccination_data <- readr::read_csv("https://gist.githubusercontent.com/MikeLydeamore/d445691db1922b355569e45824d46f42/raw/3bc674501c37f412fa2f32711d60f658b01638ca/tutorial_10_vic_vaccinations_by_lga.csv")
We’ll also need the tidyverse
and
lubridate
:
library(tidyverse)
library(lubridate)
Convert the long case_data
data frame into a table
with three columns:
date
: The date of diagnosislocal_government_area
: The LGA of interestnum_cases
: The number of PCR positive cases on a given
day.case_data_aggregated <- case_data %>%
group_by(diagnosis_date, Localgovernmentarea) %>%
summarise(num_cases = n()) %>%
rename(
date = diagnosis_date,
local_government_area = Localgovernmentarea
)
head(case_data_aggregated)
Draw a plot with the total number of cases on the y
axis, and the LGAs (ordered from least to most cases) on the x
axis.
You can add
theme(axis.text.x = element_text(angle = 90, hjust=1, vjust=0.5))
to your ggplot
to rotate and align the text.
Convert the dose numbers for vaccination to numeric variables
instead of characters. For LGAs with >95% coverage, set them to
0.95.
(hint: substr(x, 1, nchar(x)-1)
will remove the last
character of a string)
vaccination_data_converted <- vaccination_data %>%
mutate(
FIRST = substr(FIRST, 1, nchar(FIRST) - 1),
SECOND = substr(SECOND, 1, nchar(SECOND) - 1),
THIRD = substr(THIRD, 1, nchar(THIRD) - 1),
) %>%
mutate(
FIRST = case_when(
FIRST == ">95" ~ 0.95,
TRUE ~ as.numeric(FIRST) / 100
),
SECOND = case_when(
SECOND == ">95" ~ 0.95,
TRUE ~ as.numeric(SECOND) / 100
),
THIRD = case_when(
THIRD == ">95" ~ 0.95,
TRUE ~ as.numeric(THIRD) / 100
)
) %>%
rename(
local_government_area = LGA
)
head(vaccination_data_converted)
vaccination_data_converted %>%
ggplot(aes(x = forcats::fct_reorder(local_government_area, THIRD), y = THIRD)) +
geom_col()
You could also use a dotplot here.
The growth rate over a time period, \(t_1\), is defined as
\[C(t_1) = C(t_0) \exp(rt),\] where \(r\) is the rate of growth.
Rearranging this expression gives, \[r = \frac{\log\left(\frac{C(t_1)}{C(t_0)}\right)}{t}.\]
Below is a function called calculate_growth_rate
that
takes three inputs (Ct0
, Ct1
, and
time
) and returns the growth rate.
Remember, it’s always a good idea to test your code and make sure it
returns the numbers you expect.
calculate_growth_rate <- function(Ct0, Ct1, time) {
log(Ct1 / Ct0) / time
}
Calculate the 7-day growth rate for each Victorian LGA since 1 January 2022. For this tutorial, we will assume detection date (which is what is present in the data) is the infection date to calculate the growth rate on.
growth_rate <- case_data_aggregated %>%
filter(year(date) >= 2022) %>%
group_by(local_government_area) %>%
mutate(
growth_rate = slider::slide_dbl(
.x = num_cases,
.f = function(v) {
calculate_growth_rate(Ct0 = v[1], Ct1 = v[7], time = 7)
},
.before = 6,
.after = 0
)
)
head(growth_rate)
You’ve been asked to explore whether vaccination coverage and growth rate are correlated. You are on a tight deadline and so only have time to produce one plot.
Thankfully, your colleague has given you some code to join together the vaccination coverage and growth rate data:
codes_to_remove <- c(
" \\(C\\)",
" \\(S\\)",
" \\(RC\\)",
" \\(B\\)"
) %>%
paste0(collapse = "|")
summarised_growths <- summarised_growths %>%
mutate(local_government_area = gsub(
pattern = codes_to_remove,
replacement = "",
x = local_government_area
))
terminal_coverage <- vaccination_data_converted %>%
left_join(summarised_growths, on = "local_government_area")
Create a plot that shows evidence for or against the hypothesis that vaccination coverage and growth rate are correlated. Write a maximum four sentence executive summary of your findings, and compare with a partner in the class.
There are lots of ways to do this. The below is an advanced solution, which calculates the growth rate every day and then looks at the average.
growth_rate <- case_data_aggregated %>%
filter(year(date) >= 2022) %>%
group_by(local_government_area) %>%
mutate(
growth_rate = slider::slide_dbl(
.x = num_cases,
.f = function(v) {
calculate_growth_rate(Ct0 = v[1], Ct1 = v[7], time = 7)
},
.before = 6,
.after = 0
)
)
codes_to_remove <- c(
" \\(C\\)",
" \\(S\\)",
" \\(RC\\)",
" \\(B\\)"
) %>%
paste0(collapse = "|")
growth_rate %>%
summarise(mean_growth_rate = mean(growth_rate, na.rm = TRUE)) %>%
mutate(local_government_area = gsub(
pattern = codes_to_remove,
replacement = "",
x = local_government_area
)) %>%
left_join(vaccination_data_converted, by = "local_government_area") %>%
ggplot(aes(x = mean_growth_rate, y = THIRD)) +
geom_point() +
labs(x = "Mean annual growth rate", y = "Third dose coverage") +
theme_bw()
From this plot, there is clearly one outlier on the left. Otherwise, there is no clear trend between third dose coverage and growth rate. Perhaps this is our methodology, perhaps it is a real observation, more analysis would be required to tell.
An example executive summary could be:
Correlation between third dose coverage in a local government area and mean annual growth rate have been compared. There is a single LGA with an unusually low growth rate and third dose coverage. Otherwise, there is no clear association between these two quantities. This suggests that increasing third dose coverage may not further reduce growth rate.