🎯 Objectives

In this tutorial, you will learn

🔧 Preparation

Data links:

To load in the data, we can read it directly from the URLs above.

case_data <- readr::read_csv("https://gist.githubusercontent.com/MikeLydeamore/e911c95a792c1aeb72368a044de5ed27/raw/c8c9fe8842b25353d0ae60d447d8cb0a157604af/tutorial_10_vic_cases_by_lga.csv")

vaccination_data <- readr::read_csv("https://gist.githubusercontent.com/MikeLydeamore/d445691db1922b355569e45824d46f42/raw/3bc674501c37f412fa2f32711d60f658b01638ca/tutorial_10_vic_vaccinations_by_lga.csv")

We’ll also need the tidyverse and lubridate:

library(tidyverse)
library(lubridate)

⚙️ Tutorial

1. Manipulating the data

  1. Convert the long case_data data frame into a table with three columns:

    1. date: The date of diagnosis
    2. local_government_area: The LGA of interest
    3. num_cases: The number of PCR positive cases on a given day.
case_data_aggregated <- case_data %>%
  group_by(diagnosis_date, Localgovernmentarea) %>%
  summarise(num_cases = n()) %>%
  rename(
    date = diagnosis_date,
    local_government_area = Localgovernmentarea
  )

head(case_data_aggregated)
  1. Draw a plot with the total number of cases on the y axis, and the LGAs (ordered from least to most cases) on the x axis.
    You can add theme(axis.text.x = element_text(angle = 90, hjust=1, vjust=0.5)) to your ggplot to rotate and align the text.

  2. Convert the dose numbers for vaccination to numeric variables instead of characters. For LGAs with >95% coverage, set them to 0.95.
    (hint: substr(x, 1, nchar(x)-1) will remove the last character of a string)

vaccination_data_converted <- vaccination_data %>%
  mutate(
    FIRST = substr(FIRST, 1, nchar(FIRST) - 1),
    SECOND = substr(SECOND, 1, nchar(SECOND) - 1),
    THIRD = substr(THIRD, 1, nchar(THIRD) - 1),
  ) %>%
  mutate(
    FIRST = case_when(
      FIRST == ">95" ~ 0.95,
      TRUE ~ as.numeric(FIRST) / 100
    ),
    SECOND = case_when(
      SECOND == ">95" ~ 0.95,
      TRUE ~ as.numeric(SECOND) / 100
    ),
    THIRD = case_when(
      THIRD == ">95" ~ 0.95,
      TRUE ~ as.numeric(THIRD) / 100
    )
  ) %>%
  rename(
    local_government_area = LGA
  )

head(vaccination_data_converted)
  1. Draw a plot with the third dose coverage on the y axis, and LGAs (ordered from lowest to highest coverage) on the x axis.
vaccination_data_converted %>%
  ggplot(aes(x = forcats::fct_reorder(local_government_area, THIRD), y = THIRD)) +
  geom_col()

You could also use a dotplot here.

2. Calculating the growth rate

The growth rate over a time period, \(t_1\), is defined as

\[C(t_1) = C(t_0) \exp(rt),\] where \(r\) is the rate of growth.

Rearranging this expression gives, \[r = \frac{\log\left(\frac{C(t_1)}{C(t_0)}\right)}{t}.\]

Below is a function called calculate_growth_rate that takes three inputs (Ct0, Ct1, and time) and returns the growth rate.
Remember, it’s always a good idea to test your code and make sure it returns the numbers you expect.

calculate_growth_rate <- function(Ct0, Ct1, time) {
  log(Ct1 / Ct0) / time
}

Calculate the 7-day growth rate for each Victorian LGA since 1 January 2022. For this tutorial, we will assume detection date (which is what is present in the data) is the infection date to calculate the growth rate on.

growth_rate <- case_data_aggregated %>%
  filter(year(date) >= 2022) %>%
  group_by(local_government_area) %>%
  mutate(
    growth_rate = slider::slide_dbl(
      .x = num_cases,
      .f = function(v) {
        calculate_growth_rate(Ct0 = v[1], Ct1 = v[7], time = 7)
      },
      .before = 6,
      .after = 0
    )
  )

head(growth_rate)

3. Correlating growth rates with vaccination coverage

You’ve been asked to explore whether vaccination coverage and growth rate are correlated. You are on a tight deadline and so only have time to produce one plot.

Thankfully, your colleague has given you some code to join together the vaccination coverage and growth rate data:

codes_to_remove <- c(
  " \\(C\\)",
  " \\(S\\)",
  " \\(RC\\)",
  " \\(B\\)"
) %>%
  paste0(collapse = "|")

summarised_growths <- summarised_growths %>%
  mutate(local_government_area = gsub(
    pattern = codes_to_remove,
    replacement = "",
    x = local_government_area
  ))
terminal_coverage <- vaccination_data_converted %>%
  left_join(summarised_growths, on = "local_government_area")

Create a plot that shows evidence for or against the hypothesis that vaccination coverage and growth rate are correlated. Write a maximum four sentence executive summary of your findings, and compare with a partner in the class.

There are lots of ways to do this. The below is an advanced solution, which calculates the growth rate every day and then looks at the average.

growth_rate <- case_data_aggregated %>%
  filter(year(date) >= 2022) %>%
  group_by(local_government_area) %>%
  mutate(
    growth_rate = slider::slide_dbl(
      .x = num_cases,
      .f = function(v) {
        calculate_growth_rate(Ct0 = v[1], Ct1 = v[7], time = 7)
      },
      .before = 6,
      .after = 0
    )
  )

codes_to_remove <- c(
  " \\(C\\)",
  " \\(S\\)",
  " \\(RC\\)",
  " \\(B\\)"
) %>%
  paste0(collapse = "|")

growth_rate %>%
  summarise(mean_growth_rate = mean(growth_rate, na.rm = TRUE)) %>%
  mutate(local_government_area = gsub(
    pattern = codes_to_remove,
    replacement = "",
    x = local_government_area
  )) %>%
  left_join(vaccination_data_converted, by = "local_government_area") %>%
  ggplot(aes(x = mean_growth_rate, y = THIRD)) +
  geom_point() +
  labs(x = "Mean annual growth rate", y = "Third dose coverage") +
  theme_bw()

From this plot, there is clearly one outlier on the left. Otherwise, there is no clear trend between third dose coverage and growth rate. Perhaps this is our methodology, perhaps it is a real observation, more analysis would be required to tell.

An example executive summary could be:

Correlation between third dose coverage in a local government area and mean annual growth rate have been compared. There is a single LGA with an unusually low growth rate and third dose coverage. Otherwise, there is no clear association between these two quantities. This suggests that increasing third dose coverage may not further reduce growth rate.