```{r setup, message = FALSE}
library(tidyverse)
library(sf)
vic_map <- read_sf(here::here("data/vic-july-2018-esri/E_AUGFN3_region.shp")) |>
  # to match up with election data
  mutate(DivisionNm = toupper(Elect_div))
SA1map <- read_sf(here::here("data/Geopackage_2016_EIUWA_for_VIC/census2016_eiuwa_vic_short.gpkg"),
  layer = "census2016_eiuwa_vic_sa1_short")
```
## 🏂 Exercise 6A
**Integrate data from different sources**
```{r}
SA1map <- read_sf(here::here("data/Geopackage_2016_EIUWA_for_VIC/census2016_eiuwa_vic_short.gpkg"),
  layer = "census2016_eiuwa_vic_sa1_short"
)
```
* The centroid of each SA1 region is calculated below. The `st_centroid` function takes the geometry column and returns one point per region, which is stored in a new `centroid` column. SA1 regions with a recorded median age of zero are then dropped.
```{r}
SA1map <- SA1map %>%
  mutate(centroid = st_centroid(geom)) %>%
  filter(Median_age_persons != 0)
```
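If separate `x` and `y` columns are wanted, one possible approach is to pull them out of the centroid with `st_coordinates`. The sketch below uses a made-up square (the names `square` and `toy` are illustrative only and not part of the census data) so that it runs on its own.

```{r}
library(sf)

# Hypothetical toy polygon: a square with corners (0, 0) and (2, 2), no CRS set.
square <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
toy <- st_sf(id = 1, geom = st_sfc(square))

# st_centroid() returns a point geometry for each feature.
toy_centroid <- st_centroid(st_geometry(toy))

# st_coordinates() returns a matrix with columns X and Y for point geometries.
xy <- st_coordinates(toy_centroid)
toy$x <- xy[, "X"]
toy$y <- xy[, "Y"]
toy
```

The same two assignments could be applied to `SA1map` after its `centroid` column has been created, assuming the numeric coordinates are needed, for example for `geom_point`.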
* The plot is made below. SA1 regions close to the city centre noticeably have higher median weekly personal income.
```{r, warning = FALSE}
ggplot(SA1map) +
  geom_sf(aes(geometry = centroid, color = Median_tot_prsnl_inc_weekly), shape = 3) +
  geom_sf(data = vic_map, aes(geometry = geometry), fill = "transparent", size = 1.3, color = "black") +
  coord_sf(xlim = c(144.8, 145.2), ylim = c(-38.1, -37.6)) +
  scale_color_viridis_c(name = "Median weekly\npersonal income", option = "magma")
```
* We cannot say from this data alone that those who voted for the Greens are wealthy individuals. There appears to be considerable heterogeneity in income within the electorate that the Greens won (Melbourne).
* Each SA1 region contains approximately the same number of people. Some areas of the map look sparse because the SA1 regions there are physically large, most notably in rural areas.
* An ecological fallacy is a mistaken statistical interpretation in which inferences about individuals are deduced from the group to which those individuals belong. For example, the Melbourne electorate has many wealthy individuals, which makes the median weekly personal income in that electorate high. It is wrong to conclude from this that a typical person selected from the Melbourne electorate will be wealthy.
```{r, message = FALSE, warning = FALSE, cache = FALSE, cache.path = "cache/"}
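# Boundary of the Melbourne electorate
melb_geometry <- vic_map %>%
  filter(DivisionNm == "MELBOURNE") %>%
  pull(geometry)
# Keep the SA1 regions whose centroid falls inside that boundary
MELB_SA1 <- SA1map %>%
  filter(st_intersects(centroid, melb_geometry, sparse = FALSE)[, 1])
# Spread of the SA1-level medians within the electorate
fivenum(MELB_SA1$Median_tot_prsnl_inc_weekly)
ggplot(MELB_SA1, aes(x = Median_tot_prsnl_inc_weekly)) +
  geom_histogram()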
```
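The filter above relies on the shape of what `st_intersects` returns when `sparse = FALSE`: a logical matrix with one row per feature in the first argument and one column per feature in the second. A minimal sketch on made-up geometries (`toy_poly` and `toy_pts` are hypothetical names, not objects from the exercise) shows why `[, 1]` gives one `TRUE`/`FALSE` per point.

```{r}
library(sf)

# Hypothetical toy data: one square polygon and two points, no CRS set.
toy_poly <- st_sfc(st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0)))))
toy_pts <- st_sfc(st_point(c(1, 1)), st_point(c(3, 3)))

# Dense result: rows correspond to points, columns to polygons.
inside <- st_intersects(toy_pts, toy_poly, sparse = FALSE)
dim(inside) # 2 rows (points) by 1 column (polygon)

# The first column says, for each point, whether it intersects the first polygon.
inside[, 1] # TRUE for (1, 1), FALSE for (3, 3)
```

Because `melb_geometry` holds a single electorate boundary, the matrix in the exercise has exactly one column, so taking `[, 1]` yields a logical vector that `filter` can use directly.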
## 🗾 Exercise 6B
**Compare with a different layer**
```{r}
# Read the SED layer from the same geopackage and compute centroids
SEDmap <- read_sf(here::here("data/Geopackage_2016_EIUWA_for_VIC/census2016_eiuwa_vic_short.gpkg"),
  layer = "census2016_eiuwa_vic_sed_short"
) %>%
  mutate(centroid = st_centroid(geom))
# Keep the SED regions whose centroid falls inside the Melbourne electorate
MELB_SED <- SEDmap %>%
  filter(st_intersects(centroid, melb_geometry, sparse = FALSE)[, 1])
fivenum(MELB_SED$Median_tot_prsnl_inc_weekly)
```
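An estimate of the median total personal weekly income in the Melbourne electorate is `r median(MELB_SA1$Median_tot_prsnl_inc_weekly)` dollars using the SA1 data and `r median(MELB_SED$Median_tot_prsnl_inc_weekly)` dollars using the SED data. Each figure is the median of region-level medians, not the median over individuals, so the two estimates need not agree.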
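To see why the estimate can depend on the layer used, the following sketch takes nine made-up incomes and groups them in two different ways. All values and the grouping labels `fine` and `coarse` are invented for illustration; they only stand in for a finer and a coarser set of regions.

```{r}
# Hypothetical weekly incomes for nine individuals.
incomes <- c(100, 200, 300, 400, 500, 600, 700, 800, 3000)

# Two hypothetical ways of assigning the same individuals to regions.
fine <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
coarse <- c("A", "A", "B", "B", "B", "B", "B", "B", "B")

# Median of the region-level medians under each grouping.
median_fine <- median(as.numeric(tapply(incomes, fine, median)))
median_coarse <- median(as.numeric(tapply(incomes, coarse, median)))

median(incomes) # 500, the median over individuals
median_fine # 500, from region medians 200, 500, 800
median_coarse # 375, from region medians 150 and 600
```

The individuals are identical in both cases; only the boundaries differ, yet the summary changes.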
#### Material maintained and updated by Dr. Kate Saunders. Material originally developed by Dr. Emi Tanaka