library(tidyverse)
library(sf)
SA1map <- read_sf(here::here("data/Geopackage_2016_EIUWA_for_VIC/census2016_eiuwa_vic_short.gpkg"), layer = "census2016_eiuwa_vic_sa1_short")
Week 6 Tutorial
Exercise 1
Read in the data from different sources
- Import the 2016 GeoPackage data with SA1 regions.
- Calculate the centroids for each SA1 region.
SA1map <- SA1map |>
  mutate(centroid = st_centroid(geom))
- Read in the data containing the electoral boundaries.
vic_map <- read_sf(here::here("data/vic-july-2018-esri/E_AUGFN3_region.shp")) |>
  # to match up with election data
  mutate(DivisionNm = toupper(Elect_div))
Exercise 2
Integrate the data from different sources
- Plot the median weekly personal income data on top of the electoral boundaries like below. Can you find which regions have wealthy individuals?
The inner city noticeably has regions with higher median weekly personal income.
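ggplot() +
  # one cross per SA1 centroid, coloured by median weekly personal income
  geom_sf(data = SA1map, aes(geometry = centroid, color = Median_tot_prsnl_inc_weekly), shape = 3) +
  # electoral boundaries drawn on top (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  geom_sf(data = vic_map, aes(geometry = geometry), fill = "transparent", linewidth = 1.3, color = "gray") +
  # zoom in on greater Melbourne
  coord_sf(xlim = c(144.8, 145.2), ylim = c(-38.1, -37.6)) +
  scale_color_viridis_c(name = "Median weekly\npersonal income", option = "magma") +
  theme_minimal()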
- Estimate a median weekly personal income for the Melbourne district.
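# geometry of the Melbourne electoral division
melb_geometry <- vic_map |>
  filter(DivisionNm == "MELBOURNE") |>
  pull(geometry)
melb_SA1 <- SA1map |>
  # keep SA1 regions whose centroid falls inside the Melbourne division
  filter(st_intersects(centroid, melb_geometry, sparse = FALSE)[, 1]) |>
  # drop SA1 regions with a reported median age of 0
  filter(Median_age_persons != 0)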
fivenum(melb_SA1$Median_tot_prsnl_inc_weekly)
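The centroid-based filter used above can be sketched on toy geometry. This is a minimal illustration, assuming made-up squares in plain planar coordinates with no CRS; the helper `square()` and the objects `electorate` and `regions` are hypothetical and not part of the tutorial data. It shows that a region overlapping the boundary is kept only if its centroid falls inside the electorate.

```r
library(sf)

# Hypothetical helper: an axis-aligned square with lower-left corner (xmin, ymin).
square <- function(xmin, ymin, side) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + side, ymin), c(xmin + side, ymin + side),
    c(xmin, ymin + side), c(xmin, ymin)
  )))
}

# One made-up "electorate" covering [0, 2] x [0, 2].
electorate <- st_sfc(square(0, 0, 2))

# Three made-up "census regions"; the third straddles the electorate boundary.
regions <- st_sf(
  id = c("inside", "outside", "straddling"),
  geom = st_sfc(square(0.5, 0.5, 1), square(3, 3, 1), square(1.8, 0.5, 1))
)

# Same pattern as in the exercise: compute centroids, then test them against the electorate.
regions$centroid <- st_centroid(regions$geom)
by_centroid <- st_intersects(regions$centroid, electorate, sparse = FALSE)[, 1]

# For comparison: testing the full polygons also picks up the straddling region.
by_overlap <- st_intersects(regions$geom, electorate, sparse = FALSE)[, 1]

regions$id[by_centroid]
regions$id[by_overlap]
```

Filtering on centroids assigns each region to at most one electorate, which avoids counting boundary regions twice, at the cost of ignoring the part of a straddling region that does lie inside.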
ggplot(melb_SA1, aes(x = Median_tot_prsnl_inc_weekly)) +
  geom_histogram() +
  theme_bw() +
  xlab("Median of Total Personal Weekly Income") +
  ylab("Count")
For discussion
Looking at the map above, one electorate was won by the Greens. Notice that this is where many wealthy individuals live.
Can you say that those who vote for the Greens are rich individuals? Why or why not? Discuss with your classmates.
We cannot say from this data alone that those who voted for the Greens are wealthy individuals. There appears to be considerable heterogeneity in income within the electorate (Melbourne) that the Greens won.
Why do some geographical regions look sparse in terms of the census reported median weekly personal income?
Each SA1 region contains approximately the same number of people. Some geographical regions look sparse because their SA1 regions are physically large, most notably in rural areas.
What is ecological fallacy? How does it relate to your conclusions from before?
An ecological fallacy is a mistaken interpretation of data in which inferences about individuals are drawn from the group to which those individuals belong. For example, the Melbourne electorate has many wealthy individuals, which makes the median weekly personal income high in that electorate. It would be wrong to conclude from this that any given person in the Melbourne electorate is wealthy.
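The ecological fallacy can be made concrete with a small simulation. The sketch below uses entirely made-up incomes (not census figures) for two hypothetical electorates, `income_A` and `income_B`: A is a mixture of a low-income and a high-income group, while B is homogeneous. A has the higher median, yet a large share of its residents earn less than the typical resident of B.

```r
# Simulated, hypothetical weekly incomes; these are NOT census data.
set.seed(2016)

# Electorate A: heterogeneous mixture of 450 low-income and 550 high-income people.
income_A <- c(
  rnorm(450, mean = 500, sd = 100),
  rnorm(550, mean = 2000, sd = 300)
)

# Electorate B: homogeneous, everyone earns close to a moderate income.
income_B <- rnorm(1000, mean = 900, sd = 50)

# Group-level summaries: A looks much wealthier than B.
median(income_A)
median(income_B)

# Individual-level view: share of A's residents earning less than B's median.
share_below <- mean(income_A < median(income_B))
share_below
```

Reading the group medians as a statement about every individual would misdescribe nearly half of electorate A in this toy setting, which is the same caution that applies to the Melbourne electorate.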
Exercise 3: In your own time
Compare with a different layer
Repeat Exercise 2, now using the SED regions. How does the estimate of median weekly personal income for the Melbourne district differ? What about estimates in other districts?
SEDmap <- read_sf(here::here("data/Geopackage_2016_EIUWA_for_VIC/census2016_eiuwa_vic_short.gpkg"),
  layer = "census2016_eiuwa_vic_sed_short")
Reminder: the SED regions are NOT the same as the electoral boundaries.
SEDmap <- read_sf(here::here("data/Geopackage_2016_EIUWA_for_VIC/census2016_eiuwa_vic_short.gpkg"),
  layer = "census2016_eiuwa_vic_sed_short"
) |>
  mutate(centroid = st_centroid(geom))
melb_SED <- SEDmap |>
  filter(st_intersects(centroid, melb_geometry, sparse = FALSE)[, 1])
fivenum(melb_SED$Median_tot_prsnl_inc_weekly)
ggplot(melb_SED, aes(x = Median_tot_prsnl_inc_weekly)) +
  geom_histogram() +
  theme_bw() +
  xlab("Median of Total Personal Weekly Income") +
  ylab("Count")
An estimate of the median total personal weekly income for the Melbourne district is $947 using the SA1 data and $816.5 using the SED data.
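One general reason such estimates can differ is that each is a median of region-level medians, so the answer depends on how people are grouped into regions. The sketch below uses made-up incomes for 20 hypothetical people (not census data), grouped once into four small regions and once into two large regions; the column names `small` and `large` are illustrative only. It is not a claim about why the SA1 and SED figures above differ, and the direction of the difference depends on the data.

```r
# Hypothetical weekly incomes for 20 people; these are NOT census data.
people <- data.frame(
  income = c(300, 400, 500, 600, 700,
             400, 500, 600, 700, 800,
             900, 1000, 1100, 1200, 3000,
             1500, 1800, 2000, 2500, 4000),
  small = rep(c("s1", "s2", "s3", "s4"), each = 5),  # four small regions
  large = rep(c("l1", "l2"), each = 10)              # two large regions
)

# Region-level medians at each level of aggregation.
small_medians <- tapply(people$income, people$small, median)
large_medians <- tapply(people$income, people$large, median)

# The third element of fivenum() is the median of the region medians.
est_small <- fivenum(as.numeric(small_medians))[3]
est_large <- fivenum(as.numeric(large_medians))[3]

c(small = est_small, large = est_large)
```

The same 20 people give two different estimates purely because of the grouping, so estimates built from different census layers should be compared with that in mind.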