Week 4 Tutorial

Tip

Sometimes two packages contain a function with the same name. This can be confusing and cause errors in your code. If you are worried about this, you can choose which package's version of the function R should use by default.
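Here is a minimal base R sketch of the two usual fixes. It is written so that it runs without any extra packages, so a locally defined mean() stands in for a package function that masks another. With real packages you would write, for example, dplyr::filter() to call one package's version explicitly, or use conflict_prefer() from the conflicted package to declare a default. The masking function below is purely illustrative.

```r
# Pretend a loaded package exported its own mean(); here a function defined in
# the global environment masks base::mean() in the same way (illustrative only)
mean <- function(x, ...) "masked version"
mean(c(1, 2, 3))         # calls the masking version

# Option 1: name the package explicitly with pkg::fun() every time
base::mean(c(1, 2, 3))   # 2

# Option 2: make one version the default for the rest of the session by
# pointing the name at it (with packages, e.g. filter <- dplyr::filter)
mean <- base::mean
mean(c(1, 2, 3))         # 2
```

Option 1 is the safer habit in scripts you share, because it does not depend on the order in which packages were loaded.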

Exercise 1

[[1]]
[1] "F"   "300" "399" "55"  "64"  "yrs"

[[2]]
[1] "F"   "1"   "149" "35"  "44"  "yrs"

[[3]]
[1] "F"   "150" "299" "25"  "34"  "yrs"

[[4]]
[1] "M"   "800" "999" "75"  "84"  "yrs"

[[5]]
[1] "M"   "400" "499" "65"  "74"  "yrs"

[[1]]
[1] "F"   "300" "399" "55"  "64"  "yrs"

[[2]]
[1] "F"   "1"   "149" "35"  "44"  "yrs"

[[3]]
[1] "F"   "150" "299" "25"  "34"  "yrs"

[[4]]
[1] "M"   "800" "999" "75"  "84"  "yrs"

[[5]]
[1] "M"   "400" "499" "65"  "74"  "yrs"

[[6]]
[1] "F"   "1"   "149" "15"  "19"  "yrs"

[[7]]
[1] "F"    "150"  "299"  "85ov"

[[8]]
[1] "M"    "1500" "1749" "85ov"

[[9]]
[1] "M"      "Neg"    "Nil"    "income" "20"     "24"     "yrs"   

[[10]]
[1] "M"   "150" "299" "25"  "34"  "yrs"

[[1]]
[1] "F"   "300" "399" "55"  "64"  "yrs"

[[2]]
[1] "F"   "1"   "149" "35"  "44"  "yrs"

[[3]]
[1] "F"   "150" "299" "25"  "34"  "yrs"

[[4]]
[1] "M"   "800" "999" "75"  "84"  "yrs"

[[5]]
[1] "M"   "400" "499" "65"  "74"  "yrs"

[[6]]
[1] "F"   "1"   "149" "15"  "19"  "yrs"

[[7]]
[1] "F"   "150" "299" "85"  "110" "yrs"

[[8]]
[1] "M"    "1500" "1749" "85"   "110"  "yrs" 

[[9]]
[1] "M"    "-Inf" "0"    "20"   "24"   "yrs" 

[[10]]
[1] "M"   "150" "299" "25"  "34"  "yrs"

  sex income_min income_max age_min age_max
1   F        300        399      55      64
2   F          1        149      35      44
3   F        150        299      25      34
4   M        800        999      75      84
5   M        400        499      65      74

   sex income_min income_max age_min age_max
1    F        300        399      55      64
2    F          1        149      35      44
3    F        150        299      25      34
4    M        800        999      75      84
5    M        400        499      65      74
6    F          1        149      15      19
7    F        150        299      85     110
8    M       1500       1749      85     110
9    M       -Inf          0      20      24
10   M        150        299      25      34
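The steps behind these outputs can be sketched in base R. The sketch splits each category label on underscores and recodes the two irregular patterns ("85ov" and negative/nil income) so every label has the same six parts. It then builds the data frame. The labels are reconstructed from the printed output, and the tutorial's own code may differ. In the tidyverse, stringr's str_replace() and str_split() do the same job as sub() and strsplit().

```r
# Category labels reconstructed from the split output above (illustrative)
labels <- c("F_300_399_55_64_yrs", "F_1_149_15_19_yrs", "F_150_299_85ov",
            "M_1500_1749_85ov", "M_Neg_Nil_income_20_24_yrs")

# Recode the irregular patterns so every label splits into 6 parts:
# "85ov" becomes the range 85-110, and negative/nil income becomes -Inf to 0
labels <- sub("85ov$", "85_110_yrs", labels)
labels <- sub("Neg_Nil_income", "-Inf_0", labels)

# Split on "_" and keep the first five parts (dropping the trailing "yrs")
parts <- strsplit(labels, "_")
rows  <- do.call(rbind, lapply(parts, function(p) p[1:5]))

df <- data.frame(
  sex        = rows[, 1],
  income_min = as.numeric(rows[, 2]),  # as.numeric("-Inf") gives -Inf
  income_max = as.numeric(rows[, 3]),
  age_min    = as.numeric(rows[, 4]),
  age_max    = as.numeric(rows[, 5])
)
df
```

Using -Inf and 110 as stand-in bounds keeps the columns numeric, so later filters such as income_max < 1500 work without special cases.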

Exercise 2

[1]   3 511
underscore_count_per_category
   2    3    4    5    6 
  18  252   90 1017  153 
underscore_count_per_category
   5 
1215 
  • The data has 1 row and 201 columns.
  • We use str_remove() to get rid of the _yrs suffix; otherwise, splitting the labels on underscores would give us an extra column we don’t need.
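The underscore check can be sketched in base R. The sketch counts the underscores in each label, tabulates them to spot the odd ones out, and strips the _yrs suffix. The four labels are illustrative. In the tidyverse, stringr::str_count(x, "_") and stringr::str_remove(x, "_yrs") are the equivalents.

```r
# A few illustrative category labels
categories <- c("F_300_399_55_64_yrs", "F_150_299_85ov",
                "M_Neg_Nil_income_20_24_yrs", "M_150_299_25_34_yrs")

# Count underscores per label; regmatches() returns character(0) for a label
# with no match, so lengths() correctly gives 0 in that case
underscore_count_per_category <- lengths(
  regmatches(categories, gregexpr("_", categories))
)
table(underscore_count_per_category)

# Labels without the expected 5 underscores are the ones that need recoding
categories[underscore_count_per_category != 5]

# Remove the "_yrs" suffix so it doesn't become an extra column when splitting
sub("_yrs$", "", categories)
```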

To repeat this for the SA1 regions, you only need to change one line of code; everything else can be reused as is.

As you get more advanced in coding, you can learn to wrap all this code in a function so you don’t need to copy and paste the same code from above.
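Here is a sketch of a first step in that direction: a small helper that reshapes a wide table into long form. The wide table has an id column plus one count column per category, so the same call works for a 1-row state table or a many-row SA1 table. The function name to_long(), the id column region_id and the toy values are all hypothetical. In the tidyverse, tidyr::pivot_longer() does the same job.

```r
# Hypothetical helper: reshape a wide table (an id column plus one count
# column per category) into long form with one row per region and category
to_long <- function(wide, id_col) {
  count_cols <- setdiff(names(wide), id_col)
  data.frame(
    region   = rep(wide[[id_col]], times = length(count_cols)),
    category = rep(count_cols, each = nrow(wide)),
    # unlist() stacks the columns one after another, matching the rep() calls
    count    = unlist(wide[count_cols], use.names = FALSE)
  )
}

# Toy inputs standing in for a state-level (1 row) and SA1-level table
ste <- data.frame(region_id = "VIC",
                  F_1_149_15_19_yrs = 100, M_150_299_25_34_yrs = 250)
sa1 <- data.frame(region_id = c("sa1_a", "sa1_b"),
                  F_1_149_15_19_yrs = c(60, 40),
                  M_150_299_25_34_yrs = c(130, 121))

# The same call handles both region levels
ste_long <- to_long(ste, "region_id")
sa1_long <- to_long(sa1, "region_id")
sa1_long
```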

Exercise 3

  • If we use the STE data, we have 4,973,795 people aged 15 and over, and if we use the SA1 data, we also have 4,973,795. The difference is 0 for 2021, but you will find differences if you repeat this analysis for 2016.
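Here is a sketch of that comparison with toy numbers. It sums the counts at each region level and takes the difference. Large totals can print in scientific notation (for example 4.973795e+06); format() with big.mark = "," shows them in full. The data frames and their values are invented for illustration.

```r
# Toy long-form data for two region levels (invented counts)
ste <- data.frame(category = c("F_15_19", "M_15_19", "F_85_110"),
                  count    = c(100, 250, 40))
sa1 <- data.frame(region   = c("a", "b", "a", "b", "a", "b"),
                  category = rep(c("F_15_19", "M_15_19", "F_85_110"), each = 2),
                  count    = c(60, 41, 130, 120, 22, 18))

# Every category in this toy data is aged 15+, so summing all counts
# gives the 15+ total at each level
ste_total <- sum(ste$count)
sa1_total <- sum(sa1$count)
ste_total - sa1_total               # -1 for this toy data

# Print a large total in full rather than in scientific notation
format(4973795, big.mark = ",")     # "4,973,795"
```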

It is actually quite common to find small differences between totals computed at different geographic levels. These are likely due to the small random adjustments made to the counts to protect confidentiality. SA1s are much smaller regions, so there is a bigger risk that individuals could be identified, and it is not surprising that more adjustments are made to SA1 data. Because an SA1-based total is a sum over many individually adjusted counts, those small adjustments can add up to a noticeable difference from the state total.
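To see why summing many adjusted small-area counts can drift from the state total, here is a toy simulation. The perturbation rule is an assumption made up for illustration: add a random integer from -2 to 2 to each cell, never going below zero. It is not the method actually used on census data.

```r
set.seed(2021)  # make the toy example reproducible

# Invented "true" counts for 1,000 small areas
true_counts <- sample(0:50, 1000, replace = TRUE)
true_total  <- sum(true_counts)

# Assumed toy rule: add a random integer from -2 to 2, never going below 0
perturb <- function(x) pmax(x + sample(-2:2, length(x), replace = TRUE), 0)

sum_of_small_areas <- sum(perturb(true_counts))  # SA1-style: adjust, then sum
state_total        <- perturb(true_total)        # STE-style: adjust once

c(true = true_total, from_small_areas = sum_of_small_areas, state = state_total)
```

The state-level total can be off by at most 2, while the sum over 1,000 adjusted cells can wander further. This mirrors the pattern described above.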

Because the STE data is aggregated at state level, it is likely to reflect the true number of people aged 15 and over more accurately. It does not give the total population of Victoria, however, as it does not include people under 15 years old; the 2021 Census also reports Victoria's population size by age, including those under 15.

  • The minimum and maximum values of count are 66 and 163,348 for the STE data; the SA1 data gives the same values (66 and 163,348).

Exercise 4

  • Before drawing the boxplots, we’ll wrangle the data to remove the redundant rows and make prettier labels for the graph. You could also consider merging the 15-19 and 20-24 year age groups so that the range matches the other categories (except the one for 85 years and over). There are a number of things you may notice from the graphs: there are more females than males in almost all age groups in Victoria; higher income earners are still male-dominated (even in younger age groups); and females do appear to live longer.

  • Answering these questions was definitely easier because we made our data tidy!
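Here is a base R sketch of the two wrangling steps mentioned above, run on made-up data: merging 15-19 with 20-24, and making readable age labels. The column names follow the data frame built in Exercise 1. The counts are invented, and the use of base R's aggregate() and boxplot() is an assumption rather than the tutorial's own code.

```r
# Made-up tidy data with the age columns built in Exercise 1
df <- data.frame(
  sex     = c("F", "M", "F", "M", "F", "M"),
  age_min = c(15, 15, 20, 20, 85, 85),
  age_max = c(19, 19, 24, 24, 110, 110),
  count   = c(120, 110, 300, 280, 90, 50)
)

# Merge 15-19 and 20-24 into a single 10-year group, 15-24
df$age_min[df$age_min == 20] <- 15
df$age_max[df$age_max == 19] <- 24

# Readable labels; the open-ended top group (stored as 85-110) becomes "85+"
df$age_group <- ifelse(df$age_min == 85, "85+",
                       paste0(df$age_min, "-", df$age_max))

# Rows that now share a sex and age group are added together
merged <- aggregate(count ~ sex + age_group, data = df, FUN = sum)
merged

# With the real data (many income categories per group), the boxplots of
# counts by age group could then be drawn with, e.g.:
# boxplot(count ~ age_group, data = merged)
```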

Exercise 5

We will use the STE data to extract the relevant statistics.

According to the 2021 Census data:

  • There are 1,729,024 women in Victoria aged 15-54 years.

  • The proportion of people in Victoria who are 25-34 years old (inclusive) and earn $1750 or more per week is 0.1146752.

  • If I randomly select a man from all men aged 25-44 in Victoria, the probability that he earns less than $1500 per week is 0.2885572.
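Here is a sketch of how statistics like these can be read off the tidy data, using made-up counts. It shows a filtered count, then a conditional probability computed as (count in the event and the condition) / (count in the condition). Using Inf as the upper bound of the top income bracket is an assumption for this toy example.

```r
# Made-up tidy data with the columns built in Exercise 1
df <- data.frame(
  sex        = c("M", "M", "M", "M", "F", "F"),
  income_min = c(-Inf, 1, 1500, 1750, 1, 1750),
  income_max = c(0, 149, 1749, Inf, 149, Inf),
  age_min    = c(25, 35, 25, 35, 15, 45),
  age_max    = c(34, 44, 34, 44, 19, 54),
  count      = c(10, 30, 25, 35, 20, 15)
)

# Count: women whose age group lies within 15-54
women_15_54 <- sum(df$count[df$sex == "F" & df$age_min >= 15 & df$age_max <= 54])

# Conditional probability: among men aged 25-44, the share whose whole income
# bracket lies below $1500 (Neg/Nil income, stored with income_max = 0, counts)
men_25_44 <- df[df$sex == "M" & df$age_min >= 25 & df$age_max <= 44, ]
p_under_1500 <- sum(men_25_44$count[men_25_44$income_max < 1500]) /
  sum(men_25_44$count)

women_15_54
p_under_1500
```

Filtering on whole brackets works here because the bracket edges line up with the thresholds in the questions (for example, no bracket straddles $1500).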