Lecturer: Kate Saunders
Department of Econometrics and Business Statistics
Realities of data collection
Collecting data on the entire population is normally too expensive or infeasible!
Therefore, we often can only collect data about a subset of the population.
What we’ll cover
Case study data is on the Australian Census!
We will:
Learn about what the Australian census is and how the data is collected
Learn what data on population demographics are collected
Learn how the census data is stored and how to access it
From a coding perspective
Learn about organising your data the tidy data way.
Learn to manipulate strings and a bit about regular expressions.
For every $1 invested in the Census, $6 of value is generated for the Australian economy.
The ABS contacts households in a few different ways:
Then households complete the Census form, either submitting it online or sending it back in the mail.
ABS provides a range of supports and resources to help everyone to fill in the census.
Break out discussion
Take a moment to think about what challenges might arise if you try to survey everyone.
Hint: Think about smaller communities, their sub-groups and their different needs.
It is no small task!
Resources for people in the deaf/hard of hearing and blind/low vision communities
e.g. audio guides and braille information packs
To support Aboriginal and Torres Strait Islander peoples to fill in the census there are urban and regional pop-up hubs. This includes extra face-to-face support.
For migrants, refugees, and international visitors there are language supports available.
Additional efforts are made to survey people who are hard to reach or without a fixed address
e.g. FIFO workers (Fly in Fly Out), Grey Nomads, People experiencing homelessness.
There are questions about:
Breakout Session
Investigate what data is collected in the census.
Use the quick stats summary for Clayton here.
Are there any weird variables, or variables that surprise you? What do you learn about where you live?
https://www.abs.gov.au/census/find-census-data
Data
There are two main types of data that you can download:
DataPacks are only available for the 2011, 2016 and 2021 census.
ABS aims for census data to be comparable and compatible with previous censuses.
Questions and classifications are reviewed to reflect changes in the Australian society.
e.g. In 2021, the ABS did not ask about home internet connection, as people now have other options like mobile devices and that data was no longer considered relevant to society.
There are small differences in the available data between years.
Variables can be added, updated and removed.
There are also sometimes data corrections at a later date.
Here are links to:
(i) what’s new in 2021 - there were 56 new additions!
(ii) consultation for changes in 2026 and
(iii) an example of a 2026 proposed change
Detective Work
Much like real detective work, just locating the data and understanding the data variables can take a long time
Cleaning and wrangling of the data is not glamorous;
There’s far more attention in “catching criminals” / praise for the cool discoveries from statistical analysis.
Let’s delve into the ‘grunt work’ of an analysis with the census data!
The data is nested within folders.
Click on the folder name to see folders and files nested within.
Preserve the data in the original structure as much as you can!
It is good practice not to modify the raw data or its structure.
Download the 2021 Census data containing the General Community Profile for all geographies in Victoria.
Before we jump in, we need some description or understanding of the variables.
It will be near impossible to extract meaningful information from the data without it.
Breakout Session
Then take some time to review the readme and the metadata folders.
Which folder contains demographic information about each suburb?
What is LGA short for?
Where can I find information about how much rent people pay?
What is contained in variable G17?
A few things to note:
There are 201 columns in G17A and G17B and 81 columns in G17C.
Perhaps there is an export limit for tables with more than 200 columns, so the table is broken up into different csv files.
This means that you have to join the tables G17A, G17B and G17C into one (you’ll do this in the tutorial).
Question
But what does the data look like when you open the file?
2021Census_G17A_VIC_STE.csv
  STE_CODE_2021 M_Neg_Nil_income_15_19_yrs M_Neg_Nil_income_20_24_yrs
1             2                      88386                      21186
2021Census_G17B_VIC_STE.csv
  STE_CODE_2021 F_300_399_15_19_yrs F_300_399_20_24_yrs
1             2                8810               19537
2021Census_G17C_VIC_STE.csv
  STE_CODE_2021 P_650_799_15_19_yrs P_650_799_20_24_yrs
1             2                7670               45029
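Because all three files share the STE_CODE_2021 column, they can be combined with a key-based join. The following is a minimal sketch, assuming the dplyr package and using only the two columns per file shown above; the object names g17a, g17b and g17c are hypothetical stand-ins for the files once they have been read in.

```r
library(dplyr)

# Tiny stand-ins for the three CSV files, built from the values shown above.
# The real files have far more columns.
g17a <- tibble(STE_CODE_2021 = 2,
               M_Neg_Nil_income_15_19_yrs = 88386,
               M_Neg_Nil_income_20_24_yrs = 21186)
g17b <- tibble(STE_CODE_2021 = 2,
               F_300_399_15_19_yrs = 8810,
               F_300_399_20_24_yrs = 19537)
g17c <- tibble(STE_CODE_2021 = 2,
               P_650_799_15_19_yrs = 7670,
               P_650_799_20_24_yrs = 45029)

# Join on the shared geography code so each region keeps a single row.
g17 <- g17a %>%
  left_join(g17b, by = "STE_CODE_2021") %>%
  left_join(g17c, by = "STE_CODE_2021")

names(g17)
```

Joining by the key, rather than binding columns side by side, avoids relying on the rows being in the same order in every file.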
Tidy Data Principles
So what about the ABS Census Data?
F_400_499_15_19_yrs is the count of females aged 15-19 years who earn $400-499 per week (in Victoria).
  age_min age_max gender income_min income_max count
1      15      19 female        400        499  4020
Putting data into a tidy format makes the data analysis easier.
You can include other information, e.g. geography code (useful if combining with other geographical areas) or average age/income.
Note some categories do not have upper bounds, e.g. M_3000_more_85ov. In R, -Inf and Inf are used to represent \(-\infty\) and \(\infty\), respectively.
You’ll wrangle the data into the tidy form in the tutorial.
This will require getting the pieces of information from the column names and organising them using string manipulation.
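As a rough illustration of the idea, here is a minimal sketch that pulls the pieces out of one well-behaved column name with str_split(). It assumes the name always has the form sex_incomeMin_incomeMax_ageMin_ageMax_yrs, which is not true for open-ended categories such as M_3000_more_85ov; those need extra handling. The lookup gender_labels is a hypothetical helper, and reading P as "persons" is an assumption.

```r
library(stringr)

col_name <- "F_400_499_15_19_yrs"
count <- 4020  # value shown in the example row above

# Split the column name on underscores:
# "F" "400" "499" "15" "19" "yrs"
parts <- str_split(col_name, "_")[[1]]

# Hypothetical lookup from the one-letter prefix to a label
# (P = persons is an assumption).
gender_labels <- c(F = "female", M = "male", P = "persons")

tidy_row <- data.frame(
  age_min    = as.numeric(parts[4]),
  age_max    = as.numeric(parts[5]),
  gender     = unname(gender_labels[parts[1]]),
  income_min = as.numeric(parts[2]),
  income_max = as.numeric(parts[3]),
  count      = count
)
tidy_row
```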
The stringr package provides a set of functions designed to help with string manipulation.
Main functions in stringr begin with the prefix str_ and the first input into the functions is a string (or a vector of strings).
What do you think str_trim and str_squish do?
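One way to check your guess is to try both functions on a string with messy whitespace. A minimal sketch; the input string is made up for illustration.

```r
library(stringr)

messy <- "  Hi   everyone  in   ETC5512  "

# str_trim() removes whitespace from the start and end only.
trimmed <- str_trim(messy)

# str_squish() does the same and also collapses repeated
# whitespace inside the string to a single space.
squished <- str_squish(messy)

trimmed
squished
```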
stringr functions
These are the stringr functions we’ll need for our census application.
Splitting strings by a pattern:
[[1]]
[1] "Hi" "everyone" "in" "ETC5512"
Replacing parts of strings with a different pattern:
Deleting parts of strings that aren’t important:
[1] "we_want_stuff"
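The code that produced the outputs above did not survive in these notes. Here is a minimal sketch of the three operations, assuming str_split(), str_replace() and str_remove() from stringr. The input strings are hypothetical, chosen so that the first and last calls give the outputs shown.

```r
library(stringr)

# Splitting: returns a list with one character vector per input string.
pieces <- str_split("Hi everyone in ETC5512", " ")

# Replacing: swap the first match of a pattern for something else.
replaced <- str_replace("census_2016", "2016", "2021")

# Deleting: str_remove() is shorthand for replacing the match with "".
removed <- str_remove("we_want_stuff_but_not_this", "_but_not_this")

pieces
replaced
removed
```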
To get more control over the kinds of patterns we can match, we need regular expressions.
Basic match
Meta-characters
"." a wildcard to match any character except a new line
"(.|.)" a marked subexpression with alternate possibilities marked with |
Meta-character quantifiers
"?" zero or one occurrence of preceding element
"*" zero or more occurrences of preceding element
"{n}" preceding element is matched exactly n times
[1] "-" "-na" "bana" "-nana"
"{min,}" preceding element is matched min times or more
Character classes
[:alpha:] or [A-Za-z] to match alphabetic characters
[:alnum:] or [A-Za-z0-9] to match alphanumeric characters
[:digit:] or [0-9] or \\d to match a digit
[^0-9] to match non-digits
[a-c] to match a, b or c
[A-Z] to match uppercase letters
[a-z] to match lowercase letters
[:space:] or [ \t\r\n\v\f] to match whitespace characters
[1] │ <banana>
[2] │ <banana>na
[3] │ <bana>
[4] │ <bana><banana>
[1] │ <banana>
[2] │ <banana>na
[3] │ <bana>
[4] │ <bana><banana>
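To make the quantifiers and character classes concrete, here is a minimal sketch. The input vector and the patterns are assumptions, picked because they give results consistent with the outputs shown above; the original calls may have been different. It also uses {min,max}, the bounded form of the {min,} quantifier, and contrasts str_extract() (first match only) with str_extract_all() (every match).

```r
library(stringr)

fruit <- c("banana", "bananana", "bana", "banabanana")  # assumed input

# {n}: exactly n repeats. Replace the first run of three "ba"/"na" pairs.
exactly_three <- str_replace(fruit, "(ba|na){3}", "-")
exactly_three
# "-" "-na" "bana" "-nana"

# {min,max}: "ba" followed by one or two "na".
first_match <- str_extract(fruit, "ba(na){1,2}")
first_match
# "banana" "banana" "bana" "bana"

# The _all variant returns every match, as a list with one element per string.
all_matches <- str_extract_all(fruit, "ba(na){1,2}")
all_matches[[4]]
# "bana" "banana"

# Character classes: pull every run of digits out of a column name.
digits <- str_extract_all("F_400_499_15_19_yrs", "[0-9]+")[[1]]
digits
# "400" "499" "15" "19"
```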
Tip
If a stringr function ends with _all, all matches of the pattern are considered.
The version without _all only considers the first match.
Characters we use to define the regex, e.g. *, ., !, ?, ), ], need to be defined differently when we are trying to match them.
This doesn’t work:
[1] "L"
[1] │ <L><e><t>'<s> <g><e><t> <t><h><e> <c><h><a><r><a><c><t><e><r> <a><n><d> <t><h><e> <b><r><a><c><k><e><t><s> (<A>)
But this does.
To match a bracket ( we need to use \\( in stringr. It tells R we are looking for the bracket as part of the pattern and not to look for the backslash. The same goes for other special characters:
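A minimal sketch of the difference, using the sentence from the output above. The patterns are illustrative rather than the ones used on the original slide; fixed() is a stringr helper that treats the whole pattern as literal text.

```r
library(stringr)

sentence <- "Let's get the character and the brackets (A)"

# Unescaped: ( and ) create a group, so the pattern just means "A".
unescaped <- str_extract(sentence, "(A)")

# Escaped: \\( and \\) match the literal brackets.
escaped <- str_extract(sentence, "\\(A\\)")

# Alternative: fixed() switches off regex interpretation entirely.
literal <- str_extract(sentence, fixed("(A)"))

unescaped  # "A"
escaped    # "(A)"
literal    # "(A)"
```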
In 2021_GCP_Sequential_Template_R2.xlsx, Sheet “G17”, the footnote says “Please note that there are small random adjustments made to all cell values to protect the confidentiality of data. These adjustments may cause the sum of rows or columns to differ by small amounts from table totals.”
Curious
Do you think that you’ll get the same numbers if you aggregate different geographical regions? E.g. SA1 and STE.
Australian Census Case Study
We went through how to find and understand the data available in the 2021 Australian census.
Learnt about census data collection and data limitations.
Taste of detective work: Understanding the file structures and what the data contains.
Also learnt about tidy data
Covered the basics of string manipulation
Break out questions
Which folder contains demographic information about each suburb?
In the file 2021AboutDataPacks_readme.txt
you find out that folders represent different geographical sub-regions. SAL represents suburbs and localities; in the previous census this was called SSC.
What is LGA short for?
Local Government Areas
Where can I find information about how much rent people pay?
In the file 2021_GCP_Sequential_Template_R2
there is a list of variables and what is contained in each table. G40 contains the rental information (organised by landlord type).
What is contained in variable G17?
G17 contains information about total personal income, organised by age and sex.
[1] "Broken Hill (C)" "Waroona (S)" "Toowoomba (R)" "West Arthur (S)"
[5] "Moreton Bay (R)" "Etheridge (S)" "Cleve (DC)"
C = Cities | A = Areas | RC = Rural Cities |
B = Boroughs | S = Shires | DC = District Councils |
M = Municipalities | T = Towns | AC = Aboriginal Councils |
RegC = Regional Councils |
🎯 Extract the LGA status from the LGA names
[1] "(C)" "(S)" "(R)" "(S)" "(R)"
[6] "(S)" "(DC)" "(R)" "(DC)" "(C)"
[11] "(DC)" "(S)" "(S)" "(S)" "(DC)"
[16] "(A)" "(C)" "(A)" "(T)" "(RC)"
[21] "(A)" "(S)" "(S)" "(S)" "(C)"
[26] "(DC)" "(R)" "(A)" "(C)" "(DC)"
[31] "(S)" "(S)" "(A)" "(S)" "(S)"
[36] "(R)" "(M)" "(A)" "(C)" "(S)"
[41] "(S)" "(C)" "(A)" "(S)" "(C)"
[46] "(AC)" "(A)" "(S)" "(A)" "(C)"
[51] "(A)" "(R)" "(S)" "(T)" "(C)"
[56] "(S)" "(S)" "(R)" "(C)" "(T)"
[61] "(C)" "(S)" "(C)" "(C)" "(C)"
[66] "(C)" "(S)" "(DC)" "(DC)" "(S)"
[71] "(R)" "(R)" "(S)" "(B)" "(DC)"
[76] "(M)" "(A)" "(C)" "(S)" "(S)"
[81] "(S)" "(S)" "(S)" "(S)" "(S)"
[86] "(C)" "(A)" "(C)" "(A)" "(S)"
[91] "(C)" "(A)" "(S)" "(S)" "(S)"
[96] "(S)" "(DC)" "(S)" "(S)" "(S)"
[101] "(C)" "(C)" "(DC)" "(S)" "(S)"
[106] "(C)" "(S)" "(DC)" "(C)" "(C)"
[111] "(S)" "(S)" "(S)" "(S)" "(S)"
[116] "(S)" "(A)" "(DC)" "(S)" "(A)"
[121] "(C)" "(A)" "(S)" "(A)" "(DC)"
[126] "(S)" "(C)" "(S)" "(A)" "(S)"
[131] "(M)" "(S)" "(DC)" "(R)" "(C)"
[136] "(C)" "(S)" "(C)" "(S)" "(T)"
[141] "(S)" "(S)" "(DC)" "(S)" "(T)"
[146] "(C)" "(S)" "(M)" "(S)" "(DC)"
[151] "(C)" "(S)" "(M)" "(C)" "(S)"
[156] "(C)" "(C)" "(R)" "(S)" "(C)"
[161] "(C)" "(R)" "(S)" "(C)" "(A)"
[166] "(T)" "(S)" "(RC)" "(C)" "(A)"
[171] "(A)" "(A)" "(S)" "(A)" "(S)"
[176] "(S)" "(T)" "(S)" "(S)" "(S)"
[181] "(A)" "(DC)" "(M)" "(C)" "(S)"
[186] "(A)" "(T)" "(A)" "(C)" "(S)"
[191] "(C)" "(R)" "(C)" "(S)" "(S)"
[196] "(S)" "(S)" "(R)" "(C)" "(DC)"
[201] "(A)" "(DC)" "(R)" "(C)" "(S)"
[206] "(S)" "(C)" "(C)" "(R)" "(S)"
[211] "(S)" "(C)" "(A)" "(S)" "(S)"
[216] "(C)" "(DC)" "(S)" "(M) (Tas.)" "(M) (Tas.)"
[221] "(C) (Vic.)" "(C) (Vic.)" "(S)" "(DC)" "(S)"
[226] "(RC)" "(S)" "(DC)" "(S)" "(S)"
[231] "(R)" "(S)" "(A)" "(C)" "(C)"
[236] "(A)" "(A)" "(RC)" "(S)" "(C)"
[241] "(S)" "(S)" "(S)" "(C)" "(C)"
[246] "(S)" "(C)" "(C)" "(C)" "(A)"
[251] "(C)" "(S)" "(S)" "(S)" "(S)"
[256] "(S)" "(A)" "(A)" "(A)" "(S)"
[261] "(A)" "(A)" "(S)" "(S)" "(C)"
[266] "(A)" "(M)" "(S)" "(S)" "(C)"
[271] "(R)" "(S)" "(R)" "(DC)" "(R)"
[276] "(C)" "(S)" "(S)" "(C)" "(S)"
[281] "(A)" "(R)" "(DC)" "(A)" "(C)"
[286] "(A)" "(S)" "(S)" "(A)" "(C)"
[291] "(C)" "(A)" "(T)" "(S)" "(C)"
[296] "(A)" "(A)" "(S)" "(S)" "(T)"
[301] "(C)" "(A)" "(A)" "(DC)" "(A)"
[306] "(C)" "(M)" "(M)" "(S)" "(A)"
[311] "(A)" "(C)" "(C)" "(S)" "(DC)"
[316] "(S)" "(C)" "(S)" "(S)" "(DC)"
[321] "(RegC)" "(C)" "(S)" "(S)" NA
[326] "(A)" "(S)" "(A)" "(S)" "(A)"
[331] "(S)" "(C)" "(R)" "(C)" "(S)"
[336] "(A)" "(DC)" "(S)" "(A)" "(R)"
[341] "(S)" "(S)" "(RC)" "(T)" "(A)"
[346] "(M)" "(A)" "(S)" "(S)" "(S)"
[351] "(S)" "(A)" "(RC)" "(S)" "(A)"
[356] "(R)" "(S)" "(S)" "(C)" "(S)"
[361] "(DC)" "(M)" "(M)" "(AC)" "(DC)"
[366] "(A)" "(A)" "(S)" "(S)" "(A)"
[371] "(C)" "(S)" "(S)" "(C)" "(R)"
[376] "(S)" "(S)" NA "(A)" "(T)"
[381] "(S)" "(A)" "(C)" "(C)" "(A)"
[386] "(C)" "(DC)" "(C)" "(A)" "(A)"
[391] "(A)" "(S)" "(DC)" "(DC)" "(S)"
[396] "(M)" "(R)" "(DC)" "(C)" "(S)"
[401] "(S)" "(C)" "(C)" "(C)" "(C)"
[406] "(C)" "(S)" "(A)" NA "(S)"
[411] "(C)" "(S)" "(M)" "(C)" "(S)"
[416] "(S)" NA "(C)" "(S)" "(C)"
[421] "(DC)" "(S)" "(C)" "(S)" "(C)"
[426] "(M)" "(A)" "(A)" "(A)" "(S)"
[431] "(C)" "(S)" "(S)" "(S)" "(A)"
[436] "(A)" "(A)" "(S)" "(S)" "(S)"
[441] "(C)" "(S)" "(C)" "(C)" "(C)"
[446] "(C) (NSW)" "(S) (Qld)" "(R) (Qld)" "(DC) (SA)" "(C) (SA)"
[451] "(M) (Tas.)" "(M) (Tas.)" "(C)" "(R)" "(M)"
[456] "(C)" "(R)" "(S)" "(RC)" "(S)"
[461] "(M)" "(C)" "(R)" "(C)" "(DC)"
[466] "(C)" "(C)" "(M)" "(C)" "(S)"
[471] "(C)" "(DC)" "(M)" "(S)" "(C)"
[476] "(C)" "(A)" "(DC)" "(R)" "(C)"
[481] "(C)" "(A)" "(M)" "(C)" "(C)"
[486] "(S)" "(S)" "(S)" "(A)" "(R)"
[491] "(M)" "(A)" "(R)" "(A)" "(A)"
[496] "(R)" "(R)" "(R)" "(S)" "(C)"
[501] "(C)" "(S)" "(A)" "(S)" "(M)"
[506] "(M)" "(S)" "(A)" "(A)" "(S)"
[511] "(A)" "(C)" "(DC)" "(S)" "(S)"
[516] NA "(A)" NA "(R)" "(C)"
[521] "(S)" "(C)" "(S)" "(A)" "(A)"
[526] "(A)" "(A)" "(C)" "(A)" "(A)"
[531] "(A)" "(A)" "(C) (NSW)" "(A)" "(C)"
[536] "(R)" "(S)" "(A)" "(R)" "(C)"
[541] "(A)" "(S)" "(A)" "(A)"
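A minimal sketch of one way to get output like this, run on just the seven LGA names listed earlier rather than the full vector of 544. The vector name LGA follows the code later in these notes; the name "Unincorporated ACT" is an illustrative example of an entry with no bracketed status, which comes back as NA.

```r
library(stringr)

# The seven LGA names shown earlier (the full vector has 544 entries).
LGA <- c("Broken Hill (C)", "Waroona (S)", "Toowoomba (R)", "West Arthur (S)",
         "Moreton Bay (R)", "Etheridge (S)", "Cleve (DC)")

# Match an opening bracket, one or more characters, then a closing bracket.
status <- str_extract(LGA, "\\(.+\\)")
status
# "(C)" "(S)" "(R)" "(S)" "(R)" "(S)" "(DC)"

# Names with no bracketed status give NA.
no_status <- str_extract("Unincorporated ACT", "\\(.+\\)")
no_status
```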
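Important
"\\(.+\\)"
In a regular R string, \ must itself be escaped with another \ when \ is included in the pattern (yes, this means that you can have a lot of backslashes… just keep adding \ until it works! Enjoy this xkcd comic).
Raw strings avoid the need to double the \, e.g. r"(\(.+\))" is the same as "\\(.+\\)"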
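The equivalence can be checked directly. A minimal sketch, assuming R 4.0 or later (raw strings were added in that version); the variable names are for illustration only.

```r
library(stringr)

escaped_pattern <- "\\(.+\\)"   # regular string: each \ is doubled
raw_pattern     <- r"(\(.+\))"  # raw string: written as the regex engine sees it

# Both spellings produce exactly the same string.
same <- identical(escaped_pattern, raw_pattern)
same

# writeLines() prints the characters the regex engine actually receives.
writeLines(escaped_pattern)

extracted <- str_extract("Cleve (DC)", raw_pattern)
extracted
```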
        (A)       (AC)        (B)        (C)  (C) (NSW)   (C) (SA) (C) (Vic.)
        100          2          1        120          2          1          2
       (DC)  (DC) (SA)        (M) (M) (Tas.)        (R)  (R) (Qld)       (RC)
         40          1         23          4         38          1          7
     (RegC)        (S)  (S) (Qld)        (T)
          1        182          1         12
Where the same Local Government Area name appears in different States or Territories, the State or Territory abbreviation appears in parenthesis after the name. Local Government Area names are therefore unique.
-Australian Bureau of Statistics
   A   AC    B    C   DC    M    R   RC RegC    S    T
 100    2    1  125   41   27   39    7    1  183   12
"[]" for single character match
We want to match ( and ) but these are meta-characters
In the regex, escape them as \( and \)
In a regular R string, write these as \\( and \\)
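library(stringr)  # also provides the %>% pipe
# LGA is the character vector of LGA names used above
str_extract(LGA, r"(\([^)]+\))") %>%
  # remove the brackets
  str_replace_all(r"([\(\)])", "") %>%
  table()
## .
##    A   AC    B    C   DC    M    R   RC RegC    S    T
##  100    2    1  125   41   27   39    7    1  183   12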
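To see why the pattern uses [^)]+ rather than .+, here is a minimal sketch on three names. The short vector is illustrative and stands in for the full LGA vector. Because .+ is greedy, it runs to the last closing bracket and picks up the state suffix, whereas [^)]+ stops at the first closing bracket.

```r
library(stringr)

# Illustrative names: two with a state suffix, one without.
LGA <- c("Campbelltown (C) (NSW)", "Kingston (DC) (SA)", "Cleve (DC)")

# Greedy: .+ matches as much as possible, up to the last ")".
greedy <- str_extract(LGA, r"(\(.+\))")
greedy
# "(C) (NSW)" "(DC) (SA)" "(DC)"

# Negated class: [^)]+ cannot cross a ")", so only the first bracket is kept.
first_bracket <- str_extract(LGA, r"(\([^)]+\))")
first_bracket
# "(C)" "(DC)" "(DC)"

# Strip the brackets and count each status.
status <- str_replace_all(first_bracket, r"([\(\)])", "")
counts <- table(status)
counts
```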
ETC5512