In this tutorial, you will learn
You will need the packages tidyverse, here and galah. (You can copy and paste the code below into your RStudio Console window to install them.)
install.packages(c("tidyverse", "here", "galah"))
Note that tidyverse is a collection of packages (including ggplot2, dplyr and readr), so installing it will install quite a lot of packages.
If you are new to R, you need to know that once you install a package, it's there for you to use in the future. You only need to install a package once, or again when you want to update it to a newer version.
You do need to tell the R session to load the package each time you start R, though, using the library() function.
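This is a minimal sketch of the "install once, load every session" pattern. It assumes you want to check which packages are missing before installing anything. The helper `missing_packages()` is hypothetical and not part of any package: it only reports which of the named packages are not installed yet. The install and library() steps are left as comments so that running the sketch changes nothing on your computer.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!pkgs %in% installed]
}

pkgs <- c("tidyverse", "here", "galah")
to_install <- missing_packages(pkgs)
to_install

# Install once (only what is missing):
# install.packages(to_install)

# Load every time you start a new R session:
# library(tidyverse)
# library(here)
# library(galah)
```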
You can think about this like the lightbulb analogy below.
This will bring you to an interface for choosing a subset.
⚠️ THE DATA IS VERY BIG, SO FOLLOW THE INSTRUCTIONS BELOW TO DOWNLOAD A SMALL SUBSET
Choose the following variables: YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, FL_DATE, OP_UNIQUE_CARRIER, TAIL_NUM, ORIGIN, DEST, CRS_DEP_TIME, DEP_TIME, DEP_DELAY, CRS_ARR_TIME, ARR_TIME, ARR_DELAY
library(tidyverse)
library(here)
# Read the downloaded file and keep only the columns from YEAR to ARR_DELAY
flights <- read_csv(here::here("data/504717774_T_ONTIME_REPORTING.csv")) %>%
  select(YEAR:ARR_DELAY)
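The `select(YEAR:ARR_DELAY)` step keeps a contiguous range of columns: everything from YEAR through ARR_DELAY, in the order the columns appear in the file. Anything outside that range is dropped. A small sketch on a made-up table shows the behaviour without needing the downloaded file. The values and the `EXTRA` column are hypothetical.

```r
library(dplyr)

# Hypothetical toy table sharing the first and last column names of the flights data
toy <- tibble(
  YEAR = 2019, MONTH = 1, DEST = "JFK", ARR_DELAY = 5,
  EXTRA = NA  # stands in for any column that comes after ARR_DELAY
)

# `YEAR:ARR_DELAY` selects every column from YEAR to ARR_DELAY, in table order
toy_selected <- toy %>% select(YEAR:ARR_DELAY)
names(toy_selected)
```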
# Number of flights in each year
flights %>% count(YEAR)
# Number of flights in each month
flights %>% count(MONTH)
You can check the “Data profile” to help answer these questions.
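# Earliest and latest flight dates
flights %>% select(FL_DATE) %>% summary()
# Number of flights per carrier, most frequent first
flights %>% count(OP_UNIQUE_CARRIER, sort = TRUE)
# Number of departures from each origin airport, busiest first
flights %>% count(ORIGIN, sort = TRUE)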
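count() returns one row per distinct value, with the number of rows for that value in a column called `n`. With `sort = TRUE`, the most frequent values come first. The sketch below uses a hypothetical six-flight table in place of the real data.

```r
library(dplyr)

# Hypothetical mini flights table
toy_flights <- tibble(
  OP_UNIQUE_CARRIER = c("AA", "DL", "AA", "UA", "AA", "DL")
)

# One row per carrier, count in `n`, largest counts first
carrier_counts <- toy_flights %>% count(OP_UNIQUE_CARRIER, sort = TRUE)
carrier_counts
```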
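# Count departing (outbound) and arriving (inbound) flights for each airport
outgoing <- flights %>% count(ORIGIN) %>% rename(outbound = n)
incoming <- flights %>% count(DEST) %>% rename(inbound = n)
# Combine the two counts, matching airport codes in ORIGIN with those in DEST
traffic <- full_join(outgoing, incoming, by = c("ORIGIN" = "DEST"))
# Compare outbound and inbound flights per airport on equal axis scales
ggplot(traffic, aes(x = outbound, y = inbound)) +
  geom_point() +
  coord_equal()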
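full_join() keeps every airport from both tables. The joined key column keeps the left-hand name, ORIGIN. An airport that appears only as an origin, or only as a destination, gets NA in the other count column. ggplot() then leaves that airport off the scatterplot and issues a warning about removed rows. The sketch below uses hypothetical airports and flights to make those NAs visible. If you would rather treat them as zero flights, tidyr's replace_na() can fill them in.

```r
library(dplyr)

# Hypothetical mini flights table: SEA appears only as an origin, BOS only as a destination
toy_flights <- tibble(
  ORIGIN = c("JFK", "JFK", "LAX", "SEA"),
  DEST   = c("LAX", "BOS", "JFK", "JFK")
)

outgoing <- toy_flights %>% count(ORIGIN) %>% rename(outbound = n)
incoming <- toy_flights %>% count(DEST) %>% rename(inbound = n)

# Keep all airports from both sides; sort by airport code for easy reading
traffic <- full_join(outgoing, incoming, by = c("ORIGIN" = "DEST")) %>%
  arrange(ORIGIN)
traffic
```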
❗ The following data present sex as a binary variable and race as a categorical variable. We realise that sex is not binary, and also that race is an “arbitrary system of visual classification that does not demarcate distinct subspecies of the human population” (Mindy Thompson Fullilove). Please skip this question if it disturbs you.
This question uses the variables R0000100, R0173600, R0214700 and R0214800. Read the codebook to find out what these are.
nlsy <- read_csv(here::here("data/NLSY/NLSY.csv"))
# Frequency of each value of the coded variables
nlsy %>% count(R0214700)
nlsy %>% count(R0214800)
nlsy %>% count(R0173600)
# Total number of rows in the data
nlsy %>% tally()
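count() and tally() answer different questions. count() gives one row per distinct value with its frequency. tally() gives a single row holding the total number of rows, so the `n` values from count() always add up to the tally() result. The coded variable below is hypothetical and does not use the NLSY codes.

```r
library(dplyr)

# Hypothetical coded variable (values are made up, not NLSY codes)
toy <- tibble(code = c(1, 2, 2, 3, 2))

by_value <- toy %>% count(code)  # one row per distinct value, frequency in n
total    <- toy %>% tally()      # a single row: the number of rows in the data

by_value
total
```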
The Atlas of Living Australia is a major resource for occurrence data on animals, plants, insects and fish.
Use the galah library, and its function atlas_occurrences(), to extract the records for platypus. To download data from this API, you will need to register your email address with the Atlas of Living Australia first.
library(galah)
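# Use the email address you registered with the Atlas of Living Australia
galah_config(email = "YOUREMAILADDRESS",
             download_reason_id = 10,  # code for your reason for downloading
             verbose = TRUE)
# Download all occurrence records for the platypus
platypus <- galah_call() %>%
  galah_identify("Ornithorhynchus anatinus") %>%
  atlas_occurrences()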
# Shorten the coordinate names, convert dates, and drop records missing a date or coordinates
platypus <- platypus %>%
  rename(Longitude = decimalLongitude,
         Latitude = decimalLatitude) %>%
  mutate(eventDate = as.Date(eventDate)) %>%
  filter(!is.na(eventDate)) %>%
  filter(!is.na(Longitude)) %>%
  filter(!is.na(Latitude))
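The cleaning steps can be tried on a tiny made-up table shaped like the download. The column names follow the code above; the records themselves are hypothetical. Putting several conditions in one filter() combines them with AND, so a single call does the same job as the three chained filter() calls.

```r
library(dplyr)

# Hypothetical records with the same column names as the ALA download used above
toy_records <- tibble(
  eventDate        = c("2020-03-01", NA, "2021-07-15", "2019-11-30"),
  decimalLongitude = c(147.3, 150.1, NA, 145.0),
  decimalLatitude  = c(-42.9, -33.8, -37.8, -37.6)
)

cleaned <- toy_records %>%
  rename(Longitude = decimalLongitude,
         Latitude = decimalLatitude) %>%
  mutate(eventDate = as.Date(eventDate)) %>%
  # conditions separated by commas must all be TRUE for a row to be kept
  filter(!is.na(eventDate), !is.na(Longitude), !is.na(Latitude))

nrow(cleaned)
```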
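# Save a copy so you don't have to download the data again
save(platypus, file = here::here("data/platypus.rda"))
# In a later session, load the saved copy instead of downloading
load(here::here("data/platypus.rda"))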
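save() writes R objects to an .rda file. load() restores them into your session under their original names and invisibly returns those names. You do not assign the result of load() to a variable to get the data back. This sketch does a round trip with a hypothetical small data frame, using a temporary file so it does not touch your data/ folder.

```r
# Hypothetical small object standing in for `platypus`
platypus_demo <- data.frame(Longitude = c(147.3, 145.0),
                            Latitude = c(-42.9, -37.6))

# Write to a temporary .rda file
path <- tempfile(fileext = ".rda")
save(platypus_demo, file = path)

original <- platypus_demo
rm(platypus_demo)

# load() recreates `platypus_demo` and returns its name
restored_names <- load(path)
restored_names
identical(platypus_demo, original)
```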
# Plot the location of each platypus record
ggplot(platypus, aes(x = Longitude, y = Latitude)) +
  geom_point()
# Earliest and latest dates of the records
platypus %>% select(eventDate) %>% summary()
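summary() on a date column reports the earliest date (Min.), the latest date (Max.) and the quartiles in between. If you only want the date range and the number of records, summarise() with min(), max() and n() returns just those values. The dates below are hypothetical.

```r
library(dplyr)

# Hypothetical record dates
toy <- tibble(eventDate = as.Date(c("2019-11-30", "2020-03-01", "2021-07-15")))

# First date, last date and number of records
coverage <- toy %>%
  summarise(first = min(eventDate), last = max(eventDate), n = n())
coverage
```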
How is this data collected? Explain the ways that a platypus sighting would be added to the database. Also think about what might be missing from the data.