## `r emo::ji("target")` Objectives In this tutorial, you will learn - how to create, and interact, with a database. - work with tidyverse wrangling verbs to access database - add tables to an existing database ## `r emo::ji("wrench")` Preparation 1. The zip file provided with this tutorial has flights and airports data taken from https://transtats.bts.gov. You don't need to do this yourself, for this tutorial, but it would be useful for you to refresh your memory on downloading a month of data, as you did in tutorial 1. Also to download the airport data you need to *Click on "Aviation" on the left list, and then "Aviation Support Tables" followed by "Download" at bottom right. This has geographic coordinates for all the airports.* 2. Ensure that you have these packages (suites of packages) installed on your computer: `DBI`, `RSQLite`, `tidyverse`, `patchwork`, `ggmap`, `ggthemes`, `forcats`, `lubridate`, `lutz`. 3. Unzip the [tutorial-03.zip](https://wcd.numbat.space/tutorials/tutorial-03.zip) file provided, and notice the file structure for this week looks like: ``` etc5512-week03 ├── exercise.Rmd ├── data │   ├── flights.csv │   ├── airports.csv └── etc5512-week03.Rproj ``` NOTE: All the code to complete these exercises is in the `exercise.Rmd` in the zip file. ## `r emo::ji("minidisc")` Exercise 1 Build an sqlite database from the month of data. a. Read the data into R. It's possible to do this, because this is a small amount of data. b. Create an `sqlite` database called "sqlflights" from the data. c. Save the data we read in from the .csv file into the database as "flights". d. Practice reading the flights data back from the database using the functions `tbl()` and `collect()`. e. Do a simple query, collecting all the records for the first day of the month. ## `r emo::ji("abacus")` Exercise 2 Explore your data! These tasks can be done using the dplyr interface, so that the **tidy wrangling verbs** can be used instead of raw SQL functions. a. Find the carrier that had the most flights during the month. Determine which carrier this is? b. Which airport had the most departing traffic? c. Compute the smallest, largest and median departure delay for the busiest airport. What would it mean if the median departure delay was negative? d. Make a side-by-side boxplot of the delays for each carrier, at the busiest airport. i. Think about transforming delay because it has a skewed distribution. (If you use a transformation on the axis, check the number of missings. It may be that a lof ot data is excluded and you need to do the transformation with `mutate`.) ii. Sort the carrier axis by the median delay (this is tricky! Hint: use the `forcats` package). iii. Make nice labels on the axis iv. Write a paragraph on what is learned about the delays by carrier. e. How many records, of the busiest airport, have missing values for departure delay? ## `r emo::ji("round_pushpin")` Exercise 3 Here we are going to add a new table with airport information, and use this to make a map of flights. a. Read the airport location data into R, and add a table to your database. b. Plot the locations on a map. You should filter the airports to only the latest location. Airports sometimes move `r emo::ji("hand_over_mouth")`. An Open Street Map can be downloaded using the `get_map()` function in the `ggmap` package. c. Now the fun part, lets take a day's worth of flights, and plot all the flights. You will need to join the day of flights data with the airport locations, using both the origin and destination. d. Choose the two major carriers for your day of data, and make two separate maps of flights, one for each carrier. Compare and contrast the carrier flight patterns. e. ADVANCED: Now we are going to examine change in patterns over the course of a day. You will need to convert departure time into a standard time. Then break it into one of four categories: midnight-6am, 6am-noon, noon-6pm, 6pm-midnight. Using all the carriers again, make separate maps for each quarter of the day. Compare the traffic over these four time blocks. f. ADVANCED: Use the standardised times to follow the path of one plane during the day. #### Material developed originally by Prof Di Cook and maintained by Dr Kate Saunders #### Copyright Monash University 2024