In this tutorial, you will learn
DBI
, RSQLite
,
tidyverse
, patchwork
, ggmap
,
ggthemes
, forcats
, lubridate
,
lutz
.etc5512-week03
├── exercise.Rmd
├── data
│ ├── flights.csv
│ ├── airports.csv
└── etc5512-week03.Rproj
NOTE: All the code to complete these exercises is in the
exercise.Rmd
in the zip file.
Build an sqlite database from the month of data.
Read the data into R. It’s possible to do this, because this is a small amount of data.
Create an sqlite
database called “sqlflights” from
the data.
Save the data we read in from the .csv file into the database as “flights”.
Practice reading the flights data back from the database using
the functions tbl()
and collect()
.
Do a simple query, collecting all the records for the first day of the month.
Explore your data! These tasks can be done using the dplyr interface, so that the tidy wrangling verbs can be used instead of raw SQL functions.
Find the carrier that had the most flights during the month. Determine which carrier this is?
Which airport had the most departing traffic?
Compute the smallest, largest and median departure delay for the busiest airport. What would it mean if the median departure delay was negative?
Make a side-by-side boxplot of the delays for each carrier, at the busiest airport.
mutate
.)forcats
package).How many records, of the busiest airport, have missing values for departure delay?
Here we are going to add a new table with airport information, and use this to make a map of flights.
Read the airport location data into R, and add a table to your database.
Plot the locations on a map. You should filter the airports to
only the latest location. Airports sometimes move 🤭. An Open Street Map
can be downloaded using the get_map()
function in the
ggmap
package.
Now the fun part, lets take a day’s worth of flights, and plot all the flights. You will need to join the day of flights data with the airport locations, using both the origin and destination.
Choose the two major carriers for your day of data, and make two separate maps of flights, one for each carrier. Compare and contrast the carrier flight patterns.
ADVANCED: Now we are going to examine change in patterns over the course of a day. You will need to convert departure time into a standard time. Then break it into one of four categories: midnight-6am, 6am-noon, noon-6pm, 6pm-midnight. Using all the carriers again, make separate maps for each quarter of the day. Compare the traffic over these four time blocks.
ADVANCED: Use the standardised times to follow the path of one plane during the day.