Everything needed to complete the assignment was in the tutorial-03.zip provided. Here are some of the results you can expect to find when running the code.

🧮 Exercise 2

Explore your data! These tasks should be done using the dplyr interface, so that the tidy wrangling verbs can be used instead of raw SQL functions.

  1. Find the carrier that had the most flights during the month. Which carrier is this?

WN = Southwest had the most flights.

  1. Which airport had the most departing traffic?

LAX = Los Angeles had the most departing flights.
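The two counting questions above can be sketched with dplyr verbs on a database table. This is a minimal illustration, not the tutorial's own solution: the toy rows, the `CARRIER` column name and the in-memory SQLite connection (which needs the DBI, RSQLite and dbplyr packages) are all assumptions made so that the example runs on its own.

```r
library(dplyr)

# Toy stand-in for the flights table; the real data is in the tutorial database.
# CARRIER is an assumed column name, ORIGIN is named in the exercise text.
flights_toy <- tibble(
  CARRIER = c("WN", "WN", "WN", "AA", "AA", "DL"),
  ORIGIN  = c("LAX", "LAX", "SFO", "LAX", "JFK", "ATL")
)

# In-memory SQLite database, used only to keep the example self-contained
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "flights", flights_toy)
flights_db <- tbl(con, "flights")

# count() is translated to SQL and run in the database;
# collect() brings the small result back into R
carrier_counts <- flights_db %>%
  count(CARRIER, sort = TRUE) %>%
  collect()

origin_counts <- flights_db %>%
  count(ORIGIN, sort = TRUE) %>%
  collect()

DBI::dbDisconnect(con)
```

With `sort = TRUE` the first row of each result is the busiest carrier or airport.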

  1. Compute the smallest, largest and median departure delay for the busiest airport. What would it mean if the median departure delay was negative?

LAX is the busiest airport. The smallest departure delay for the month was far below 0, which means that flight left very early. The largest delay had the flight leaving a day late. (This looks like a time zone calculation error.) The median delay was less than 0, which means that at least half of the flights left ahead of schedule.
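A small sketch of how these summaries might be computed, using a made-up local data frame. `DEP_DELAY` is an assumed column name and the values are invented for illustration. On a database table, `median()` may not translate on every backend, in which case the filtered rows can be brought into R with `collect()` first.

```r
library(dplyr)

# Invented delays (minutes); DEP_DELAY is an assumed column name
flights_toy <- tibble(
  ORIGIN    = c("LAX", "LAX", "LAX", "LAX", "LAX", "SFO"),
  DEP_DELAY = c(-12, -3, -1, 45, NA, 10)
)

delay_summary <- flights_toy %>%
  filter(ORIGIN == "LAX") %>%
  summarise(
    min_delay    = min(DEP_DELAY, na.rm = TRUE),
    median_delay = median(DEP_DELAY, na.rm = TRUE),
    max_delay    = max(DEP_DELAY, na.rm = TRUE),
    # Share of flights with a recorded delay that left early
    prop_early   = mean(DEP_DELAY < 0, na.rm = TRUE)
  )
```

A negative `median_delay` goes together with a `prop_early` of at least 0.5, which is the interpretation given in the answer.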

  1. Make a side-by-side boxplot or violin plot of the delays for each carrier, at the busiest airport.
    1. Think about transforming delay because it has a skewed distribution. (If you use a transformation on the axis, check the number of missings. It may be that a lot of data is excluded and you need to do the transformation with mutate.)
    2. Sort the carrier axis by the median delay (this is tricky! Hint: use the forcats package).
    3. Make nice labels on the axes.
    4. Write a paragraph on what is learned about the delays by carrier.

Overall, carriers differ little in either their median delay or the variation in their delays. American Airlines (AA) and Southwest (WN) have the highest median delays, and Frontier Airlines (F9) has the lowest. SkyWest (OO) has the flight with the longest delay.
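The plotting steps can be sketched as follows, on invented data. The signed log transformation is one possible choice (an assumption, not necessarily what the tutorial used); it is done with `mutate()` so that early departures with negative delays are kept, where a log scale on the axis would drop them. `CARRIER` and `DEP_DELAY` are assumed column names.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Invented delays (minutes) for three carriers
flights_toy <- tibble(
  CARRIER   = rep(c("AA", "F9", "WN"), each = 4),
  DEP_DELAY = c(5, 10, 20, 120,  -8, -5, -2, 30,  0, 8, 15, 60)
)

plot_data <- flights_toy %>%
  filter(!is.na(DEP_DELAY)) %>%
  mutate(
    # Signed log10: compresses the long right tail and keeps negative values
    delay_slog = sign(DEP_DELAY) * log10(abs(DEP_DELAY) + 1),
    # Order the carrier levels by median delay, smallest first
    CARRIER = fct_reorder(CARRIER, DEP_DELAY, .fun = median)
  )

p <- ggplot(plot_data, aes(x = CARRIER, y = delay_slog)) +
  geom_boxplot() +
  labs(
    x = "Carrier (ordered by median delay)",
    y = "Departure delay (signed log10 of minutes)"
  )
```

Printing `p` draws the boxplots; swapping `geom_boxplot()` for `geom_violin()` gives the violin version.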

  1. How many records, of the busiest airport, have missing values for departure delay?
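One way to count the missing values, sketched on invented data (the counts here are not the answer for the real data). `DEP_DELAY` is an assumed column name.

```r
library(dplyr)

# Invented rows; two of the four LAX records have no departure delay
flights_toy <- tibble(
  ORIGIN    = c("LAX", "LAX", "LAX", "LAX", "SFO"),
  DEP_DELAY = c(-3, NA, 12, NA, NA)
)

missing_summary <- flights_toy %>%
  filter(ORIGIN == "LAX") %>%
  summarise(
    n_missing = sum(is.na(DEP_DELAY)),  # records with no departure delay
    n_total   = n()                     # all records for the airport
  )
```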

📍 Exercise 3

Here we are going to add a new table with airport information, and use this to make a map of flights.

  1. Read the airport location data into R, and add a table to your database.

  2. Plot the locations on a map. You should filter the airports to only the latest location. Airports sometimes move 🤭. An Open Street Map can be downloaded using the get_map() function in the ggmap package.

  3. Now the fun part: let’s take a day’s worth of flights and plot all of them. You will need to join the day of flights data with the airport locations, using both the ORIGIN and DESTination.

  4. Choose the two major carriers for your day of data, and make two separate maps of flights, one for each carrier. Compare and contrast the carrier flight patterns.

I chose Delta and Southwest. Delta looks a little more like a hub system. Both airlines are high-volume carriers, but on the map Southwest appears to have more flights than Delta. This is because Delta flights operate between fewer airports, while Southwest is more distributed, serving many more airports.
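The double join behind these maps can be sketched like this. The airport table is joined once on ORIGIN and once on DEST, with the coordinate columns renamed after each join. The column names `AIRPORT`, `LATITUDE`, `LONGITUDE` and `CARRIER` are assumptions, the coordinates are approximate, and the three flights are invented.

```r
library(dplyr)
library(ggplot2)

# Approximate airport coordinates, for illustration only
airports_toy <- tibble(
  AIRPORT   = c("LAX", "ATL", "DAL"),
  LATITUDE  = c(33.94, 33.64, 32.85),
  LONGITUDE = c(-118.41, -84.43, -96.85)
)

# Invented flights for two carriers
flights_toy <- tibble(
  CARRIER = c("DL", "WN", "WN"),
  ORIGIN  = c("LAX", "DAL", "LAX"),
  DEST    = c("ATL", "LAX", "DAL")
)

flights_geo <- flights_toy %>%
  # First join: coordinates of the departure airport
  left_join(airports_toy, by = c("ORIGIN" = "AIRPORT")) %>%
  rename(origin_lat = LATITUDE, origin_lon = LONGITUDE) %>%
  # Second join: coordinates of the arrival airport
  left_join(airports_toy, by = c("DEST" = "AIRPORT")) %>%
  rename(dest_lat = LATITUDE, dest_lon = LONGITUDE)

# One panel per carrier, one line segment per flight
p <- ggplot(flights_geo) +
  geom_segment(
    aes(x = origin_lon, y = origin_lat, xend = dest_lon, yend = dest_lat),
    alpha = 0.5
  ) +
  facet_wrap(~CARRIER) +
  labs(x = "Longitude", y = "Latitude")
```

The segments could equally be drawn on top of a downloaded map layer; the plain coordinate axes here just keep the sketch free of network access.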

  1. ADVANCED: Now we are going to examine change in patterns over the course of a day. You will need to convert departure time into a standard time. Then break it into one of four categories: midnight-6am, 6am-noon, noon-6pm, 6pm-midnight. Using all the carriers again, make separate maps for each quarter of the day. Compare the traffic over these four time blocks.

There’s not a lot to see in four big groups like this. It’s an exercise in working with time, and also in ordering the four groups appropriately.
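A sketch of the binning and ordering step. It assumes the departure times have already been converted to a standard time and are stored as hhmm integers in a column called `DEP_TIME` (both assumptions); the time zone conversion itself is not shown.

```r
library(dplyr)

# Invented departure times in hhmm form, e.g. 559 is 5:59am
flights_toy <- tibble(
  DEP_TIME = c(5, 559, 600, 1159, 1200, 1759, 1800, 2359)
)

flights_binned <- flights_toy %>%
  mutate(
    # Hour of day; %% 24 maps a possible 2400 back to hour 0
    dep_hour = (DEP_TIME %/% 100) %% 24,
    # right = FALSE gives the intervals [0, 6), [6, 12), [12, 18), [18, 24)
    time_block = cut(
      dep_hour,
      breaks = c(0, 6, 12, 18, 24),
      labels = c("midnight-6am", "6am-noon", "noon-6pm", "6pm-midnight"),
      right = FALSE,
      ordered_result = TRUE
    )
  )
```

Because `time_block` is an ordered factor, `facet_wrap(~time_block)` lays the four maps out in chronological rather than alphabetical order.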

  1. ADVANCED: Use the standardised times to follow the path of one plane during the day.

Plane N243WN has 8 flights during the day, but it goes back and forth between just three airports.
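Following one plane comes down to filtering on its tail number and sorting by the standardised departure time. The sketch below uses a hypothetical tail number, hypothetical airport codes and invented times; `TAIL_NUM` and `dep_std` are assumed column names.

```r
library(dplyr)

# Invented flights; N001XX and N002YY are hypothetical tail numbers
flights_toy <- tibble(
  TAIL_NUM = c("N001XX", "N001XX", "N001XX", "N002YY"),
  ORIGIN   = c("BBB", "AAA", "CCC", "AAA"),
  DEST     = c("CCC", "BBB", "AAA", "CCC"),
  dep_std  = as.POSIXct(
    c("2020-01-01 10:00", "2020-01-01 07:00",
      "2020-01-01 13:00", "2020-01-01 08:00"),
    tz = "UTC"
  )
)

plane_path <- flights_toy %>%
  filter(TAIL_NUM == "N001XX") %>%
  arrange(dep_std) %>%            # put the legs in the order they were flown
  mutate(leg = row_number())

# Number of distinct airports the plane visits
n_airports <- n_distinct(c(plane_path$ORIGIN, plane_path$DEST))
```

Joining `plane_path` to the airport locations and drawing the legs in order with `geom_path()` or `geom_segment()` would then trace the route on a map.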

Material developed by Prof Di Cook and maintained by Dr Kate Saunders