Office of Airline Information, Bureau of Transportation Statistics
Reporting carriers report on-time data for the flights they operate, either because they are required to or voluntarily.
This is like a census for all ‘US certified air carriers that account for at least one percent of domestic scheduled passenger revenues’, containing information on each commercial flight that carried passengers.
Yes, this is open data. There is no obvious license, but it falls under the US open data policy guidelines. Check the Web policies link at the bottom of the page.
One row is one operated flight. You can think of this as event data. Usually you will want to aggregate it in different ways for analysis.
## FL_DATE
## Length:538837
## Class :character
## Mode :character
## # A tibble: 15 × 2
## OP_UNIQUE_CARRIER n
## <chr> <int>
## 1 WN 112430
## 2 DL 75174
## 3 AA 74999
## 4 UA 56657
## 5 OO 50347
## 6 YX 24476
## 7 B6 23249
## 8 NK 21876
## 9 AS 19801
## 10 MQ 18849
## 11 9E 16926
## 12 OH 15456
## 13 F9 13285
## 14 G4 8615
## 15 HA 6697
Helpful hint: If you are having difficulty running this code, check that the file name matches, then check your file path and your project directory.
WN has the largest number of flights. This is the low cost carrier Southwest.
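A table like the carrier counts above can be produced by counting rows per carrier. The following is a minimal sketch rather than the original solution code: it assumes the dplyr package is installed, and flights_toy is a made-up stand-in for the real flights data, so the counts are illustrative only.

```r
library(dplyr)

# Toy stand-in for the flights table: one row per operated flight.
# These rows are invented for illustration; they are not BTS data.
flights_toy <- tibble(
  OP_UNIQUE_CARRIER = c("WN", "DL", "WN", "AA", "WN", "DL")
)

# Count rows (flights) per carrier, largest count first.
carrier_counts <- flights_toy %>%
  count(OP_UNIQUE_CARRIER, sort = TRUE)

carrier_counts
```

With the real data, the same count() call would be applied to the data frame read from the downloaded file.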
## # A tibble: 351 × 2
## ORIGIN n
## <chr> <int>
## 1 ATL 32190
## 2 ORD 25661
## 3 DFW 24339
## 4 DEN 20398
## 5 CLT 19995
## 6 LAX 17799
## 7 PHX 15325
## 8 IAH 14792
## 9 LAS 14186
## 10 LGA 13836
## # … with 341 more rows
The number of outgoing flights is the same as, or very close to, the number of incoming flights, as we would expect. ATL, which is Atlanta, Georgia, is the busiest airport.
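One way to check that departures and arrivals balance is to count each separately and join the two tables by airport. This is a sketch under assumptions: flights_toy is invented data, DEST is assumed to be the name of the destination column, and the names n_out, n_in and airport_balance are hypothetical.

```r
library(dplyr)

# Toy flights: each row has an origin and a destination airport.
# Invented rows for illustration; DEST is an assumed column name.
flights_toy <- tibble(
  ORIGIN = c("ATL", "ATL", "ORD", "DFW", "ORD", "ATL"),
  DEST   = c("ORD", "DFW", "ATL", "ATL", "ATL", "ORD")
)

# Departures per airport.
departures <- flights_toy %>%
  count(ORIGIN, name = "n_out") %>%
  rename(airport = ORIGIN)

# Arrivals per airport.
arrivals <- flights_toy %>%
  count(DEST, name = "n_in") %>%
  rename(airport = DEST)

# Put both counts side by side and compute the difference.
airport_balance <- full_join(departures, arrivals, by = "airport") %>%
  mutate(diff = n_out - n_in) %>%
  arrange(desc(n_out))

airport_balance
```

A full join is used so that an airport appearing only as an origin or only as a destination is still listed, with a missing value in the other column.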
R0000100, R0173600, R0214700, R0214800.
Read the codebook to find out what these are. R0000100 is a unique id for each individual, R0173600 is a sex and race survey question with values from 1 to 20, R0214700 is race with three categories, and R0214800 is sex with only two categories.
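Because the codebook names are hard to read, it can help to rename the variables before tabulating them. The sketch below assumes dplyr is installed; nlsy_toy is invented data, and the new names id, sex_race, race and sex are hypothetical choices, not names from the survey.

```r
library(dplyr)

# Toy stand-in for the survey extract, using the codebook variable names.
# The values are invented; only the coding ranges follow the description.
nlsy_toy <- tibble(
  R0000100 = 1:6,
  R0173600 = c(1, 5, 9, 10, 1, 5),
  R0214700 = c(3, 3, 1, 2, 3, 2),
  R0214800 = c(1, 2, 1, 2, 2, 1)
)

# Give the variables readable names (hypothetical, chosen for this example).
nlsy_named <- nlsy_toy %>%
  rename(
    id = R0000100,
    sex_race = R0173600,
    race = R0214700,
    sex = R0214800
  )

# Tabulate respondents by race and sex category.
race_by_sex <- nlsy_named %>%
  count(race, sex)

race_by_sex
```

The category codes are left as numbers here; their meanings should be taken from the codebook.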
Only your email address.
https://data.gov/privacy-policy.html#license says “U.S. Federal data available through Data.gov is offered free and without restriction.”
Observational data collected by survey sampling: “a cross-sectional sample of 6,111 respondents designed to represent the noninstitutionalized civilian segment of people living in the United States in 1979 and born between January 1, 1957, and December 31, 1964 (ages 14-21 as of December 31, 1978)”
## # A tibble: 3 × 2
## R0214700 n
## <dbl> <int>
## 1 1 2002
## 2 2 3174
## 3 3 7510
## # A tibble: 2 × 2
## R0214800 n
## <dbl> <int>
## 1 1 6403
## 2 2 6283
## # A tibble: 20 × 2
## R0173600 n
## <dbl> <int>
## 1 1 2236
## 2 2 203
## 3 3 346
## 4 4 218
## 5 5 2279
## 6 6 198
## 7 7 405
## 8 8 226
## 9 9 742
## 10 10 1105
## 11 11 729
## 12 12 901
## 13 13 1067
## 14 14 751
## 15 15 609
## 16 16 162
## 17 17 53
## 18 18 342
## 19 19 89
## 20 20 25
## # A tibble: 1 × 1
## n
## <int>
## 1 12686
Yes, Creative Commons.
The galah library and the function occurrences extract the records for platypus. Helpful tip: To run the code to download the data, you must first register using your email address on the ALA webpage.
Platypus are mostly found along the east coast of Australia, and also in Tasmania.
1770 through to 2022
How is this data collected? Explain the ways that a platypus sighting would be added to the database. Also think about what might be missing from the data.
This data is mostly provided on a voluntary basis, some by researchers and some by citizen scientists. This means that it is not collected systematically, so platypus may live in locations that people do not visit. These places would then not be represented in the database.