Objectives

In this tutorial, you will learn

Exercise 2A: download data

Download education data from the U.S. Census Bureau using the link https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=. Complete the following tasks:

  1. Select Geographies; select a geographic type: County 050
  2. Select all the following states: Alabama, Kentucky, Missouri, South Carolina, Texas, West Virginia
  3. Add all counties within these states to your selection.
  4. Select Topics; People, Education, Educational Attainment
  5. Select 2010 and download data
  6. Put the downloaded files in your working directory and use read.csv to load the data in your workspace
library("tidyverse")
edu <- read.csv(???, header = T)
View(edu)

Exercise 2B: data collection

Answer the following questions for the downloaded data:

  1. Which sampling method is used in the survey data?
  2. How many interviews are used to construct the data for Alabama?
  3. What is the coverage rate for Texas, and what does this number mean?
  4. Which four primary sources of nonsampling error are addressed in constructing the data set?

Exercise 2C: data collection

Provide the following metadata for the downloaded data:

  1. Topic
  2. Source
  3. Temporal coverage
  4. Geographic coverage
  5. License

Exercise 2D: data analysis

Complete the following tasks:

  1. Select the variable “Total; Estimate; Percentage bachelor’s degree or higher” and the variable with county names.
  2. Delete the row with the column descriptions
  3. Delete observations with missing values
edu_clean <- edu %>%
  select(County = GEO.display.label, PercentBachelorOrHigher = ???) %>% 
  slice(-1) %>% 
  mutate(PercentBachelorOrHigher=
           as.numeric(levels(PercentBachelorOrHigher))[PercentBachelorOrHigher]) %>% 
  drop_na()
  1. Find the county with the largest percentage of people with bachelor’s degree or higher
  2. Find the county with the smallest percentage of people with bachelor’s degree or higher
  3. Report the mean and standard deviation over all counties for Percentage bachelor’s degree or higher
  4. Why did we look at the 2010 survey instead of the most recent data?

Exercise 2E: merge data

  1. Follow the same steps as in Exercise 2A, but now replace (Education and Educational Attainment) from the topics list with (Age & Sex and Age) to download the QT-P1 2010 SF1 100% Data.
  2. Follow the same steps as in Exercise 2D, but now replace “Total; Estimate; Percentage bachelor’s degree or higher” with “Percent - Both sexes; Total population - Under 18 years”.
  3. Merge the education data with the age data.
age <- read.csv(???, header = T)
age_clean <- age %>%
  select(County = GEO.display.label, Percentunder18 = ???) %>% 
  slice(-1) %>% 
  mutate(Percentunder18=
                         as.numeric(levels(Percentunder18))[Percentunder18]) %>% 
  drop_na()
edu_age <- inner_join(???,???,???)