In this tutorial, you will learn to
Note: you should do preparation before the tutorial.
### Required packages
list.of.packages <- c("MASS", "data.table", "tidyverse", "here", "stringr",
"lubridate", "ggplot2", "usmap", "gganimate")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
invisible(lapply(list.of.packages, require, character.only = TRUE))
etc5512-week07
βββ tutorial-07.Rmd
βββ final data
β βββ 2016_2022_data.rds
βββ raw data
βββ prep
β βββ 00_read_data.R
β βββ 01_import_data.R
β βββ 02_combine.R
βββ etc5512-week07.Rproj
Note: This file has been pre-processed for you to work on the exercises in this tutorial.
This data contains the Fannie Mae Single Home Loan data from year 2016Q1 to 2022Q3.
alldata <- readRDS(here::here("final data", "2016_22_data.rds"))
Analyzing Data
Create a table of default rate (based on 60, 90, 120 and 180 days
of delinquency) by origination year and quarter. You might want to
convert the ORIG_DTE
to a date format first. Do you observe
something peculiar from the table? Explain why.
Draw a U.S. map to show the default rate based on 90 days delinquency just for the year of 2016. Which states are having the top 3 highest default rate?
Draw a histogram to show the distribution of FICO Scores in different years. What do you notice?
The way FICO score is being interpreted is very tricky. A small increase at higher score does not have the same meaning as a small increase at the lower score. Therefore, regrouping the variable is needed. Regroup the variable of FICO scores as below (you need to use the variable as your independent variable in Exercise 7B).
Credit.Score | Rating |
---|---|
300-579 | Poor |
580-669 | Fair |
670-739 | Good |
740-799 | Very Good |
800-850 | Exceptional |
The data that you have used so far is pre-processed from the raw data that you can get from Fannie Mae Single-Family Homeloan Data. Try the steps below to pre-process the data by yourself by just using the data from 2016Q1 to 2022Q3 as an exercise.
Access the Data
. Register for an
account if you have not (it is free). Click
Hp (historical loan credit performance data)
/Download data
/Choose
the year 2021Q1 to 2021Q3. The downloaded data should be placed under
the data
-> raw
folder.01_import_data.R
(under prep
folder)
and run the script._stat.csv
at the end
(located in raw folder). The code has been pre-written for you in the
00_read_data.R
script.00_read_data.R
to see
how the data is being converted from transactions data to single account
level data.02_combine.R
(Under prep
folder. It
may takes a while to load. You should get a 20 million records. Yes, 20
millions row!).