Duration: ~45 Minutes
Fatality Analysis Reporting System, NYS, 2015
Source: FARS website
data
and R
data
Go to: "File" > "New File" > "R Script"
Simple ...
# Author: Abby
# Date: 19 April 2019
# Project: Data management with tidyverse workshop
# Purpose: To practice piping and dplyr
# Data source: FARS person level data, 2015
... or elaborate ...
#========================================================================#
# Author: Abigail Stamm (GitHub ID: ajstamm) #
# Date: Friday, 19 April 2019 #
# Project: Data management with tidyverse, an R workshop offered by the #
# NYS DOH Epidemiology and Biostatistics Community of Practice #
# Purpose: To practice basic piping and common dplyr commands #
# Data source: FARS person level data, 2015, https://www.nhtsa.gov/ #
# research-data/fatality-analysis-reporting-system-fars #
#========================================================================#
Load tidyverse. It will auto-load dplyr.
library(tidyverse)
Use read.csv() (base R) or read_csv() (tidyverse).
Differences between them are beyond the scope of this workshop.
fars <- read_csv("raw_data/fars2015nys_person.csv",
                 col_types = "ccccccnccccccccc")
Options vary. To see them, query the help page with ?, for example ?read_csv.
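As a rough illustration of the col_types string: each letter stands for one column, in order, with c for character and n for number. The sketch below uses a hypothetical three-column file written to a temporary location; the column names and values are made up and are not FARS data.

```r
library(readr)

# Hypothetical toy file (not FARS data): three columns, written to a temp path
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,AGE,SEX", "001,34,1", "002,999,2"), tmp)

# "cnc" = character, number, character; one letter per column, in order
toy <- read_csv(tmp, col_types = "cnc")

# ID keeps its leading zeros because it was read as character
toy$ID
class(toy$AGE)
```

Reading codes as character is one way to avoid losing leading zeros in identifiers.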
Select only observations in Albany County (where COUNTY = 1).
FARS uses FIPS codes.
my_fars <- fars %>% filter(COUNTY == 1)
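A small sketch of how filter() compares a character column with a number, assuming COUNTY was read as character (the col_types string suggests this, but the source does not confirm which column is which). The toy values are hypothetical, not FARS data.

```r
library(dplyr)

# Hypothetical toy data (not FARS): COUNTY stored as character
toy <- tibble(COUNTY = c("1", "001", "3"), AGE = c(25, 40, 61))

# The number 1 is converted to the string "1" before comparing,
# so only an exact text match is kept; "001" would be dropped
kept <- toy %>% filter(COUNTY == 1)
kept$COUNTY
```

If the codes in a file were zero-padded, comparing against the padded string (for example "001") would be the safer choice.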
Drop the following variables:
DOA, SEAT_POS, STATE
my_fars <- my_fars %>%
  select(-DOA, -SEAT_POS, -STATE)
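To see the effect of the minus sign in select(), here is a minimal sketch on a hypothetical toy table (made-up values, not FARS data). Columns prefixed with a minus are dropped; everything else is kept in its original order.

```r
library(dplyr)

# Hypothetical toy data (not FARS)
toy <- tibble(STATE = c("36", "36"),
              AGE   = c(25, 40),
              DOA   = c("0", "7"))

# Drop DOA and STATE; AGE is the only column left
slim <- toy %>% select(-DOA, -STATE)
names(slim)
```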
Change the following variable names:
my_fars <- my_fars %>% rename(case_num = ST_CASE,
                              person_num = PER_NO,
                              vehicle_num = VEH_NO)
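The argument order in rename() is easy to reverse by accident: the new name goes on the left and the existing name on the right. A minimal sketch on a hypothetical toy table (made-up values, not FARS data):

```r
library(dplyr)

# Hypothetical toy data (not FARS)
toy <- tibble(ST_CASE = c("360001", "360002"),
              PER_NO  = c("1", "1"))

# rename(new_name = old_name)
renamed <- toy %>% rename(case_num = ST_CASE, person_num = PER_NO)
names(renamed)
```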
ifelse()
For conditional statements
legs <- c(0,2,0,4,2,6,4,8,6,8)
y <- ifelse(legs == 2, "bird",
            "not bird")
Try it yourself.
z <- ifelse(legs == 0, "snake",
     ifelse(legs == 2, "bird",
            "other"))
Now apply this to the FARS data: flag drivers and group AGE into categories.
my_fars <- my_fars %>%
  mutate(
    driver = (PER_TYP == 1),
    agegroup = ifelse(AGE == 999, NA,
               ifelse(AGE < 13, "child",
               ifelse(AGE < 20, "adolescent",
               ifelse(AGE < 30, "young adult",
               ifelse(AGE < 65, "middle-aged",
               ifelse(AGE >= 65, "older adult", NA))))))
  )
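Deeply nested ifelse() calls can be hard to read. As an alternative that the workshop code does not use, dplyr's case_when() expresses the same cut-points as a flat list. This is a sketch on hypothetical ages (not FARS data), keeping 999 as the unknown code from the code above.

```r
library(dplyr)

# Hypothetical ages (not FARS data); 999 is treated as "unknown"
toy <- tibble(AGE = c(5, 15, 25, 40, 70, 999, NA))

toy <- toy %>%
  mutate(agegroup = case_when(
    AGE == 999 ~ NA_character_,   # unknown age becomes missing
    AGE < 13   ~ "child",
    AGE < 20   ~ "adolescent",
    AGE < 30   ~ "young adult",
    AGE < 65   ~ "middle-aged",
    AGE >= 65  ~ "older adult"
  ))                              # rows matching no condition get NA

toy$agegroup
```

Conditions are checked top to bottom, so the order matters in the same way as in the nested version.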
Arrange the data by RACE and HISPANIC.
my_fars <- my_fars %>% arrange(RACE, HISPANIC)
Check the first few rows. What do you notice?
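head() prints the first rows. One thing that may stand out, assuming RACE and HISPANIC were read as character (the source does not say which columns are which in col_types), is that character columns sort as text rather than as numbers. A sketch with hypothetical codes, not FARS data:

```r
library(dplyr)

# Hypothetical codes stored as character (not FARS data)
toy <- tibble(RACE     = c("2", "10", "1"),
              HISPANIC = c("7", "1", "7"))

sorted <- toy %>% arrange(RACE, HISPANIC)

# First rows of the sorted table
head(sorted)

# As text, "10" sorts before "2"
sorted$RACE
```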
Combine all steps.
fars <- read_csv("raw_data/fars2015nys_person.csv",
                 col_types = "ccccccnccccccccc")
my_fars <- fars %>%
  filter(COUNTY == 1) %>%
  select(-DOA, -SEAT_POS, -STATE) %>%
  rename(case_num = ST_CASE, person_num = PER_NO,
         vehicle_num = VEH_NO) %>%
  mutate(driver = (PER_TYP == 1),
         agegroup = ifelse(AGE == 999, NA,
                    ifelse(AGE < 13, "child",
                    ifelse(AGE < 20, "adolescent",
                    ifelse(AGE < 30, "young adult",
                    ifelse(AGE < 65, "middle-aged",
                    ifelse(AGE >= 65, "older adult",
                           NA))))))
  ) %>%
  arrange(RACE, HISPANIC)
Save to the data folder. Use a meaningful name.
write.csv(my_fars, "data/fars2015nys_AlbanyCounty_person.csv",
          row.names = FALSE)
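To see what row.names = FALSE changes, the sketch below writes a hypothetical toy table (not FARS data) to two temporary files and reads the column names back. Without the argument, write.csv() adds an unnamed column of row numbers, which read.csv() then labels X.

```r
# Hypothetical toy data (not FARS)
toy <- data.frame(case_num = c("a", "b"), AGE = c(25, 40))

with_rows <- tempfile(fileext = ".csv")
no_rows   <- tempfile(fileext = ".csv")

write.csv(toy, with_rows)                     # default keeps row names
write.csv(toy, no_rows, row.names = FALSE)    # as in the workshop code

names_with <- names(read.csv(with_rows))
names_without <- names(read.csv(no_rows))
names_with
names_without
```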
Next up: Summarizing data