Duration: ~30 Minutes
accident <- read_csv("Data/fars2015nys_accident.csv",
col_types="cnccnccnnnnnnncccccc")
For this exercise, create two tables.
For these examples, county will be the rows and month will be the columns
This is a complex exercise that will use several things you have learned so far.
Click the down arrow if you need a hint or two.
Create a summary dataset. Remember to group variables first.
Identify your key
and value
variables.
summarized_accident <- accident %>%
group_by(ST_CO_FIPS, MONTH) %>%
summarize(
total_deaths = sum(FATALS),
avg_vehicles = mean(VE_TOTAL)
) %>%
ungroup()
Create total vehicular fatalities by county by month.
spread_deaths <- summarized_accident %>%
select(-avg_vehicles) %>%
spread(key = MONTH, value = total_deaths)
Note: in this case, missings mean there were no deaths. To replace missings with 0, use this code.
spread_deaths[is.na(spread_deaths)] <- 0
Click the down arrow if you need help with the code for vehicles.
Create average vehicles per accident by county by month.
spread_vehicles <- summarized_accident %>%
select(-total_deaths) %>%
spread(key = MONTH, value = avg_vehicles)
spread_vehicles[is.na(spread_vehicles)] <- 0
For this exercise, combine your two spread tables to recreate
summarized_accident.
This is a complex exercise that will use several things you have learned so far.
Click the down arrow if you need a hint or three.
Convert each spread table from long to wide.
Merge the two tables by county and month.
Remove all rows for which both total fatalities and average vehicles equal 0.
Create a long table for the deaths.
gathered_deaths <- spread_deaths %>%
gather(key = MONTH, value = total_deaths, -ST_CO_FIPS)
Click the down arrow in you need help with the code for vehicles.
Create a long table for the vehicles.
gathered_vehicles <- spread_vehicles %>%
gather(key = MONTH, value = avg_vehicles, -ST_CO_FIPS)
gathered_accident <- gathered_deaths %>%
left_join(gathered_vehicles, by = c("ST_CO_FIPS", "MONTH"))
For this exercise, full_join
and right_join
would also work. Why?
To test if the tables are equivalent, try these:
nrow(gathered_accident) == nrow(summarized_accident)
# this function will give either TRUE
# or at least one way the tables are not equivalent
all.equal(gathered_accident, summarized_accident)
Why are they not eqivalent? What can we do to fix that?
Remove extra rows from gathered_accident
.
gathered_accident <- gathered_accident %>%
filter(avg_vehicles != 0 | total_deaths != 0)
To make the tables identical, you would also need to sort row order.
Enjoy R and please fill out our evaluation.