Duration: ~20 Minutes
Preparing data for analysis, for example,
childpet <- read_csv("data/child-pet-data.csv")
Convert a table from long to wide.
Function: spread(key, value)
# sample code
myspread <- mydata %>% spread(key = myvars, value = mycells)
In the code above,
key
: variable whose values will become the new variable namesvalue
: variable whose values will become the new values for the new variablesTurn each pet species into a new variable. Use the pet's name as the value.
petwide <- childpet %>% spread(key = pet_species, value = pet_name)
> head(petwide)
# A tibble: 6 x 6
# child_name child_age child_sex bird cat dog
# 1 Amy 7 F Connie Argyle Brutus
# 2 Ben 8 M NA NA Dufus
# 3 Cam 8 F NA Fannie Earl
# 4 Don 7 M Gary Hector NA
# 5 Eli 9 M Ingrid Kimber Jolly
# 6 Fay 9 F Norma Laurel Morton
What does NA
mean?
Turn each child sex into a new variable. There is no appropriate variable already to serve as the value, so create one first.
childwide <- childpet %>% mutate(value = 1) %>%
spread(key = child_sex, value = value)
> head(childwide)
# A tibble: 6 x 6
# child_name child_age pet_name pet_species F M
# 1 Amy 7 Argyle cat 1 NA
# 2 Amy 7 Brutus dog 1 NA
# 3 Amy 7 Connie bird 1 NA
# 4 Ben 8 Dufus dog NA 1
# 5 Cam 8 Earl dog 1 NA
# 6 Cam 8 Fannie cat 1 NA
If you plan to use sex as dummy variables, for example for regression,
you will want to convert the missings to 0.
Note: if any value is truly missing, the code below will need to be
modified. We will not cover that today.
# warning: *all* NA values will be converted to 0
childwide[is.na(childwide)] <- 0
Create a table with one column for each child age using the
spread()
commands on the previous slides.
Use child sex for the value.
childagewide <- child %>%
spread(key = child_age, value = child_sex)
Convert a table from wide to long.
Function: gather(key, value)
# sample code
mygather <- mydata %>% gather(key = myvars, value = mycells,
var1:varN)
In the code above,
var1:varN
: variables to be combinedkey
: name for the variable that will list the variable namesvalue
: name for the variable that will list the variables' valuesRecreate the pet_species variable.
petlong <- petwide %>% gather(key = pet_species, value = pet_name,
bird:dog)
> head(petlong)
# A tibble: 6 x 5
# child_name child_age child_sex pet_species pet_name
# 1 Amy 7 F bird Connie
# 2 Ben 8 M bird NA
# 3 Cam 8 F bird NA
# 4 Don 7 M bird Gary
# 5 Eli 9 M bird Ingrid
# 6 Fay 9 F bird Norma
Whether the rows are truly unnecessary depends on your analysis. Here, we assume they are.
petlong <- petlong %>% filter(!is.na(pet_name))