Tidying data:

Converting long tables to wide and vice versa

Duration: ~20 Minutes

Clock made of Legos

Learning Objectives

  • Describe data tidying
  • Gather: convert wide to long table
  • Spread: convert long to wide table

Tidying data

Preparing data for analysis, for example,

  • Recoding variables
  • Reformatting variables
  • Joining tables
  • Restructuring tables

Prepare your project

  • Create a new R script file in your project folder
  • Add a heading
  • Load tidyverse
  • Download the example dataset
  • 
    childpet <- read_csv("data/child-pet-data.csv")
    
    

    spread()

    Convert a table from long to wide.
    Function: spread(key, value)

    
    # sample code
    myspread <- mydata %>% spread(key = myvars, value = mycells)
    
    

    In the code above,

    • key: variable whose values will become the new variable names
    • value: variable whose values will become the new values for the new variables

    spread() example: pets

    Turn each pet species into a new variable. Use the pet's name as the value.

    
    petwide <- childpet %>% spread(key = pet_species, value = pet_name)
    
    

    Resulting table

    
    > head(petwide)
    
    # A tibble: 6 x 6
    #   child_name child_age child_sex bird   cat    dog   
    # 1 Amy                7 F         Connie Argyle Brutus
    # 2 Ben                8 M         NA     NA     Dufus 
    # 3 Cam                8 F         NA     Fannie Earl  
    # 4 Don                7 M         Gary   Hector NA    
    # 5 Eli                9 M         Ingrid Kimber Jolly 
    # 6 Fay                9 F         Norma  Laurel Morton
    
    

    What does NA mean?

    spread() example: children

    Turn each child sex into a new variable. There is no appropriate variable already to serve as the value, so create one first.

    
    childwide <- childpet %>% mutate(value = 1) %>%
      spread(key = child_sex, value = value)
    
    

    Resulting table

    
    > head(childwide)
    
    # A tibble: 6 x 6
    #   child_name child_age pet_name pet_species     F     M
    # 1 Amy                7 Argyle   cat             1    NA
    # 2 Amy                7 Brutus   dog             1    NA
    # 3 Amy                7 Connie   bird            1    NA
    # 4 Ben                8 Dufus    dog            NA     1
    # 5 Cam                8 Earl     dog             1    NA
    # 6 Cam                8 Fannie   cat             1    NA
    
    

    Creating dummy variables

    If you plan to use sex as dummy variables, for example for regression, you will want to convert the missings to 0.
    Note: if any value is truly missing, the code below will need to be modified. We will not cover that today.

    
    # warning: *all* NA values will be converted to 0 
    childwide[is.na(childwide)] <- 0
    
    

    Your turn

    Create a table with one column for each child age using the spread() commands on the previous slides.

    Use child sex for the value.

    Answer

    
    childagewide <- child %>%
      spread(key = child_age, value = child_sex)
    
    

    gather()

    Convert a table from wide to long.
    Function: gather(key, value)

    
    # sample code
    mygather <- mydata %>% gather(key = myvars, value = mycells, 
                                  var1:varN)
    
    

    In the code above,

    • var1:varN: variables to be combined
    • key: name for the variable that will list the variable names
    • value: name for the variable that will list the variables' values

    gather() example: pets

    Recreate the pet_species variable.

    
    petlong <- petwide %>% gather(key = pet_species, value = pet_name,
                                  bird:dog)
    
    

    Resulting table

    
    > head(petlong)
    
    # A tibble: 6 x 5
    #   child_name child_age child_sex pet_species pet_name
    # 1 Amy                7 F         bird        Connie  
    # 2 Ben                8 M         bird        NA      
    # 3 Cam                8 F         bird        NA      
    # 4 Don                7 M         bird        Gary    
    # 5 Eli                9 M         bird        Ingrid  
    # 6 Fay                9 F         bird        Norma   
    
    

    Removing unnecessary rows

    Whether the rows are truly unnecessary depends on your analysis. Here, we assume they are.

    
    petlong <- petlong %>% filter(!is.na(pet_name))
    
    

    And Now You Know!

    Q & A