Basic Tidyverse

Duration: ~30 Minutes

5 Minute Break

Clock made of Legos

Learning Objectives

  • Learn piping syntax
  • Subset and sort data
  • Create and rename variables

Herein lies the rub:

R is ludicrously powerful and complex,
with many ways to do the same thing.

Today's goal: learn different ways to do the same things.

Tidyverse

A set of packages that share a common design and change how R code is written.

Includes:

  • magrittr's piping: %>%
  • dplyr: most functions we will use today

Dplyr

Creator: Hadley Wickham

Documentation: CRAN page

Photo of Hadley Wickham, 2016

Piping

Base R


foo_foo <- little_bunny()
foo_foo <- hop(foo_foo, 
  through = forest)
foo_foo <- scoop(foo_foo, 
  up = field_mice)
foo_foo <- bop(foo_foo, 
  on = head)
        

Tidyverse


foo_foo <- little_bunny() %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)
  
  
  
        

Example from R for Data Science
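
The bunny functions above are placeholders and will not run as written. As a runnable sketch of the same idea, assuming magrittr is installed (dplyr re-exports the same pipe), the pipe passes the value on its left as the first argument of the function on its right. The vector `x` is made up for illustration.

```r
library(magrittr)  # provides %>%; library(dplyr) also works

x <- c(1, 4, 9)  # illustrative values, not from the slides

# Nested base R call: read from the inside out
nested <- sum(sqrt(x))

# Piped version: read from left to right
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)  # TRUE
```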

Conditional selection

To select rows based on some condition, use filter().

Base R


data(iris)
my_iris <- 
  iris[iris$Species == "setosa" & 
       iris$Petal.Width < 0.5, ]
       
       
        

Tidyverse


library(dplyr)  # provides filter() and the %>% pipe
data(iris)
my_iris <- iris %>% 
  filter(
    Species == "setosa", 
    Petal.Width < 0.5
  )
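
A small sketch, assuming dplyr is installed, of how the comma in filter() behaves: conditions separated by commas are combined with AND, so the result should match both an explicit `&` and the base R version. The object names here are illustrative.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
with_comma <- iris %>%
  filter(Species == "setosa", Petal.Width < 0.5)

# The same filter written with an explicit &
with_and <- iris %>%
  filter(Species == "setosa" & Petal.Width < 0.5)

# Base R version, for comparison
base_rows <- iris[iris$Species == "setosa" & iris$Petal.Width < 0.5, ]

nrow(with_comma) == nrow(with_and)   # TRUE
nrow(with_comma) == nrow(base_rows)  # TRUE
```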
        

Variable selection

To choose specific columns, use select().
To choose all columns except x, use select(-x).

Base R


my_iris <- 
  iris[, c("Species", 
           "Sepal.Length", 
           "Sepal.Width")]
           
           
        

Tidyverse


my_iris <- iris %>% 
  select(
    Species, 
    Sepal.Length,
    Sepal.Width
  )
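
The select(-x) form mentioned above can be sketched like this, assuming dplyr is installed. The object name `no_species` is illustrative.

```r
library(dplyr)

# Drop one column and keep the rest
no_species <- iris %>%
  select(-Species)

names(no_species)
# Expect the four measurement columns:
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```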
        

Sorting rows

To sort rows by one or more columns, use arrange().

Base R


my_iris <- 
  iris[order(iris$Sepal.Width, 
       -iris$Sepal.Length), ]
       
       
        

Tidyverse


my_iris <- iris %>% 
  arrange(
    Sepal.Width, 
    desc(Sepal.Length)
  )
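
One way to check that the two versions agree, sketched under the assumption that dplyr is installed. Only the sort keys are compared, because row names can differ between the two results.

```r
library(dplyr)

tidy_sorted <- iris %>%
  arrange(Sepal.Width, desc(Sepal.Length))

base_sorted <- iris[order(iris$Sepal.Width, -iris$Sepal.Length), ]

# Both approaches should produce the same ordering of the sort keys
all(tidy_sorted$Sepal.Width == base_sorted$Sepal.Width)    # TRUE
all(tidy_sorted$Sepal.Length == base_sorted$Sepal.Length)  # TRUE
```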
        

Renaming variables

To rename a column, use rename().

Base R


my_iris <- iris
names(my_iris)[1:2] <- 
  c("slength", "swidth")
  
  
        

Tidyverse


my_iris <- iris %>% 
  rename(
    swidth = Sepal.Width, 
    slength = Sepal.Length
  )
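
Note the argument order in rename(): the new name goes on the left and the existing column on the right. A minimal sketch, assuming dplyr is installed:

```r
library(dplyr)

# rename(new_name = old_name); columns keep their positions
renamed <- iris %>%
  rename(slength = Sepal.Length, swidth = Sepal.Width)

names(renamed)[1:2]  # "slength" "swidth"
```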
        

Creating variables

To create a new variable, use mutate().

Base R


data(iris)
my_iris <- iris
my_iris$Sepal.Ratio <- 
  my_iris$Sepal.Length / 
  my_iris$Sepal.Width
my_iris$Petal.Ratio <- 
  my_iris$Petal.Length / 
  my_iris$Petal.Width
        

Tidyverse


data(iris)
my_iris <- iris %>% 
  mutate(
    Sepal.Ratio = Sepal.Length / 
                  Sepal.Width,
    Petal.Ratio = Petal.Length / 
                  Petal.Width
  )
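
A quick sketch, assuming dplyr is installed, showing that mutate() adds columns to the data frame and leaves the existing ones in place:

```r
library(dplyr)

my_iris <- iris %>%
  mutate(Sepal.Ratio = Sepal.Length / Sepal.Width)

ncol(iris)     # 5
ncol(my_iris)  # 6: the original columns plus Sepal.Ratio

# The new column matches the base R calculation
all(my_iris$Sepal.Ratio == iris$Sepal.Length / iris$Sepal.Width)  # TRUE
```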
        

Your Turn: filter()

Filter only versicolor irises using pipes.


my_iris <- iris %>% 
  filter(Species == "versicolor")
      

To check your code, try head() or summary()
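
For example, the checks might look like this (a sketch, assuming dplyr is installed):

```r
library(dplyr)

my_iris <- iris %>%
  filter(Species == "versicolor")

head(my_iris)             # first six rows; Species should read "versicolor"
summary(my_iris$Species)  # counts per level; only versicolor should be nonzero
nrow(my_iris)             # 50, since iris has 50 flowers per species
```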

Versicolor Iris

Your Turn: mutate()

In your subset, create Ratio.Length as the ratio of Petal.Length to Sepal.Length using pipes.


my_iris <- my_iris %>% 
  mutate(
    Ratio.Length = Petal.Length / 
                   Sepal.Length
  )
      
Petal & Sepal

Diagram of petal and sepal dimensions

Your Turn: select()

Select only length variables from your subset using pipes.


my_iris <- my_iris %>% 
  select(Petal.Length, 
         Sepal.Length, 
         Ratio.Length)
      
Setosa Iris

Your Turn: arrange()

Arrange your subset by Ratio.Length using pipes.


my_iris <- my_iris %>% arrange(Ratio.Length)
    

Now arrange by Ratio.Length in descending order.


my_iris <- my_iris %>% arrange(desc(Ratio.Length))
    

Your Turn: rename()

In your subset, rename Petal.Length to petallength using pipes.


my_iris <- my_iris %>% 
  rename(petallength = Petal.Length)
      
Virginica Iris

Your Turn: Chaining pipes

Now chain together the steps in the previous slides:

  1. Filter versicolor irises.
  2. Create Ratio.Length from Petal.Length and Sepal.Length.
  3. Select length variables.
  4. Arrange by Ratio.Length in descending order.
  5. Rename Petal.Length to petallength.

Code is on the next slide.

Chaining pipes


my_iris <- iris %>% 
  filter(Species == "versicolor") %>%
  mutate(Ratio.Length = Petal.Length / Sepal.Length) %>%
  select(Petal.Length, Sepal.Length, Ratio.Length) %>%
  arrange(desc(Ratio.Length)) %>%
  rename(petallength = Petal.Length)
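
To confirm the chained result, a few checks along these lines may help. This is a sketch that repeats the pipeline so it runs on its own, assuming dplyr is installed.

```r
library(dplyr)

my_iris <- iris %>%
  filter(Species == "versicolor") %>%
  mutate(Ratio.Length = Petal.Length / Sepal.Length) %>%
  select(Petal.Length, Sepal.Length, Ratio.Length) %>%
  arrange(desc(Ratio.Length)) %>%
  rename(petallength = Petal.Length)

names(my_iris)  # "petallength" "Sepal.Length" "Ratio.Length"
nrow(my_iris)   # 50

# Descending order means no value is larger than the one before it
all(diff(my_iris$Ratio.Length) <= 0)  # TRUE
```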
    

And Now You Know!

Q & A

Next up: tidyverse practice

But first: 5 minute break