Relational Database Intro
Duration: ~30 Minutes
5 Minute Break
Learning Objectives
- Define relational database
- Identify basic structure of relational databases
- Merge two databases using a key
- Bind rows and columns to a dataset
What you will not learn
- How to build relational databases
- Why to use relational databases
- Query language such as SQL
Relational database definition
Collection of datasets that are combined using keys.
This structure reduces the need for duplication.
Relational database structure
Pet-level data
pet_name |
pet_type |
child_key |
Qua |
cat |
2 |
Ray |
iguana |
3 |
Sam |
dog |
3 |
Tea |
cat |
1 |
Urn |
cat |
1 |
Child-level data
child_key |
child_name |
child_age |
1 |
Amy |
8 |
2 |
Ben |
10 |
3 |
Cam |
7 |
Merged dataset structure
Child and pet-level data
pet_name |
pet_type |
child_key |
child_name |
child_age |
Qua |
cat |
2 |
Ben |
10 |
Ray |
iguana |
3 |
Cam |
7 |
Sam |
dog |
3 |
Cam |
7 |
Tea |
cat |
1 |
Amy |
8 |
Urn |
cat |
1 |
Amy |
8 |
Code for merging datasets
These functions merge on a key.
- left_join()
- right_join()
- full_join()
- inner_join()
Download datasets
Download and load example
pet and
child data
pet <- read_csv("[myfolder]/petdata.csv")
child <- read_csv("[myfolder]/childdata.csv")
Left joining
Pets
pet_name |
pet_type |
child_key |
Qua |
cat |
2 |
Ray |
iguana |
3 |
Sam |
dog |
4 |
Children
child_key |
child_name |
child_age |
1 |
Amy |
8 |
2 |
Ben |
10 |
3 |
Cam |
7 |
leftjoin <- pet %>% left_join(child, by = "child_key")
Left joined table
pet_name |
pet_type |
child_key |
child_name |
child_age |
Qua |
cat |
2 |
Ben |
10 |
Ray |
iguana |
3 |
Cam |
7 |
Sam |
dog |
4 |
NA |
NA |
Right joining
Pets
pet_name |
pet_type |
child_key |
Qua |
cat |
2 |
Ray |
iguana |
3 |
Sam |
dog |
4 |
Children
child_key |
child_name |
child_age |
1 |
Amy |
8 |
2 |
Ben |
10 |
3 |
Cam |
7 |
rightjoin <- pet %>% right_join(child, by = "child_key")
Right joined table
pet_name |
pet_type |
child_key |
child_name |
child_age |
NA |
NA |
1 |
Amy |
8 |
Qua |
cat |
2 |
Ben |
10 |
Ray |
iguana |
3 |
Cam |
7 |
Full joining
Pets
pet_name |
pet_type |
child_key |
Qua |
cat |
2 |
Ray |
iguana |
3 |
Sam |
dog |
4 |
Children
child_key |
child_name |
child_age |
1 |
Amy |
8 |
2 |
Ben |
10 |
3 |
Cam |
7 |
fulljoin <- pet %>% full_join(child, by = "child_key")
Fully joined table
pet_name |
pet_type |
child_key |
child_name |
child_age |
Qua |
cat |
2 |
Ben |
10 |
Ray |
iguana |
3 |
Cam |
7 |
Sam |
dog |
4 |
NA |
NA |
NA |
NA |
1 |
Amy |
8 |
Inner joining
Pets
pet_name |
pet_type |
child_key |
Qua |
cat |
2 |
Ray |
iguana |
3 |
Sam |
dog |
4 |
Children
child_key |
child_name |
child_age |
1 |
Amy |
8 |
2 |
Ben |
10 |
3 |
Cam |
7 |
innerjoin <- pet %>% inner_join(child, by = "child_key")
Inner joined table
pet_name |
pet_type |
child_key |
child_name |
child_age |
Qua |
cat |
2 |
Ben |
10 |
Ray |
iguana |
3 |
Cam |
7 |
Code for binding data
These functions bind data to a dataset.
bind_rows()
Rows with the same name must be the same type!
Columns do not all need to match
bindrows <- innerjoin %>% bind_rows(pet)
bind_cols()
Columns must have the same number of rows!
Columns must be in a dataframe, but can be any type.
IDs <- data.frame(IDs = 1:nrow(innerjoin))
bindcols <- IDs %>% bind_cols(innerjoin)
And Now You Know!