 
            . . . We have much to discuss.
data() 
    
Anscombe's Quartet is built into R
data(anscombe)
anscombe
    
  x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
attributes(anscombe)
summary(anscombe)
str(anscombe)
View(anscombe)
head(anscombe)
tail(anscombe)

| Set | Independent Variable | Dependent Variable |
|---|---|---|
| Set 1 | x1 | y1 |
| Set 2 | x2 | y2 |
| Set 3 | x3 | y3 |
| Set 4 | x4 | y4 |
Anscombe's Quartet is a synthetic data set. The usual reading of a data frame, where each row is one observation and each column is one variable, does not really apply here. There is no relationship between x1 and x3: each (x, y) pair of columns is a separate data set, even though x1, x2, and x3 happen to hold identical values. But to use the data, we need to access individual columns.
Question: How can we access all the values in a given column?
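One possible answer, sketched with base R: there are several equivalent ways to pull out a single column. The helper names `col_dollar`, `col_brackets`, and so on are just illustrative variable names.

```r
data(anscombe)

# $ extracts a column by its name
col_dollar <- anscombe$x1

# [[ ]] also extracts by name, and works when the name is stored in a variable
col_name <- "x1"
col_brackets <- anscombe[[col_name]]

# [row, column] indexing: leaving the row slot empty selects every row
col_by_name <- anscombe[, "x1"]
col_by_position <- anscombe[, 1]

# All four hold the same 11 values
identical(col_dollar, col_brackets)
```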
colMeans(anscombe)
    
      x1       x2       x3       x4       y1       y2       y3       y4 
9.000000 9.000000 9.000000 9.000000 7.500909 7.500909 7.500000 7.500909 
    
cbind(x1 = sd(anscombe$x1),
      x2 = sd(anscombe$x2),
      x3 = sd(anscombe$x3),
      x4 = sd(anscombe$x4)
      )
    
          x1       x2       x3       x4
[1,] 3.316625 3.316625 3.316625 3.316625
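Writing one `sd()` call per column works, but because a data frame is a list of columns, `sapply()` can apply `sd` to every column in one call. This is an alternative sketch, not the approach used above:

```r
data(anscombe)

# Apply sd() to each column; the result is a named numeric vector
col_sds <- sapply(anscombe, sd)
round(col_sds, 6)

# The four x columns share the same standard deviation, sqrt(11)
```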
     Your Turn!
 Your Turn!
    
cor.test(x=anscombe$x1, y=anscombe$y1)
    
	Pearson's product-moment correlation
data:  anscombe$x1 and anscombe$y1
t = 4.2415, df = 9, p-value = 0.00217
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4243912 0.9506933
sample estimates:
      cor 
0.8164205 
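The test above covers Set 1 only. A sketch that computes the correlation for all four sets, assuming the `x1`..`x4` / `y1`..`y4` naming from the table, shows they are all very close to 0.816:

```r
data(anscombe)

# Correlation between x_i and y_i for each set i
set_cor <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
names(set_cor) <- paste("Set", 1:4)
set_cor
```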
    
plot(anscombe$x1, anscombe$y1, main="Anscombe: Set 1",
     xlab="x1", ylab="y1"
)
     
    
 Your Turn!
 Your Turn!
    ?cor.test
m1 <- lm(formula=y1~x1, data=anscombe)
m1
    
Call:
lm(formula = y1 ~ x1, data = anscombe)
Coefficients:
(Intercept)           x1  
     3.0001       0.5001  
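The same `lm()` call can be repeated for every set. A minimal sketch, assuming the column naming shown in the table, that builds each formula with `as.formula()` and collects the intercepts and slopes:

```r
data(anscombe)

# Build each formula from the column names (y1 ~ x1, y2 ~ x2, ...) and fit it
fits <- lapply(1:4, function(i) {
  f <- as.formula(paste0("y", i, " ~ x", i))
  lm(f, data = anscombe)
})

# One row of (intercept, slope) per set
coef_table <- t(sapply(fits, coef))
rownames(coef_table) <- paste("Set", 1:4)
colnames(coef_table) <- c("Intercept", "Slope")
round(coef_table, 3)
```

All four fitted lines come out at roughly y = 3 + 0.5x, which is the point of the quartet.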
    ?formula
attributes(m1)
summary(m1)
str(m1)
View(m1)
head(m1)
tail(m1)

summary(m1)
Call:
lm(formula = y1 ~ x1, data = anscombe)
Residuals:
     Min       1Q   Median       3Q      Max 
-1.92127 -0.45577 -0.04136  0.70941  1.83882 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   3.0001     1.1247   2.667  0.02573 * 
x1            0.5001     0.1179   4.241  0.00217 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.237 on 9 degrees of freedom
Multiple R-squared:  0.6665,	Adjusted R-squared:  0.6295 
F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.00217
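The numbers printed by `summary()` can also be pulled out individually, since the summary object is a list. A sketch using standard components of a `summary.lm` object:

```r
data(anscombe)
m1 <- lm(formula = y1 ~ x1, data = anscombe)
s1 <- summary(m1)

# Pieces of the printed summary, as numbers that can be reused
s1$r.squared       # Multiple R-squared
s1$adj.r.squared   # Adjusted R-squared
s1$sigma           # Residual standard error
s1$fstatistic      # F value with its numerator and denominator df
```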
    attributes(m1)
$names
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        
$class
[1] "lm"
        attributes(summary(m1))
$names
 [1] "call"          "terms"         "residuals"     "coefficients" 
 [5] "aliased"       "sigma"         "df"            "r.squared"    
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled" 
$class
[1] "summary.lm"
        
m1$coefficients
        
(Intercept)          x1 
  3.0000909   0.5000909 
        
summary(m1)$coefficients
        
             Estimate Std. Error  t value    Pr(>|t|)
(Intercept) 3.0000909  1.1247468 2.667348 0.025734051
x1          0.5000909  0.1179055 4.241455 0.002169629
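Individual entries of that coefficient table can be read by row and column name. This sketch also uses `coef()`, the standard accessor that returns the same vector as `m1$coefficients`:

```r
data(anscombe)
m1 <- lm(formula = y1 ~ x1, data = anscombe)

# coef() returns the same named vector as m1$coefficients
slope <- coef(m1)[["x1"]]

# coef(summary(m1)) is the coefficient matrix shown above
coef_mat <- coef(summary(m1))
slope_se <- coef_mat["x1", "Std. Error"]
slope_p <- coef_mat["x1", "Pr(>|t|)"]
```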
        
## Same scatterplot, adds the linear model
plot(anscombe$x1, anscombe$y1, main="Anscombe: Set 1 w/ Model in Red", xlab="x1", ylab="y1")
abline(m1, col="red")
     
    
png("anscombe-1.png")
plot(anscombe$x1, anscombe$y1, main="Anscombe: Set 1 w/ Model in Red", xlab="x1", ylab="y1")
abline(m1, col="red")
dev.off()
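Anscombe's point only shows when the sets are compared. A sketch that draws all four sets with their fitted lines in a 2 x 2 grid using base graphics `par(mfrow = ...)`; the shared axis limits are chosen here so the panels are directly comparable:

```r
data(anscombe)

op <- par(mfrow = c(2, 2))  # 2 rows x 2 columns of panels
fits <- vector("list", 4)
for (i in 1:4) {
  # Put set i into its own small data frame with columns x and y
  d <- data.frame(x = anscombe[[paste0("x", i)]],
                  y = anscombe[[paste0("y", i)]])
  fits[[i]] <- lm(y ~ x, data = d)
  plot(d$x, d$y, main = paste("Anscombe: Set", i),
       xlab = paste0("x", i), ylab = paste0("y", i),
       xlim = c(3, 20), ylim = c(3, 13))
  abline(fits[[i]], col = "red")
}
par(op)  # restore the previous single-panel layout
```

Wrapping this in `png(...)` and `dev.off()`, as above, saves the grid to a file.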
    
qqplot(anscombe$x1,anscombe$y1)
abline(m1, col="red")
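`qqplot()` compares the sorted values of two samples. In regression, Q-Q plots are more often used to check the residuals against a normal distribution, which is also one of the diagnostic plots drawn by `plot(m1)`. A sketch with `qqnorm()` and `qqline()`:

```r
data(anscombe)
m1 <- lm(formula = y1 ~ x1, data = anscombe)

# Normal Q-Q plot of the residuals; points close to the red line
# suggest roughly normal residuals
res <- residuals(m1)
qq <- qqnorm(res, main = "Set 1: Normal Q-Q plot of residuals")
qqline(res, col = "red")
```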
     
    
 Your Turn!
 Your Turn!
plot(m1)
m1
Call:
lm(formula = y1 ~ x1, data = anscombe)
Coefficients:
(Intercept)           x1  
     3.0001       0.5001  
With x1 = 10: 3.0001 + 0.5001 * 10 = 8.0011
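The hand calculation uses coefficients rounded to four decimals. A sketch of the same prediction with the unrounded coefficients, and again with `predict()`; both give about 8.001 rather than 8.0011:

```r
data(anscombe)
m1 <- lm(formula = y1 ~ x1, data = anscombe)

# By hand, using the full-precision coefficients
b <- coef(m1)
by_hand <- b[["(Intercept)"]] + b[["x1"]] * 10

# The same prediction via predict(); newdata needs a column named x1
by_predict <- predict(m1, newdata = data.frame(x1 = 10))
```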
p1 <- data.frame(x1=anscombe$x1+30, y1=NA)
p1$y1 <- predict(object=m1, newdata=p1)
    
plot(rbind(anscombe[,c(1,5)], p1))
abline(m1, col="red")
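The new x1 values above lie far outside the observed range (4 to 14), so these predictions are extrapolations. One way to see the added uncertainty, sketched with the `interval = "prediction"` option of `predict()` for `lm` models, is to compare interval widths inside and outside the data. Note that this width only reflects the model's own uncertainty; it cannot tell us whether the straight line still holds out there.

```r
data(anscombe)
m1 <- lm(formula = y1 ~ x1, data = anscombe)

# One x1 inside the observed range and one far outside it
new_x <- data.frame(x1 = c(10, 44))

# 95% prediction intervals: columns fit, lwr, upr
pred <- predict(m1, newdata = new_x, interval = "prediction")
width <- pred[, "upr"] - pred[, "lwr"]
width
```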
     
    
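# Good: refer to columns by name and pass the data frame with data=
mGood <- lm(formula=y1~x1, data=anscombe)
# Bad: hard-codes anscombe$ inside the formula
mBad <- lm(formula=anscombe$y1~anscombe$x1)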
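The difference shows up as soon as we predict with new data. In this sketch, `mGood` finds `x1` in the new data frame, while `mBad`'s formula keeps pointing at `anscombe$x1`, so R ignores the new values, returns the 11 original fitted values, and issues a warning. The coefficient name also becomes `anscombe$x1` instead of `x1`.

```r
data(anscombe)
mGood <- lm(formula = y1 ~ x1, data = anscombe)
mBad <- lm(formula = anscombe$y1 ~ anscombe$x1)

new_x <- data.frame(x1 = c(20, 30, 40))

# mGood looks up x1 in new_x: 3 predictions
good_pred <- predict(mGood, newdata = new_x)

# mBad re-evaluates anscombe$x1: 11 fitted values plus a warning
bad_pred <- predict(mBad, newdata = new_x)

names(coef(mBad))
```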
    Titanic in Cobh Harbour, County Cork Ireland