Class 7

In-Class Lab

Rethinking: Section 7.5.1

library(rethinking)

## R code 6.13
set.seed(71)
# number of plants
N <- 100

# simulate initial heights
h0 <- rnorm(N,10,2)

# assign treatments and simulate fungus and growth
treatment <- rep( 0:1 , each=N/2 )
fungus <- rbinom( N , size=1 , prob=0.5 - treatment*0.4 )
h1 <- h0 + rnorm(N, 5 - 3*fungus)

# compose a clean data frame
d <- data.frame( h0=h0 , h1=h1 , treatment=treatment , fungus=fungus )
precis(d)

##               mean        sd      5.5%    94.5%    histogram
## h0         9.95978 2.1011623  6.570328 13.07874 ▁▂▂▂▇▃▂▃▁▁▁▁
## h1        14.39920 2.6880870 10.618002 17.93369     ▁▁▃▇▇▇▁▁
## treatment  0.50000 0.5025189  0.000000  1.00000   ▇▁▁▁▁▁▁▁▁▇
## fungus     0.23000 0.4229526  0.000000  1.00000   ▇▁▁▁▁▁▁▁▁▂

## R code 6.14
sim_p <- rlnorm( 1e4 , 0 , 0.25 )
precis( data.frame(sim_p) )

##          mean        sd     5.5%    94.5%    histogram
## sim_p 1.03699 0.2629894 0.670683 1.496397 ▁▁▃▇▇▃▁▁▁▁▁▁

Recall:

m6.6 = model with just an intercept

m6.7 = model includes treatment and fungus (post-treatment bias)

m6.8 = model with just treatment and correctly infers causal influence of the treatment

## R code 6.15
m6.6 <- quap(
  alist(
    h1 ~ dnorm( mu , sigma ),
    mu <- h0*p,
    p ~ dlnorm( 0 , 0.25 ),
    sigma ~ dexp( 1 )
  ), data=d )
precis(m6.6)

##           mean         sd     5.5%    94.5%
## p     1.426626 0.01760992 1.398482 1.454770
## sigma 1.793286 0.12517262 1.593236 1.993336

## R code 6.16
m6.7 <- quap(
  alist(
    h1 ~ dnorm( mu , sigma ),
    mu <- h0 * p,
    p <- a + bt*treatment + bf*fungus,
    a ~ dlnorm( 0 , 0.2 ) ,
    bt ~ dnorm( 0 , 0.5 ),
    bf ~ dnorm( 0 , 0.5 ),
    sigma ~ dexp( 1 )
  ), data=d )
precis(m6.7)

##               mean         sd        5.5%       94.5%
## a      1.481391468 0.02451069  1.44221865  1.52056429
## bt     0.002412222 0.02986965 -0.04532525  0.05014969
## bf    -0.266718915 0.03654772 -0.32512923 -0.20830860
## sigma  1.408797442 0.09862070  1.25118251  1.56641237

## R code 6.17
m6.8 <- quap(
  alist(
    h1 ~ dnorm( mu , sigma ),
    mu <- h0 * p,
    p <- a + bt*treatment,
    a ~ dlnorm( 0 , 0.2 ),
    bt ~ dnorm( 0 , 0.5 ),
    sigma ~ dexp( 1 )
  ), data=d )
precis(m6.8)

##             mean         sd       5.5%     94.5%
## a     1.38035767 0.02517554 1.34012229 1.4205931
## bt    0.08499924 0.03429718 0.03018573 0.1398128
## sigma 1.74631655 0.12191552 1.55147200 1.9411611

# 7.25
set.seed(11)
WAIC(m6.7)

##       WAIC      lppd  penalty  std_err
## 1 361.4511 -177.1724 3.553198 14.17033

set.seed(77)
compare(m6.6, m6.7, m6.8, func=WAIC)

##          WAIC       SE    dWAIC      dSE    pWAIC       weight
## m6.7 361.8901 14.26190  0.00000       NA 3.839491 1.000000e+00
## m6.8 402.7757 11.28257 40.88562 10.47837 2.645879 1.323732e-09
## m6.6 405.9139 11.64641 44.02380 12.22582 1.581312 2.756471e-10

WAIC = smaller values are better and the models are ordered by WAIC, best to worst.

pWAIC = penalty term of WAIC.

plot(compare(m6.6, m6.7, m6.8))

The filled points are the in-sample deviance values. The open points are the WAIC values. Notice that naturally each model does better in-sample than it is expected to do out-of-sample. The line segments show the standard error of each WAIC. These are the values in the column labeled SE in the table above.

So you can probably see how much better m6.7 is than m6.8. What we really want however is the standard error of the difference in WAIC between the two models. That is shown by the lighter line segment with the triangle on it, between m6.7 and m6.8.

What does all of this mean?

It means that WAIC cannot be used to infer causation. We know, because we simulated these data, that the treatment matters. But because fungus mediates treatment—it is on a pipe between treatment and the outcome—once we condition on fungus, treatment provides no additional information. And since fungus is more highly correlated with the outcome, a model using it is likely to predict better. WAIC did its job. Its job is not to infer causation. Its job is to guess predictive accuracy.

Review different programs

Book website

Package versions

sessionInfo()

## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices datasets  utils     methods  
## [8] base     
## 
## other attached packages:
## [1] rethinking_2.21      cmdstanr_0.4.0.9001  rstan_2.21.3        
## [4] ggplot2_3.3.5        StanHeaders_2.21.0-7
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7           mvtnorm_1.1-3        lattice_0.20-44     
##  [4] lubridate_1.8.0      prettyunits_1.1.1    ps_1.6.0            
##  [7] assertthat_0.2.1     digest_0.6.29        utf8_1.2.2          
## [10] mime_0.12            R6_2.5.1             backports_1.4.1     
## [13] stats4_4.1.1         coda_0.19-4          evaluate_0.14       
## [16] highr_0.9            blogdown_1.5         pillar_1.6.4        
## [19] bsplus_0.1.3         rlang_0.4.12         rstudioapi_0.13     
## [22] callr_3.7.0          jquerylib_0.1.4      checkmate_2.0.0     
## [25] rmarkdown_2.11       stringr_1.4.0        loo_2.4.1           
## [28] munsell_0.5.0        compiler_4.1.1       xfun_0.28           
## [31] pkgconfig_2.0.3      base64enc_0.1-3      pkgbuild_1.3.1      
## [34] shape_1.4.6          htmltools_0.5.2      tidyselect_1.1.1    
## [37] tensorA_0.36.2       tibble_3.1.6         gridExtra_2.3       
## [40] bookdown_0.24        codetools_0.2-18     matrixStats_0.61.0  
## [43] fansi_0.5.0          crayon_1.4.2         dplyr_1.0.7         
## [46] withr_2.4.3          MASS_7.3-54          distributional_0.2.2
## [49] grid_4.1.1           jsonlite_1.7.2       gtable_0.3.0        
## [52] lifecycle_1.0.1      DBI_1.1.1            magrittr_2.0.1      
## [55] posterior_1.1.0      scales_1.1.1         RcppParallel_5.1.4  
## [58] cli_3.1.0            stringi_1.7.6        farver_2.1.0        
## [61] renv_0.14.0          fs_1.5.0             bslib_0.3.1         
## [64] ellipsis_0.3.2       vctrs_0.3.8          generics_0.1.1      
## [67] tools_4.1.1          glue_1.6.0           purrr_0.3.4         
## [70] processx_3.5.2       abind_1.4-5          fastmap_1.1.0       
## [73] yaml_2.2.1           inline_0.3.19        colorspace_2.0-2    
## [76] downloadthis_0.2.1   knitr_1.36           sass_0.4.0

Last updated on Feb 28, 2022