The data set noisy_example contains the resampling results for a series of regression models that were fit to a small data set with considerable variability. For resampling, 10 repeats of 10-fold cross-validation were used to estimate performance. We will compare the models using the root mean squared error (RMSE) metric.

library(tidyposterior)
data("noisy_example")

library(tidyverse)

rmses <- noisy_example %>%
   select(id, id2, contains("RMSE")) %>%
   setNames(tolower(gsub("_RMSE$", "", names(.))))

# By removing the `splits` column, `rmses` is a basic data frame instead of
# an `rset` object. 
stacked_rmse <- gather(rmses, key = model, value = statistic, -id, -id2)

mean_rmse <- stacked_rmse %>%
  group_by(model) %>%
  summarise(statistic = mean(statistic))
## `summarise()` ungrouping output (override with `.groups` argument)
library(ggplot2)

ggplot(stacked_rmse,
       aes(
         x = model,
         y = statistic,
         group = paste(id, id2),
         col = paste(id, id2))
       ) +
  geom_line(alpha = .75) +
  theme(legend.position = "none")

ggplot(stacked_rmse, aes(col = model, x = statistic)) +
  geom_line(stat = "density", trim = FALSE) +
  theme(legend.position = "top")

A few observations about these data:

  • The RMSE values vary roughly 5-fold across the resampling results (a quick numerical check is sketched after this list).
  • Many of the lines cross, indicating that the resample-to-resample variability might be larger than the model-to-model variability.
  • The density plots show right-skewed distributions that, given the variability, are approaching the natural lower bound of zero.
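
As a quick, illustrative check of the first point (not part of the original analysis), the overall spread can be computed directly:

stacked_rmse %>%
  summarise(
    min_rmse = min(statistic),
    max_rmse = max(statistic),
    fold_change = max(statistic) / min(statistic)
  )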

A few different Bayesian models will be fit to these data.

A First Model

It might make sense to use a probability model that is consistent with the characteristics of the data (in terms of skewness). Instead of a symmetric distribution (such as the Gaussian), a potentially right-skewed probability model might make more sense. A Gamma distribution is a reasonable choice and can be fit using the generalized linear model embedded in perf_mod. This also requires a link function to be chosen for the model. The canonical link for this distribution is the inverse transformation, and this will be our choice.
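
As a quick confirmation (an illustrative check, not part of the original text), the family object itself reports this default link:

# The canonical link for the Gamma family in base R is the inverse:
Gamma()$link
## [1] "inverse"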

To fit this model, the family argument used by stan_glmer can be passed through perf_mod. The default link for the Gamma family is the inverse, and no extra transformation of the outcome will be used.

gamma_model <- perf_mod(rmses, family = Gamma(), seed = 74)
## Warning: There were multiple resample ID columns in the data. It is unclear what the model formula
## should be for the hierarchical model. This analysis used the formula: `statistic ~ model + (1 | id2/
## id)` The `formula` arg can be used to change this value.
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000131 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.31 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 107.125 seconds (Warm-up)
## Chain 1:                8.26674 seconds (Sampling)
## Chain 1:                115.391 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 4.2e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 113.953 seconds (Warm-up)
## Chain 2:                8.29632 seconds (Sampling)
## Chain 2:                122.25 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 4.7e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.47 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 110.325 seconds (Warm-up)
## Chain 3:                5.85664 seconds (Sampling)
## Chain 3:                116.181 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 4e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 108.162 seconds (Warm-up)
## Chain 4:                8.26914 seconds (Sampling)
## Chain 4:                116.431 seconds (Total)
## Chain 4:
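
The warning above notes that the model formula can be set explicitly when there are multiple resample ID columns. As a sketch (using the formula argument documented in the warning), an equivalent call that supplies the default formula directly would be:

gamma_model <- perf_mod(
  rmses,
  formula = statistic ~ model + (1 | id2/id),
  family = Gamma(),
  seed = 74
)
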
# Get the posterior distributions of the mean parameters:
gamma_post <- tidy(gamma_model, seed = 3750)
gamma_mean <- summary(gamma_post)
gamma_mean
## # A tibble: 4 x 4
##   model   mean lower upper
##   <chr>  <dbl> <dbl> <dbl>
## 1 bag     18.2  16.8  19.7
## 2 cubist  16.7  15.5  18.0
## 3 mars    17.9  16.5  19.4
## 4 nnet    19.0  17.5  20.7

Are these values consistent with the data? Let’s look at the posterior distribution and overlay the observed and predicted mean RMSE values.

ggplot(gamma_post) +
  geom_point(data = gamma_mean, aes(y = mean), alpha = .5) +
  geom_point(data = mean_rmse, aes(y = statistic),
             col = "red", pch = 4, cex= 3)
## Warning: `ggplot.posterior()` is deprecated as of tidyposterior 0.1.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

The observed means are not close to the centers of the (skewed) posterior distributions. Let’s try something else.

Transforming the Data

Another approach is to transform the RMSE values to something more symmetric and model the data on that scale. A log transform will be used here via the built-in object ln_trans. With this option, the posterior distributions are computed on the log scale and are automatically back-transformed into the original units. By not passing family to the function, we are using a Gaussian model.
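
For reference, ln_trans is a two-element list containing a transformation function and its inverse, and a custom transform can be supplied in the same form. A hypothetical square-root version (not used in this analysis, and assuming the func/inv structure of the built-in transforms) might look like:

sqrt_trans <- list(func = sqrt, inv = function(x) x^2)
# e.g., perf_mod(rmses, transform = sqrt_trans, seed = 74)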

log_linear_model <- perf_mod(rmses, transform = ln_trans, seed = 74)
## Warning: There were multiple resample ID columns in the data. It is unclear what the model formula
## should be for the hierarchical model. This analysis used the formula: `statistic ~ model + (1 | id2/
## id)` The `formula` arg can be used to change this value.
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 4.7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.47 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 1.70341 seconds (Warm-up)
## Chain 1:                0.96132 seconds (Sampling)
## Chain 1:                2.66473 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 4.4e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.44 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 1.84344 seconds (Warm-up)
## Chain 2:                0.960192 seconds (Sampling)
## Chain 2:                2.80364 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 4e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 1.62729 seconds (Warm-up)
## Chain 3:                0.968658 seconds (Sampling)
## Chain 3:                2.59595 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 3.5e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 1.45021 seconds (Warm-up)
## Chain 4:                0.962666 seconds (Sampling)
## Chain 4:                2.41287 seconds (Total)
## Chain 4:
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems

There were some warnings regarding sampling and divergent transitions. We could use the shinystan or coda packages to look into this model further.
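
For example, assuming (as the package documents) that the underlying rstanarm fit is stored in the stan element of the perf_mod object, it can be passed to shinystan directly (interactive, so not run here):

# Launch the interactive MCMC diagnostics in a browser:
# library(shinystan)
# launch_shinystan(log_linear_model$stan)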

log_linear_post <- tidy(log_linear_model, seed = 3750)

log_linear_mean <- summary(log_linear_post)
log_linear_mean
## # A tibble: 4 x 4
##   model   mean lower upper
##   <chr>  <dbl> <dbl> <dbl>
## 1 bag     18.7  17.1  20.4
## 2 cubist  16.5  15.1  18.0
## 3 mars    17.8  16.3  19.5
## 4 nnet    20.4  18.7  22.3
ggplot(log_linear_post) +
  geom_point(data = log_linear_mean, aes(y = mean), alpha = .5) +
  geom_point(data = mean_rmse, aes(y = statistic),
             col = "red", pch = 4, cex= 3)

The posteriors are much less skewed, but the observed and estimated means are still fairly far apart. Since these differences are all in the same direction, this does not appear to be related to the shrinkage properties of Bayesian models.

A Simple Gaussian Model

Let’s try the simplest model, one that uses a linear function and assumes a Gaussian distribution for the RMSE estimates.

linear_model <- perf_mod(rmses, seed = 74)
## Warning: There were multiple resample ID columns in the data. It is unclear what the model formula
## should be for the hierarchical model. This analysis used the formula: `statistic ~ model + (1 | id2/
## id)` The `formula` arg can be used to change this value.
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 4.7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.47 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 2.07118 seconds (Warm-up)
## Chain 1:                1.90303 seconds (Sampling)
## Chain 1:                3.97421 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 4.2e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 1.74147 seconds (Warm-up)
## Chain 2:                1.60185 seconds (Sampling)
## Chain 2:                3.34333 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 3.7e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.37 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 1.5812 seconds (Warm-up)
## Chain 3:                1.63005 seconds (Sampling)
## Chain 3:                3.21125 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 4.1e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.41 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 1.96444 seconds (Warm-up)
## Chain 4:                1.94405 seconds (Sampling)
## Chain 4:                3.90849 seconds (Total)
## Chain 4:
## Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
linear_post <- tidy(linear_model, seed = 3750)
linear_mean <- summary(linear_post)

ggplot(linear_post) +
  geom_point(data = linear_mean, aes(y = mean), alpha = .5) +
  geom_point(data = mean_rmse, aes(y = statistic),
             col = "red", pch = 4, cex= 3)

These are right on target. Despite the skewness of the original data, a simple linear model did best here. In hindsight, this makes sense since we are modeling summary statistics as our outcome. Even if we believe these to have potentially skewed distributions, the central limit theorem kicks in here and the estimates tend toward normality.
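
A small simulation (illustrative only, not part of the original analysis) shows this effect: even when individual values are drawn from a strongly right-skewed distribution, the means of repeated samples are approximately Gaussian.

set.seed(101)
# Means of 1000 samples, each of 100 right-skewed (Gamma) values:
sim_means <- tibble(mean_value = replicate(1000, mean(rgamma(100, shape = 2, rate = 0.1))))
ggplot(sim_means, aes(x = mean_value)) +
  geom_histogram(bins = 30)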

We can compare models using the contrast_models function. The function has arguments for two sets of models to compare, but if these are left at their default (NULL), all pairwise combinations are used (an explicit two-model comparison is sketched at the end of this section). Let’s say that an RMSE difference of 1 unit is practically important.

all_contrasts <- contrast_models(linear_model, seed = 8967)
ggplot(all_contrasts, size = 1)
## Warning: `ggplot.posterior_diff()` is deprecated as of tidyposterior 0.1.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

summary(all_contrasts, size = 1)
## # A tibble: 6 x 9
##   contrast       probability   mean  lower   upper  size pract_neg pract_equiv pract_pos
##   <chr>                <dbl>  <dbl>  <dbl>   <dbl> <dbl>     <dbl>       <dbl>     <dbl>
## 1 bag vs cubist       0.998   2.26   1.04   3.53       1    0          0.044     0.956  
## 2 bag vs mars         0.725   0.453 -0.810  1.69       1    0.0288     0.730     0.241  
## 3 bag vs nnet         0.0402 -1.34  -2.57  -0.0885     1    0.668      0.330     0.00125
## 4 cubist vs mars      0.01   -1.81  -3.07  -0.512      1    0.857      0.142     0.00025
## 5 cubist vs nnet      0      -3.60  -4.89  -2.31       1    0.999      0.00075   0      
## 6 mars vs nnet        0.0127 -1.79  -3.05  -0.516      1    0.849      0.151     0

Based on our effect size of a single unit, the only pair that is practically equivalent is MARS and bagged trees. Since Cubist has the smallest RMSE, it is not unreasonable to say that this model provides uniformly better results than the others shown here.
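
If only one comparison is of interest, the two groups of models can be named explicitly via the list_1 and list_2 arguments (a sketch; the pairwise results above already include this contrast):

cubist_vs_nnet <- contrast_models(
  linear_model,
  list_1 = "cubist",
  list_2 = "nnet",
  seed = 8967
)
summary(cubist_vs_nnet, size = 1)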

One Final Note

The Bayesian models have population parameters for the model effects (akin to “fixed” effects in mixed models) as well as variance parameter(s) related to the resamples. The posteriors computed by this package only reflect the mean parameters and should only be used to make inferences about this data set. This posterior calculation could not be used to predict the level of performance for a model on a new resample of the data. In that case, the variance parameters come into play and the posterior would be much wider.

In essence, the posteriors shown here measure the average performance value rather than a resample-specific value.
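
As an illustrative sketch of this point (again assuming that the underlying rstanarm fit is stored in the stan element of the perf_mod object), the posterior of the mean can be compared with the posterior predictive distribution, which folds in the variance parameters:

library(rstanarm)

# Draws for the expected (mean) performance of each observed resample:
mean_draws <- posterior_epred(linear_model$stan)

# Draws that also include the resample-level variance:
pred_draws <- posterior_predict(linear_model$stan)

# The predictive interval should be much wider than the interval for the mean:
quantile(mean_draws[, 1], probs = c(0.05, 0.95))
quantile(pred_draws[, 1], probs = c(0.05, 0.95))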