Goal
Implement Greedy Ensemble Selection from scratch. Learn how to do stacking with mlr3pipelines.
German Credit Data
Description
- Data from 1973 to 1975 from a large regional bank in southern Germany, classifying credits described by a set of attributes as good or bad credit risks.
- Stratified sample of 1000 credits (300 bad ones and 700 good ones).
- Customers with good credit risks perfectly complied with the conditions of the contract while customers with bad credit risks did not comply with the contract as required.
- Available in tsk("german_credit").
Data Dictionary
n = 1,000 observations of credits
- credit_risk: Has the credit contract been complied with (good) or not (bad)?
- age: Age of debtor in years
- amount: Credit amount in DM
- credit_history: History of compliance with previous or concurrent credit contracts
- duration: Credit duration in months
- employment_duration: Duration of debtor’s employment with current employer
- foreign_worker: Whether the debtor is a foreign worker
- housing: Type of housing the debtor lives in
- installment_rate: Credit installments as a percentage of debtor’s disposable income
- job: Quality of debtor’s job
- number_credits: Number of credits including the current one the debtor has (or had) at this bank
- other_debtors: Whether there is another debtor or a guarantor for the credit
- other_installment_plans: Installment plans from providers other than the credit-giving bank
- people_liable: Number of persons who financially depend on the debtor
- personal_status_sex: Combined information on sex and marital status
- present_residence: Length of time (in years) the debtor lives in the present residence
- property: The debtor’s most valuable property
- purpose: Purpose for which the credit is needed
- savings: Debtor’s savings
- status: Status of the debtor’s checking account with the bank
- telephone: Whether there is a telephone landline registered in the debtor’s name
Prerequisites
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3viz)
library(bbotk)
library(mlr3misc)

task = tsk("german_credit")
set.seed(2409)
1 Implement Greedy Ensemble Selection
In previous exercises, we have seen how to do weighted model averaging and how to optimize weights numerically. We now want to implement a popular method for optimizing the weights when performing model averaging: greedy ensemble selection (GES).
First, read the original paper (https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf) and make yourself familiar with the slides covering greedy ensemble selection. Then, revisit the in-class exercise set.
In the in-class exercises, we wanted to do model averaging of a decision tree, a k-NN (k = 7), and a logistic regression. To optimize the weights numerically, we wrote an objective function that allows optimizing the weights of each model so that the resulting weighted model average is optimal.
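As a quick reminder, weighted model averaging combines the class probability matrices of the individual models into a single probability matrix. A minimal sketch with purely illustrative toy matrices (rows = observations, columns = classes; not real model output):

p_dt = matrix(c(0.8, 0.2, 0.3, 0.7), nrow = 2, byrow = TRUE)       # toy probabilities
p_kknn = matrix(c(0.6, 0.4, 0.4, 0.6), nrow = 2, byrow = TRUE)
p_log_reg = matrix(c(0.7, 0.3, 0.2, 0.8), nrow = 2, byrow = TRUE)
w = c(0.5, 0.3, 0.2)  # weights normalized to sum to 1
p_ens = w[1] * p_dt + w[2] * p_kknn + w[3] * p_log_reg  # each row of p_ens again sums to 1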
We now want to use GES instead of CMA-ES. First, we will implement GES without relying on the bbotk infrastructure. Afterwards, you are free to take a look at how Optimizers are implemented in bbotk (see, e.g., https://github.com/mlr-org/bbotk/blob/70f3f96ff0023066aa8eaa7c97484f0d6e518663/R/OptimizerCmaes.R) and implement greedy ensemble selection as a proper Optimizer.
In this exercise, we will also use an outer test split to properly evaluate the performance of our ensemble with optimized weights in an unbiased manner. After you have implemented GES, make sure to benchmark the weights it finds against the weights found by CMA-ES.
You can use the following code to do this:
bg = benchmark_grid(task, list(grl, grl_weights_optimized_cmaes, dt, kknn, log_reg, featureless), outer_split)
b = benchmark(bg)
b$aggregate(msr("classif.auc"))
Note that this exercise is quite lengthy and rather complicated. It might be best to work more or less directly along the code skeleton provided in Hint 2.
Hint 1:
GES is conceptually very simple:
- You start with the empty ensemble and add the model that has the highest validation performance.
- Then, you check for each model what the validation performance of the ensemble would be if you added that model to the ensemble.
- To calculate the weights for each model, you simply count the number of times the model has been added to the ensemble so far and divide it by the total number of models added.
- Finally, to calculate the probability predictions of the ensemble, you perform weighted model averaging as usual.
- You repeat this until a given number of iterations has been reached.
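To make the loop concrete, here is a minimal, self-contained sketch of GES on toy validation probabilities for the positive class. All names and data below (the fake predictions, the rank-based auc helper, ges_weights) are purely illustrative and not part of the exercise skeleton; the actual exercise works on the resampled mlr3 predictions, see Hint 2.

set.seed(1)
n = 200L
truth = factor(sample(c("good", "bad"), n, replace = TRUE, prob = c(0.7, 0.3)), levels = c("good", "bad"))
# toy validation probabilities for class "good" from M = 3 models
preds = list(
  dt = pmin(pmax(ifelse(truth == "good", 0.6, 0.4) + rnorm(n, sd = 0.2), 0), 1),
  kknn = pmin(pmax(ifelse(truth == "good", 0.65, 0.35) + rnorm(n, sd = 0.25), 0), 1),
  log_reg = pmin(pmax(ifelse(truth == "good", 0.7, 0.3) + rnorm(n, sd = 0.3), 0), 1)
)
# rank-based AUC with "good" as the positive class
auc = function(prob, truth) {
  r = rank(prob)
  n_pos = sum(truth == "good")
  n_neg = sum(truth == "bad")
  (sum(r[truth == "good"]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
# weights = selection frequencies of each model so far
ges_weights = function(selected, n_models) tabulate(selected, nbins = n_models) / length(selected)
selected = integer(0)
for (i in seq_len(25L)) {
  # score each candidate model as if it were added to the ensemble next
  scores = sapply(seq_along(preds), function(m) {
    w = ges_weights(c(selected, m), length(preds))
    ens = Reduce(`+`, Map(`*`, preds, w))
    auc(ens, truth)
  })
  selected = c(selected, which.max(scores))  # greedily add the best model (with replacement)
}
ges_weights(selected, length(preds))  # final weights = selection frequencies

Note that this toy version simply runs a fixed number of iterations; the skeleton in Hint 2 additionally tracks the ensemble performance at every iteration and returns the weights of the best iteration.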
Hint 2:
= rsmp("holdout")$instantiate(task)
outer_split
#########################################################################################
### old code part from previous exercise using CMA-ES to numerically optimize the weights
### modified to work with the newly introduced outer split
= lrn("classif.rpart")
dt $predict_type = "prob"
dt= lrn("classif.kknn", k = 7L)
kknn $predict_type = "prob"
kknn= lrn("classif.log_reg")
log_reg $predict_type = "prob"
log_reg= lrn("classif.featureless")
featureless $predict_type = "prob"
featureless
= gunion(list(dt, kknn, log_reg)) %>>% po("classifavg")
gr = as_learner(gr)
grl
# optimize weights on the train/validation split from the outer_split
= task$clone(deep = TRUE)$filter(outer_split$train_set(1L))
train_valid_task = 10L
k = rsmp("cv", folds = k)$instantiate(train_valid_task)
resampling
= resample(train_valid_task, dt, resampling)
dt_pred = resample(train_valid_task, kknn, resampling)
kknn_pred = resample(train_valid_task, log_reg, resampling)
log_reg_pred
objective_function = function(xs) {
  weights = unlist(xs)
  weights = weights / sum(weights)  # sum to 1 normalization
  aucs = map_dbl(seq_len(k), function(fold) {
    dt_p = dt_pred$predictions()[[fold]]
    kknn_p = kknn_pred$predictions()[[fold]]
    log_reg_p = log_reg_pred$predictions()[[fold]]
    row_ids = dt_p$row_ids
    stopifnot(all(row_ids == kknn_p$row_ids) && all(row_ids == log_reg_p$row_ids))
    truth = dt_p$truth
    weighted_probs = Reduce("+", list(dt_p$prob * weights[1L], kknn_p$prob * weights[2L], log_reg_p$prob * weights[3L]))
    weighted_response = colnames(weighted_probs)[apply(weighted_probs, MARGIN = 1L, FUN = which.max)]
    weighted_p = PredictionClassif$new(row_ids = row_ids, truth = truth, response = weighted_response, prob = weighted_probs)
    weighted_p$score(msr("classif.auc"))
  })
  list(classif.auc = mean(aucs), weights = list(weights))
}
domain = ps(w_dt = p_dbl(0, 1), w_kknn = p_dbl(0, 1), w_log_reg = p_dbl(0, 1))
codomain = ps(classif.auc = p_dbl(0.5, 1, tags = "maximize"))

objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain,
  id = "optimize_grl_weights_cmaes"
)

instance = OptimInstanceSingleCrit$new(
  objective = objective,
  terminator = trm("evals", n_evals = 100L)
)

optimizer = opt("cmaes")
optimizer$optimize(instance)

# configure the ensemble with the optimized weights and evaluate on the outer split
grl_weights_optimized_cmaes = grl$clone(deep = TRUE)
grl_weights_optimized_cmaes$param_set$set_values(.values = list(classifavg.weights = instance$archive$best()$weights[[1L]]))
grl_weights_optimized_cmaes$id = paste0(grl_weights_optimized_cmaes$id, ".optimized_cmaes")
#########################################################################################
#########################################################################################
### GES implementation
# selected is an integer vector of length t where t is the current iteration within GES
# learner_num_ids is an integer vector from 1, ..., M where M is the number of models
# get_ges_weights should calculate the weights for each model based on the number of times it has been selected
get_ges_weights = function(selected, learner_num_ids) {
  ...
  weights
}
# learner_preds is a list of the resampled learner predictions, e.g., list(dt_pred, kknn_pred, log_reg_pred)
# iterations is an integer specifying the number of iterations for which GES should be run
# measure is the measure used to score the predictions
ges = function(learner_preds, iterations = 100L, measure = msr("classif.auc")) {
  iteration = 0
  learner_num_ids = seq_along(learner_preds)
  performance = numeric(iterations)
  selected = integer(iterations)
  for (iteration in seq_len(iterations)) {
    cat("Iteration: ", iteration, "\n")
    performance_tmp = map_dbl(learner_num_ids, function(learner_num_id) {
      # for each model: add it to the ensemble and calculate the average validation performance over the k folds by performing model averaging
      # use get_ges_weights to calculate the weights for each model
      selected_tmp = selected
      selected_tmp[iteration] = learner_num_id
      weights = get_ges_weights(...)
      performances = map_dbl(seq_len(k), function(fold) {
        # similar steps as in the ObjectiveRFun we constructed to optimize the weights numerically with CMA-ES
        ...
      })
      mean(performances)
    })
    cat("Performance if learner added: ", round(performance_tmp, 3), "\n")
    if (!measure$minimize) {
      performance_tmp = - performance_tmp
    }
    select = ...  # best model if added to the ensemble
    performance[iteration] = performance_tmp[select]
    selected[iteration] = select
  }
  best_iteration = which.min(performance)
  weights = get_ges_weights(selected[seq_len(best_iteration)], learner_num_ids = learner_num_ids)
  if (!measure$minimize) {
    performance = - performance
  }
  list(best_iteration = best_iteration, performance_best = performance[best_iteration], weights = weights, selected = selected, performance = performance)
}
ges_results = ges(list(dt_pred, kknn_pred, log_reg_pred))

# configure the ensemble with the optimized weights and evaluate on the outer split
grl_weights_optimized_ges = grl$clone(deep = TRUE)
grl_weights_optimized_ges$param_set$set_values(.values = list(classifavg.weights = ...))
grl_weights_optimized_ges$id = paste0(grl_weights_optimized_ges$id, ".optimized_ges")
#########################################################################################
bg = benchmark_grid(task, list(grl, grl_weights_optimized_cmaes, grl_weights_optimized_ges, dt, kknn, log_reg, featureless), outer_split)
b = benchmark(bg)
b$aggregate(msr("classif.auc"))

# compare the weights
grl_weights_optimized_cmaes$param_set$get_values()[["classifavg.weights"]]
grl_weights_optimized_ges$param_set$get_values()[["classifavg.weights"]]
2 Single- and Multilayer Stacking
2.1 Level 0
We now want to do stacking. Use six learners on level 0: a decision tree, a k-NN (k = 7), an elastic net (alpha = 0.5 and s = 0.1) with categorical features target-encoded, a naive Bayes classifier with categorical features target-encoded, XGBoost (nrounds = 100) with categorical features target-encoded, and a random forest (num.trees = 100). Do not cross-validate on level 0 for now. Train on the task and inspect the output of the level 0 graph. Then combine level 0 with a logistic regression on level 1.
Proceed to benchmark the stacked ensemble against each level 0 learner and the random forest. Use 5-fold CV to evaluate the ROC AUC of the models.
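The hints below leave the exact PipeOps as blanks. As one possible sketch (not the unique solution), the elastic net with target-encoded categorical features could be wrapped like this, assuming po("encodeimpact") (impact/target encoding from mlr3pipelines) as the encoder:

# hedged sketch: impact-encode categorical features, then fit the elastic net
elnet = lrn("classif.glmnet", alpha = 0.5, s = 0.1)
elnet$predict_type = "prob"
elnet = as_learner(po("encodeimpact") %>>% elnet)

The same pattern applies to the naive Bayes and XGBoost learners.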
Hint 1:
To wrap a learner within an ensemble and use its predictions, the learner_cv PipeOp is essential. Its resampling.method hyperparameter allows you to cross-validate the predictions or not. Similarly, resampling.folds allows you to specify the number of folds if predictions should be cross-validated. To combine the level 0 output, use the featureunion PipeOp. The following gallery post is helpful: https://mlr-org.com/gallery/pipelines/2020-04-27-tuning-stacking/index.html
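For example (a sketch; Hint 2 leaves the exact values to you), wrapping a decision tree so that its plain in-sample predictions are passed on could look like this:

# resampling.method = "insample" means the level 0 predictions are not cross-validated
po("learner_cv", learner = lrn("classif.rpart", predict_type = "prob"), resampling.method = "insample")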
Hint 2:
= rsmp("cv", folds = ...)$instantiate(task)
resampling = lrn("classif.rpart")
dt $predict_type = "prob"
dt= lrn(..., k = ...)
kknn $predict_type = "prob"
kknn= lrn(..., alpha = ..., s = ...)
elnet $predict_type = "prob"
elnet= as_learner(po(...) %>>% ...)
elnet = lrn(...)
naive_bayes $predict_type = "prob"
naive_bayes= as_learner(po(...) %>>% ...)
naive_bayes = lrn(..., nrounds = ...)
xgboost = as_learner(po(...) %>>% ...)
xgboost $predict_type = "prob"
xgboost= lrn(..., num.trees = ..._
rf $predict_type = "prob"
rf
level0 = gunion(
list(
po("learner_cv", learner = ..., resampling.method = ...),
...
)%>>% po(...)
) $plot()
level0
level0_train_output = level0$train(task)
level0_train_output
level1 = lrn(..., id = "log_reg_out")
$predict_type = "prob"
level1
ensemble = as_learner(level0 %>>% level1)
$id = "Simple Stacked Ensemble Insample"
ensemble$graph$plot()
ensemble
bg = benchmark_grid(..., list(...), ...)
b = benchmark(bg)
autoplot(..., measure = ...)
2.2 Cross Validation
So far, we have not cross-validated the level 0 predictions. Create another stacked ensemble where you use 3-fold CV to cross-validate the level 0 predictions.
Proceed to benchmark the cross-validated stacked ensemble against the stacked ensemble and the random forest. Use 5-fold CV to evaluate the ROC AUC of the models.
Hint 1:
You can essentially reuse most of your logic on how to build the graph from the previous exercise. To wrap a learner within an ensemble and use its predictions, the learner_cv PipeOp is essential. Have a look at its resampling.method parameter to see how to cross-validate level 0 predictions.
Hint 2:
level0 = gunion(
  list(
    po("learner_cv", learner = dt, resampling.method = "cv", resampling.folds = 3L),
    .
    .
    .
  )
) %>>% po("featureunion")

ensemble_cv = as_learner(level0 %>>% level1)
ensemble_cv$id = "Simple Stacked Ensemble 3-fold CV"

bg = benchmark_grid(...)
b = benchmark(...)
autoplot(...)
2.3 Add the Original Features of the Task
So far we have dropped the original features of the task. Create another stacked ensemble where you additionally keep the original features of the task. Similarly to before, cross-validate the level 0 predictions.
Proceed to benchmark the cross-validated stacked ensemble with features passed through against the cross-validated stacked ensemble, the stacked ensemble, and the random forest. Use 5-fold CV to evaluate the ROC AUC of the models.
Hint 1:
You can essentially reuse most of your logic on how to build the graph from the previous exercise. To keep the original features of the task in the output of level 0, use the nop PipeOp alongside the learners.
Hint 2:
level0 = gunion(
  list(
    po("learner_cv", learner = dt, resampling.method = "cv", resampling.folds = 3L),
    ...,
    po("nop")
  )
) %>>% po("featureunion")

level0_train_output = level0$train(task)[[1L]]
level0_train_output
# The task now also contains the original features of the task passed through

ensemble_cv_pass = as_learner(level0 %>>% level1)
ensemble_cv_pass$id = "Simple Stacked Ensemble 3-fold CV + Orig Features"

bg = benchmark_grid(...)
b = benchmark(...)
autoplot(...)
2.4 Multilayer Stacking Ensemble
Finally, create a multilayer stacking ensemble by passing the output of the current level 0 into a level 1 that is built similarly to level 0. As before, keep the original features of the task and cross-validate the level 0 and level 1 predictions. Then use a logistic regression on the final level 2. Make sure that when you pass the original features of the task to the final level 2, you do not also pass the cross-validated predictions from level 0.
Proceed to benchmark this cross-validated multilayer stacked ensemble with features passed through against the cross-validated stacked ensemble with features passed through, the cross-validated stacked ensemble, the stacked ensemble, and the random forest. Use 5-fold CV to evaluate the ROC AUC of the models.
Hint 1:
To make sure that when you pass the original features of the task to the final level 2 you do not also pass the cross-validated predictions from level 0, use an appropriately configured select PipeOp.
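For illustration (a sketch, not the unique solution), a selector that keeps exactly the original features of the task can be built from the task's feature names:

# keeps only the original task features, dropping the cross-validated level 0 predictions
po("select", selector = selector_name(task$feature_names))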
Hint 2:
level0 = gunion(
  list(
    po("learner_cv", id = "dt1_0", learner = dt, resampling.method = "cv", resampling.folds = 3L),
    .
    .
    .
  )
) %>>% po("featureunion", id = "featureunion_0")

level1 = gunion(
  list(
    po("learner_cv", id = "dt1_1", learner = dt, resampling.method = "cv", resampling.folds = 3L),
    .
    .
    .,
    po("nop", id = "nop_1") %>>% po("select", selector = selector_name(...))  # pass the names of the original features
  )
) %>>% po("featureunion", id = "featureunion_1")

level2 = lrn("classif.log_reg", id = "log_reg_out")
level2$predict_type = "prob"

ensemble_multilayer_cv_pass = as_learner(level0 %>>% level1 %>>% level2)
ensemble_multilayer_cv_pass$id = "Multilayer Stacked Ensemble 3-fold CV + Orig Features"
ensemble_multilayer_cv_pass$graph$plot()

bg = benchmark_grid(...)
b = benchmark(...)
autoplot(...)
Summary
We implemented Greedy Ensemble Selection from scratch. Then we looked at various ways of building stacked ensembles with mlr3pipelines and benchmarked them.