Tuning a Stacked Learner

mlr3pipelines mlr3tuning tuning optimization nested resampling stacking sonar data set classification

How to create and tune a multilevel stacking model using the mlr3pipelines package.

Milan Dragicevic, Giuseppe Casalicchio


Multilevel stacking is an ensemble technique where the predictions of several learners are added as new features to extend the original data on different levels. On each level, the extended data is used to train a new level of learners. This can be repeated for several iterations until a final learner is trained. To avoid overfitting, it is advisable to use test set (out-of-bag) predictions on each level.

In this post, a multilevel stacking example will be created using mlr3pipelines and tuned using mlr3tuning. A similar example is available in the mlr3book. However, we additionally explain how to tune the hyperparameters of the whole ensemble and each underlying learner jointly.

In our stacking example, we proceed as follows:

  1. Level 0: Based on the input data, we train three learners (rpart, glmnet and lda) on a sparser feature space obtained using different feature filter methods from mlr3filters to obtain slightly decorrelated predictions. The test set predictions of these learners are attached to the original data (used in level 0) and will serve as input for the learners in level 1.
  2. Level 1: We transform this extended data using PCA, on which we then train three additional learners (rpart, glmnet and lda). The test set predictions of the level 1 learners are attached to the input data used in level 1.
  3. Finally, we train a final ranger learner on the data extended by level 1. Note that the number of features selected by the feature filter method in level 0 and the number of principal components retained in level 1 will be jointly tuned with some other hyperparameters of the learners in each level.


We load the mlr3verse package which pulls in the most important packages for this example. The mlr3learners package loads additional learners.

We initialize the random number generator with a fixed seed for reproducibility and decrease the verbosity of the logger to keep the output concise.
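The setup described above corresponds to the following code (the concrete seed value and logger thresholds are assumptions; any fixed seed works):

```r
library(mlr3verse)    # pulls in mlr3, mlr3pipelines, mlr3tuning, mlr3filters, ...
library(mlr3learners) # provides classif.glmnet, classif.lda and classif.ranger

# fix the seed for reproducibility (the exact value is arbitrary)
set.seed(7832)

# reduce logger verbosity to keep the output concise
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
```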


For the stacking example, we use the sonar classification task:

task_sonar = tsk("sonar")
task_sonar$col_roles$stratum = task_sonar$target_names #stratification

Pipeline creation

Level 0

As mentioned, the level 0 learners are rpart, glmnet and lda:

learner_rpart  = lrn("classif.rpart", predict_type = "prob")
learner_glmnet = lrn("classif.glmnet", predict_type = "prob")
learner_lda = lrn("classif.lda", predict_type = "prob")

To create the learner out-of-bag predictions, we use PipeOpLearnerCV:

cv1_rpart = po("learner_cv", learner_rpart, id = "rprt_1")
cv1_glmnet = po("learner_cv", learner_glmnet, id = "glmnet_1")
cv1_lda = po("learner_cv", learner_lda, id = "lda_1")

A sparser representation of the input data in level 0 is obtained using the following filters:

anova = po("filter", flt("anova"), id = "filt1")
mrmr = po("filter", flt("mrmr"), id = "filt2")
find_cor = po("filter", flt("find_correlation"), id = "filt3")

To summarize these steps into level 0, we use the gunion() function. The out-of-bag predictions of all level 0 learners are attached using PipeOpFeatureUnion along with the original data passed via PipeOpNOP:

level0 = gunion(list(
  anova %>>% cv1_rpart,
  mrmr %>>% cv1_glmnet,
  find_cor %>>% cv1_lda,
  po("nop", id = "nop1")))  %>>%
  po("featureunion", id = "union1")

We can have a look at the graph from level 0:
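A sketch of the plotting call, assuming the `$plot()` method of `Graph` objects from mlr3pipelines:

```r
# draw the DAG of the level 0 graph
level0$plot(html = FALSE)
```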


Level 1

Now, we create the level 1 learners:

cv2_rpart = po("learner_cv", learner_rpart , id = "rprt_2")
cv2_glmnet = po("learner_cv", learner_glmnet, id = "glmnet_2")
cv2_lda = po("learner_cv", learner_lda, id = "lda_2")

All level 1 learners will use PipeOpPCA transformed data as input:

level1 = level0 %>>%
  po("copy", 4) %>>%
  gunion(list(
    po("pca", id = "pca2_1", param_vals = list(scale. = TRUE)) %>>% cv2_rpart,
    po("pca", id = "pca2_2", param_vals = list(scale. = TRUE)) %>>% cv2_glmnet,
    po("pca", id = "pca2_3", param_vals = list(scale. = TRUE)) %>>% cv2_lda,
    po("nop", id = "nop2"))) %>>%
  po("featureunion", id = "union2")

We can have a look at the graph from level 1:
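As for level 0, the graph can be drawn with the `$plot()` method of `Graph` objects (a sketch):

```r
# draw the DAG of the level 1 graph
level1$plot(html = FALSE)
```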


The out-of-bag predictions of the level 1 learners are attached to the input data of level 1, and a final ranger learner will be trained on the result:

ranger_lrn = lrn("classif.ranger", predict_type = "prob")

ensemble = level1 %>>% ranger_lrn

Defining the tuning space

In order to tune the ensemble’s hyperparameters jointly, we define the search space using a ParamSet from the paradox package:

search_space_ensemble = ps(
    filt1.filter.nfeat = p_int(5, 50),
    filt2.filter.nfeat = p_int(5, 50),
    filt3.filter.nfeat = p_int(5, 50),
    pca2_1.rank. = p_int(3, 50),
    pca2_2.rank. = p_int(3, 50),
    pca2_3.rank. = p_int(3, 20),
    rprt_1.cp = p_dbl(0.001, 0.1),
    rprt_1.minbucket = p_int(1, 10),
    glmnet_1.alpha = p_dbl(0, 1),
    rprt_2.cp = p_dbl(0.001, 0.1),
    rprt_2.minbucket = p_int(1, 10),
    glmnet_2.alpha = p_dbl(0, 1),
    classif.ranger.mtry = p_int(1, 10),
    classif.ranger.sample.fraction = p_dbl(0.5, 1),
    classif.ranger.num.trees = p_int(50, 200))
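The ids in the search space must match the hyperparameter ids exposed by the graph. A quick way to check them (a sketch, assuming the `ensemble` graph from above):

```r
# list the hyperparameter ids of the ensemble graph;
# ids used in the search space (e.g. "filt1.filter.nfeat") must appear here
head(ensemble$param_set$ids(), 20)
```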

Performance comparison

Even with a simple ensemble, there are quite a few things to set up. We compare the performance of the ensemble with a simple tuned ranger learner.

To proceed, we convert the ensemble pipeline into a GraphLearner:

learner_ensemble = as_learner(ensemble)
learner_ensemble$id = "ensemble"
learner_ensemble$predict_type = "prob"

We define the search space for the simple ranger learner:

search_space_ranger = ps(
  mtry = p_int(1, 10),
  sample.fraction = p_dbl(0.5, 1),
  num.trees = p_int(50, 200))

For performance comparison, we use the benchmark() function, which requires a design incorporating a list of learners and a list of tasks. Here, we have two learners (the simple ranger learner and the ensemble) and one task. Since we want to tune the simple ranger learner as well as the whole ensemble learner, we need to create an AutoTuner for each learner to be compared. To do so, we need to define a resampling strategy for the tuning in the inner loop (we use 3-fold cross-validation) and for the final evaluation (outer loop) we use holdout validation:

inner_resampling = rsmp("cv", folds = 3)

# AutoTuner for the ensemble learner
at_1 = auto_tuner(
  method = "random_search",
  learner = learner_ensemble,
  resampling = inner_resampling,
  measure = msr("classif.auc"),
  search_space = search_space_ensemble,
  term_evals = 3) # to limit running time

# AutoTuner for the simple ranger learner
at_2 = auto_tuner(
  method = "random_search",
  learner = ranger_lrn,
  resampling = inner_resampling,
  measure = msr("classif.auc"),
  search_space = search_space_ranger,
  term_evals = 3) # to limit running time

# Define the list of learners
learners = list(at_1, at_2)

# For benchmarking, we use a simple holdout
outer_resampling = rsmp("holdout")

design = benchmark_grid(
  tasks = task_sonar,
  learners = learners,
  resamplings = outer_resampling)

bmr = benchmark(design, store_models = TRUE)
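The performance values shown below can then be extracted by aggregating the benchmark result with the AUC measure (a sketch):

```r
# aggregate the outer-loop AUC per learner
bmr$aggregate(msr("classif.auc"))
```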
 nr task_id           learner_id resampling_id iters classif.auc
  1   sonar       ensemble.tuned       holdout     1   0.8927365
  2   sonar classif.ranger.tuned       holdout     1   0.9070946

For a more reliable comparison, the number of evaluations of the random search should be increased.


This example shows the versatility of mlr3pipelines. By using more learners, varied representations of the data set, and more levels, a powerful yet compute-hungry pipeline can be created. Note that care should be taken to avoid name clashes of pipeline objects.


For attribution, please cite this work as

Dragicevic & Casalicchio (2020, April 27). mlr-org: Tuning a Stacked Learner. Retrieved from https://mlr-org.github.io/mlr-org-website/gallery/2020-04-27-tuning-stacking/

BibTeX citation

@misc{dragicevic2020stacking,
  author = {Dragicevic, Milan and Casalicchio, Giuseppe},
  title = {mlr-org: Tuning a Stacked Learner},
  url = {https://mlr-org.github.io/mlr-org-website/gallery/2020-04-27-tuning-stacking/},
  year = {2020}
}