Introduction to mlr3tuningspaces

Apply predefined search spaces from scientific articles.


Marc Becker


July 6, 2021


The package mlr3tuningspaces offers a selection of published search spaces for many popular machine learning algorithms. In this post, we show how to tune mlr3 learners with these search spaces.


The packages mlr3verse and mlr3tuningspaces are required for this demonstration:
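A minimal setup, assuming both packages are already installed from CRAN, attaches them at the start of the session:

```r
# attach the mlr3 ecosystem (mlr3, mlr3tuning, paradox, ...) and the predefined tuning spaces
library(mlr3verse)
library(mlr3tuningspaces)
```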


We initialize the random number generator with a fixed seed for reproducibility and decrease the verbosity of the logger to keep the output concise.
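One way to do this, as a sketch: the seed value 7832 is arbitrary, and "mlr3" and "bbotk" are the loggers of the mlr3 core package and of the optimization backend used by mlr3tuning.

```r
# fixed seed so that the random search below is reproducible (any integer works)
set.seed(7832)

# only report warnings and errors from mlr3 and from the optimizer backend bbotk
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
```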


In this example, we use the Pima Indians Diabetes data set, which is used to predict whether or not a patient has diabetes. The patients are characterized by 8 numeric features, some of which contain missing values.

# retrieve the task from mlr3
task = tsk("pima")

# generate a quick textual overview using the skimr package
skimr::skim(task$data())
Data summary
Name task$data()
Number of rows 768
Number of columns 9
Column type frequency:
factor 1
numeric 8
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
diabetes 0 1 FALSE 2 neg: 500, pos: 268

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1.00 33.24 11.76 21.00 24.00 29.00 41.00 81.00 ▇▃▁▁▁
glucose 5 0.99 121.69 30.54 44.00 99.00 117.00 141.00 199.00 ▁▇▇▃▂
insulin 374 0.51 155.55 118.78 14.00 76.25 125.00 190.00 846.00 ▇▂▁▁▁
mass 11 0.99 32.46 6.92 18.20 27.50 32.30 36.60 67.10 ▅▇▃▁▁
pedigree 0 1.00 0.47 0.33 0.08 0.24 0.37 0.63 2.42 ▇▃▁▁▁
pregnant 0 1.00 3.85 3.37 0.00 1.00 3.00 6.00 17.00 ▇▃▂▁▁
pressure 35 0.95 72.41 12.38 24.00 64.00 72.00 80.00 122.00 ▁▃▇▂▁
triceps 227 0.70 29.15 10.48 7.00 22.00 29.00 36.00 99.00 ▆▇▁▁▁
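The summary shows missing values in insulin, triceps, pressure, mass and glucose. A quick sketch to get the same counts from the task itself uses task$missings(); it also checks that classif.rpart lists "missings" among its properties, which (through rpart's surrogate splits) is why the examples below can train it without an imputation step.

```r
library(mlr3verse)

task = tsk("pima")

# number of missing values per column, target included
missings = task$missings()
missings[missings > 0]

# rpart handles missing feature values natively, so no imputation is needed
"missings" %in% lrn("classif.rpart")$properties
```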

Tuning Search Space

For tuning, it is important to create a search space that defines the type and range of the hyperparameters. A learner stores all information about its hyperparameters in the slot $param_set. Usually, we only choose a subset of these hyperparameters for tuning.

learner = lrn("classif.rpart")
learner$param_set

                id    class lower upper nlevels        default value
 1:             cp ParamDbl     0     1     Inf           0.01      
 2:     keep_model ParamLgl    NA    NA       2          FALSE      
 3:     maxcompete ParamInt     0   Inf     Inf              4      
 4:       maxdepth ParamInt     1    30      30             30      
 5:   maxsurrogate ParamInt     0   Inf     Inf              5      
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]>      
 7:       minsplit ParamInt     1   Inf     Inf             20      
 8: surrogatestyle ParamInt     0     1       2              0      
 9:   usesurrogate ParamInt     0     2       3              2      
10:           xval ParamInt     0   Inf     Inf             10     0
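Before turning to the predefined spaces, it helps to see what such a subset looks like when written by hand. The sketch below marks three rpart hyperparameters for tuning with paradox's to_tune(); the bounds are copied from the classif.rpart.default tuning space shown further below, and logscale = TRUE requests tuning on the log scale.

```r
library(mlr3verse)
library(paradox)

# hand-written search space, mirroring the ranges of classif.rpart.default
learner = lrn("classif.rpart",
  minsplit  = to_tune(2, 128, logscale = TRUE),
  minbucket = to_tune(1, 64, logscale = TRUE),
  cp        = to_tune(1e-04, 1e-1, logscale = TRUE)
)

# the selected hyperparameters now hold TuneTokens instead of fixed values
tokens = Filter(function(v) inherits(v, "TuneToken"), learner$param_set$values)
names(tokens)
```

All other hyperparameters keep their defaults (or the values mlr3 sets, such as xval = 0) and are not tuned.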


At the heart of mlr3tuningspaces is the R6 class TuningSpace. It stores a list of TuneToken objects, helper functions and additional meta information. The list of TuneToken can be applied directly to the $values slot of a learner's ParamSet. The search spaces are stored in the mlr_tuning_spaces dictionary.

as.data.table(mlr_tuning_spaces)
                        key                                 label         learner n_values
 1:  classif.glmnet.default       Classification GLM with Default  classif.glmnet        2
 2:     classif.glmnet.rbv1     Classification GLM with RandomBot  classif.glmnet        2
 3:     classif.glmnet.rbv2     Classification GLM with RandomBot  classif.glmnet        2
 4:    classif.kknn.default      Classification KKNN with Default    classif.kknn        3
 5:       classif.kknn.rbv1    Classification KKNN with RandomBot    classif.kknn        1
 6:       classif.kknn.rbv2    Classification KKNN with RandomBot    classif.kknn        1
 7:  classif.ranger.default    Classification Ranger with Default  classif.ranger        4
 8:     classif.ranger.rbv1  Classification Ranger with RandomBot  classif.ranger        6
 9:     classif.ranger.rbv2  Classification Ranger with RandomBot  classif.ranger        8
10:   classif.rpart.default     Classification Rpart with Default   classif.rpart        3
11:      classif.rpart.rbv1   Classification Rpart with RandomBot   classif.rpart        4
12:      classif.rpart.rbv2   Classification Rpart with RandomBot   classif.rpart        4
13:     classif.svm.default       Classification SVM with Default     classif.svm        4
14:        classif.svm.rbv1     Classification SVM with RandomBot     classif.svm        4
15:        classif.svm.rbv2     Classification SVM with RandomBot     classif.svm        5
16: classif.xgboost.default   Classification XGBoost with Default classif.xgboost        8
17:    classif.xgboost.rbv1 Classification XGBoost with RandomBot classif.xgboost       10
18:    classif.xgboost.rbv2 Classification XGBoost with RandomBot classif.xgboost       13
19:     regr.glmnet.default           Regression GLM with Default     regr.glmnet        2
20:        regr.glmnet.rbv1         Regression GLM with RandomBot     regr.glmnet        2
21:        regr.glmnet.rbv2         Regression GLM with RandomBot     regr.glmnet        2
22:       regr.kknn.default          Regression KKNN with Default       regr.kknn        3
23:          regr.kknn.rbv1        Regression KKNN with RandomBot       regr.kknn        1
24:          regr.kknn.rbv2        Regression KKNN with RandomBot       regr.kknn        1
25:     regr.ranger.default        Regression Ranger with Default     regr.ranger        4
26:        regr.ranger.rbv1      Regression Ranger with RandomBot     regr.ranger        6
27:        regr.ranger.rbv2      Regression Ranger with RandomBot     regr.ranger        7
28:      regr.rpart.default         Regression Rpart with Default      regr.rpart        3
29:         regr.rpart.rbv1       Regression Rpart with RandomBot      regr.rpart        4
30:         regr.rpart.rbv2       Regression Rpart with RandomBot      regr.rpart        4
31:        regr.svm.default           Regression SVM with Default        regr.svm        4
32:           regr.svm.rbv1         Regression SVM with RandomBot        regr.svm        4
33:           regr.svm.rbv2         Regression SVM with RandomBot        regr.svm        5
34:    regr.xgboost.default       Regression XGBoost with Default    regr.xgboost        8
35:       regr.xgboost.rbv1     Regression XGBoost with RandomBot    regr.xgboost       10
36:       regr.xgboost.rbv2     Regression XGBoost with RandomBot    regr.xgboost       13
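The dictionary can also be queried directly. A sketch using the keys() method of the dictionary, which accepts a regular expression, to collect all classification tuning spaces for rpart and retrieve each of them with lts():

```r
library(mlr3tuningspaces)

# all keys starting with "classif.rpart"
rpart_keys = mlr_tuning_spaces$keys("^classif\\.rpart")
rpart_keys

# retrieve the corresponding TuningSpace objects
spaces = lapply(rpart_keys, lts)
```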

We can use the sugar function lts() to retrieve a TuningSpace.

tuning_space_rpart = lts("classif.rpart.default")
tuning_space_rpart
<TuningSpace:classif.rpart.default>: Classification Rpart with Default
          id lower upper levels logscale
1:  minsplit 2e+00 128.0            TRUE
2: minbucket 1e+00  64.0            TRUE
3:        cp 1e-04   0.1            TRUE

The $values slot contains the list of TuneToken.

tuning_space_rpart$values

$minsplit
Tuning over:
range [2, 128] (log scale)

$minbucket
Tuning over:
range [1, 64] (log scale)

$cp
Tuning over:
range [1e-04, 0.1] (log scale)
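To see why the log scale matters for a parameter like cp, here is a base-R sketch of what sampling on the log scale amounts to. It assumes a plain uniform draw between log(1e-4) and log(0.1) followed by exp(); the random search tuner is not called here, so this only illustrates the idea and is not mlr3's exact sampler.

```r
set.seed(1)
lower = 1e-4
upper = 0.1

# log scale: draw uniformly between log(lower) and log(upper), then transform back
log_cp = runif(1000, min = log(lower), max = log(upper))
cp = exp(log_cp)

# naive alternative: draw uniformly on the original scale
naive_cp = runif(1000, min = lower, max = upper)

# share of values below 0.01: about two thirds on the log scale,
# but only about 10% with the naive uniform draw
mean(cp < 1e-2)
mean(naive_cp < 1e-2)
```

Each of the three decades between 1e-4 and 0.1 receives roughly the same number of candidates on the log scale, whereas the naive draw spends most of its budget on large cp values.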

We apply the search space and tune the learner.

learner = lrn("classif.rpart")

learner$param_set$values = tuning_space_rpart$values

instance = tune(
  tuner = tnr("random_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10)

instance$result

   minsplit minbucket        cp learner_param_vals  x_domain classif.ce
1:  3.40059  1.963618 -4.114895          <list[3]> <list[3]>  0.2539062
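The instance also exposes the best configuration on the original scale and in a form that can be assigned to a learner. A sketch, assuming the result_x_domain and result_learner_param_vals fields of the tuning instance (present in the mlr3tuning versions we are aware of), that re-runs the tuning from above and then fits a final model on the complete task:

```r
library(mlr3verse)
library(mlr3tuningspaces)

set.seed(7832)
task = tsk("pima")

# learner with the classif.rpart.default search space applied
learner = lrn("classif.rpart")
learner$param_set$values = lts("classif.rpart.default")$values

instance = tune(
  tuner = tnr("random_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10)

# best configuration after the exp() transformation
instance$result_x_domain

# fit a fresh learner on the full task with the best hyperparameters
final_learner = lrn("classif.rpart")
final_learner$param_set$values = instance$result_learner_param_vals
final_learner$train(task)
```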

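The minsplit, minbucket and cp columns report the best configuration on the log scale; the transformed values that are actually passed to the learner are stored in the x_domain column. We can also get a learner with the search space already applied directly from the TuningSpace.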

learner = tuning_space_rpart$get_learner()
learner$param_set
                id    class lower upper nlevels        default               value
 1:             cp ParamDbl     0     1     Inf           0.01 <RangeTuneToken[2]>
 2:     keep_model ParamLgl    NA    NA       2          FALSE                    
 3:     maxcompete ParamInt     0   Inf     Inf              4                    
 4:       maxdepth ParamInt     1    30      30             30                    
 5:   maxsurrogate ParamInt     0   Inf     Inf              5                    
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]> <RangeTuneToken[2]>
 7:       minsplit ParamInt     1   Inf     Inf             20 <RangeTuneToken[2]>
 8: surrogatestyle ParamInt     0     1       2              0                    
 9:   usesurrogate ParamInt     0     2       3              2                    
10:           xval ParamInt     0   Inf     Inf             10                   0

This method also allows us to set constant hyperparameters.

learner = tuning_space_rpart$get_learner(maxdepth = 15)
learner$param_set
                id    class lower upper nlevels        default               value
 1:             cp ParamDbl     0     1     Inf           0.01 <RangeTuneToken[2]>
 2:     keep_model ParamLgl    NA    NA       2          FALSE                    
 3:     maxcompete ParamInt     0   Inf     Inf              4                    
 4:       maxdepth ParamInt     1    30      30             30                  15
 5:   maxsurrogate ParamInt     0   Inf     Inf              5                    
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]> <RangeTuneToken[2]>
 7:       minsplit ParamInt     1   Inf     Inf             20 <RangeTuneToken[2]>
 8: surrogatestyle ParamInt     0     1       2              0                    
 9:   usesurrogate ParamInt     0     2       3              2                    
10:           xval ParamInt     0   Inf     Inf             10                   0
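A quick way to confirm that maxdepth is held fixed rather than tuned is to derive the search space from the learner's ParamSet. This sketch assumes paradox's ParamSet$search_space() method, which builds a search space from the TuneToken entries in $values:

```r
library(mlr3verse)
library(mlr3tuningspaces)

learner = lts("classif.rpart.default")$get_learner(maxdepth = 15)

# only hyperparameters holding a TuneToken end up in the search space
search_space = learner$param_set$search_space()
search_space$ids()

# maxdepth is a constant value and is passed unchanged to every evaluation
learner$param_set$values$maxdepth
```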

Passing a learner to lts() applies its default search space directly to the learner; hyperparameters set at construction, such as maxdepth = 15 here, are kept.

learner = lts(lrn("classif.rpart", maxdepth = 15))
learner$param_set
                id    class lower upper nlevels        default               value
 1:             cp ParamDbl     0     1     Inf           0.01 <RangeTuneToken[2]>
 2:     keep_model ParamLgl    NA    NA       2          FALSE                    
 3:     maxcompete ParamInt     0   Inf     Inf              4                    
 4:       maxdepth ParamInt     1    30      30             30                  15
 5:   maxsurrogate ParamInt     0   Inf     Inf              5                    
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]> <RangeTuneToken[2]>
 7:       minsplit ParamInt     1   Inf     Inf             20 <RangeTuneToken[2]>
 8: surrogatestyle ParamInt     0     1       2              0                    
 9:   usesurrogate ParamInt     0     2       3              2                    
10:           xval ParamInt     0   Inf     Inf             10                   0
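Both routes should lead to the same search space. A small check, assuming that lts() on a learner picks the "<learner id>.default" tuning space, which matches the ranges in the output above:

```r
library(mlr3verse)
library(mlr3tuningspaces)

# default search space applied via the learner ...
learner_a = lts(lrn("classif.rpart", maxdepth = 15))
# ... and via the TuningSpace object
learner_b = lts("classif.rpart.default")$get_learner(maxdepth = 15)

ids_a = learner_a$param_set$search_space()$ids()
ids_b = learner_b$param_set$search_space()$ids()
setequal(ids_a, ids_b)
```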