library("mlr3verse")
Scope
We continue working with the Hyperband optimization algorithm (Li et al. 2018). The previous post used the number of boosting iterations of an XGBoost model as the resource. However, Hyperband is not limited to machine learning algorithms that are trained iteratively. The resource can also be the number of features, the training time of a model, or the size of the training data set. In this post, we will tune a support vector machine and use the size of the training data set as the fidelity parameter. Both the training time and the performance of a support vector machine increase with the size of the data set, which makes the data set size a suitable fidelity parameter for Hyperband. This is the second part of the Hyperband series; the first part can be found here: Hyperband Series - Iterative Training. If you don’t know much about Hyperband, check out the first post, which explains the algorithm in detail. We assume that you are already familiar with tuning in the mlr3 ecosystem. If not, you should start with the book chapter on optimization or the Hyperparameter Optimization on the Palmer Penguins Data Set post. A little knowledge about mlr3pipelines is beneficial but not necessary to understand the example.
Hyperparameter Optimization
In this post, we will optimize the hyperparameters of a support vector machine on the Sonar data set. We begin by constructing a classification support vector machine by setting type to "C-classification".
learner = lrn("classif.svm", id = "svm", type = "C-classification")
The mlr3pipelines package features a PipeOp for subsampling.
po("subsample")
PipeOp: <subsample> (not trained)
values: <frac=0.6321, stratify=FALSE, replace=FALSE>
Input channels <name [train type, predict type]>:
input [Task,Task]
Output channels <name [train type, predict type]>:
output [Task,Task]
The PipeOp controls the size of the training data set with the frac parameter. We connect the PipeOp with the learner and get a GraphLearner.
graph_learner = as_learner(
  po("subsample") %>>%
  learner
)
The graph learner subsamples the data and then fits a support vector machine on the subset. The parameter set of the graph learner is a combination of the parameter sets of the PipeOp and the learner.
as.data.table(graph_learner$param_set)[, .(id, lower, upper, levels)]
id lower upper levels
1: subsample.frac 0 Inf
2: subsample.stratify NA NA TRUE,FALSE
3: subsample.replace NA NA TRUE,FALSE
4: svm.cachesize -Inf Inf
5: svm.class.weights NA NA
---
15: svm.nu -Inf Inf
16: svm.scale NA NA
17: svm.shrinking NA NA TRUE,FALSE
18: svm.tolerance 0 Inf
19: svm.type NA NA C-classification,nu-classification
Next, we create the search space. We use TuneToken to mark which hyperparameters should be tuned. We have to prefix each hyperparameter with the id of its PipeOp. The subsample.frac is the fidelity parameter, which must be tagged with "budget" in the search space. The data set size is increased from 3.7% to 100%. For the other hyperparameters, we take the search space for support vector machines from the Kuehn et al. (2018) article, which works for a wide range of data sets.
graph_learner$param_set$set_values(
  subsample.frac = to_tune(p_dbl(3^-3, 1, tags = "budget")),
  svm.kernel = to_tune(c("linear", "polynomial", "radial")),
  svm.cost = to_tune(1e-4, 1e3, logscale = TRUE),
  svm.gamma = to_tune(1e-4, 1e3, logscale = TRUE),
  svm.tolerance = to_tune(1e-4, 2, logscale = TRUE),
  svm.degree = to_tune(2, 5)
)
Support vector machines often crash or never finish the training with certain hyperparameter configurations. We set a timeout of 30 seconds and a fallback learner to handle these cases.
graph_learner$encapsulate = c(train = "evaluate", predict = "evaluate")
graph_learner$timeout = c(train = 30, predict = 30)
graph_learner$fallback = lrn("classif.featureless")
Let’s create the tuning instance. We use the "none" terminator because Hyperband controls the termination itself.
instance = ti(
  task = tsk("sonar"),
  learner = graph_learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("none")
)
instance
<TuningInstanceSingleCrit>
* State: Not optimized
* Objective: <ObjectiveTuning:subsample.svm_on_sonar>
* Search Space:
id class lower upper nlevels
1: subsample.frac ParamDbl 0.03703704 1.0000000 Inf
2: svm.cost ParamDbl -9.21034037 6.9077553 Inf
3: svm.degree ParamInt 2.00000000 5.0000000 4
4: svm.gamma ParamDbl -9.21034037 6.9077553 Inf
5: svm.kernel ParamFct NA NA 3
6: svm.tolerance ParamDbl -9.21034037 0.6931472 Inf
* Terminator: <TerminatorNone>
We load the Hyperband tuner and set eta = 3.
library("mlr3hyperband")
tuner = tnr("hyperband", eta = 3)
Using eta = 3 and a lower bound of 3.7% for the data set size results in the following schedule. Configurations with the same data set size are evaluated in parallel.
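If you want to look at that schedule upfront, mlr3hyperband provides a helper to compute it from the budget bounds and eta. A minimal sketch, assuming the hyperband_schedule() helper (the exact column layout of the returned table may differ between versions):
# compute the Hyperband schedule for a budget (data set fraction)
# ranging from 3.7% to 100% with eta = 3
hyperband_schedule(r_min = 3^-3, r_max = 1, eta = 3)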
Now we are ready to start the tuning.
tuner$optimize(instance)
The best model is a support vector machine with a polynomial kernel.
instance$result[, .(subsample.frac, svm.cost, svm.degree, svm.gamma, svm.kernel, svm.tolerance, classif.ce)]
subsample.frac svm.cost svm.degree svm.gamma svm.kernel svm.tolerance classif.ce
1: 1 1.871535 3 -2.60663 polynomial -4.573951 0.1491373
The archive contains all evaluated configurations. We look at the 8 configurations that were evaluated on the complete data set. The configuration with the best classification error on the full data set was sampled in bracket 2. Its classification error was estimated at 26% on 33% of the data set and dropped to 19% on the full data set (see the green line in Figure 1).
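A sketch of how such a filter on the archive could look; the bracket column and the selected columns are illustrative and may vary with the mlr3tuning and mlr3hyperband versions:
# retrieve all evaluated configurations and keep those trained on the full data set
archive = as.data.table(instance$archive)
archive[subsample.frac == 1,
  .(bracket, subsample.frac, svm.cost, svm.degree, svm.gamma, svm.kernel, classif.ce)]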
Conclusion
Using the data set size as the budget parameter in Hyperband allows the tuning of machine learning models that are not trained iteratively. We have tried to keep the runtime of the example low. For your own optimization, you should use cross-validation and run multiple repetitions of Hyperband.
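A minimal sketch of the latter, assuming the repetitions parameter of the Hyperband tuner:
# run the full Hyperband bracket schedule three times instead of once
tuner = tnr("hyperband", eta = 3, repetitions = 3)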