Practical Tuning Series - Tuning and Parallel Processing

Run various jobs in mlr3 in parallel.

Authors

Marc Becker

Theresa Ullmann

Michel Lang

Bernd Bischl

Jakob Richter

Martin Binder

Published

March 12, 2021

Scope

This is the fourth part of the practical tuning series. The other parts can be found here:

In this post, we show how to run various jobs in mlr3 in parallel. The goal is to map computational jobs (e.g. the evaluation of one configuration) to a pool of workers (usually physical CPU cores, sometimes remote computational nodes) to reduce the run time needed for tuning.

Prerequisites

We load the mlr3verse package which pulls in the most important packages for this example. Additionally, make sure you have installed the packages future and future.apply.

library(mlr3verse)

We reduce the verbosity of the logger to keep the output concise. The lgr package is used for logging in all mlr3 packages. The mlr3 logger prints the logging messages from the base package, whereas the bbotk logger is responsible for the logging messages of the optimization packages (e.g. mlr3tuning).

set.seed(7832)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

Parallel Backend

The workers are specified by the parallel backend, which orchestrates starting up, shutting down, and communicating with the workers. On a single machine, multisession and multicore are common backends. The multisession backend spawns new background R processes. It is available on all platforms.

future::plan("multisession")

The multicore backend uses forked R processes, which allows the workers to access R objects in shared memory. This reduces the overhead since R objects are only copied in memory if they are modified. Unfortunately, forking processes is not supported on Windows and is discouraged when running R from within RStudio.

future::plan("multicore")

Both backends support the workers argument, which specifies the number of cores to use.
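For example, the following call restricts the multisession backend to 4 workers (an illustrative choice; pick a number that matches your machine):

# start at most 4 background R processes
future::plan("multisession", workers = 4)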

Use the following snippet if your code should run with the multicore backend whenever possible and fall back to multisession otherwise.

if (future::supportsMulticore()) {
  future::plan(future::multicore)
} else {
  future::plan(future::multisession)
}

Resampling

The resample() and benchmark() functions in mlr3 can be executed in parallel. The parallelization is triggered by simply declaring a plan via future::plan().

future::plan("multisession")

task = tsk("pima")
learner = lrn("classif.rpart") # classification tree
resampling = rsmp("cv", folds = 3)

resample(task, learner, resampling)
<ResampleResult> with 3 resampling iterations
 task_id    learner_id resampling_id iteration warnings errors
    pima classif.rpart            cv         1        0      0
    pima classif.rpart            cv         2        0      0
    pima classif.rpart            cv         3        0      0

The 3-fold cross-validation gives us 3 jobs, one per resampling iteration, which are executed in parallel.

The benchmark() function accepts a design of experiments as input where each experiment is defined as a combination of a task, a learner, and a resampling strategy. For each experiment, resampling is performed. The nested loop over experiments and resampling iterations is flattened so that all resampling iterations of all experiments can be executed in parallel.

future::plan("multisession")

tasks = list(tsk("pima"), tsk("iris"))
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

grid = benchmark_grid(tasks, learner, resampling)

benchmark(grid)
<BenchmarkResult> of 6 rows with 2 resampling runs
 nr task_id    learner_id resampling_id iters warnings errors
  1    pima classif.rpart            cv     3        0      0
  2    iris classif.rpart            cv     3        0      0

The 2 experiments and the 3-fold cross-validation result in 6 jobs which are executed in parallel.

Tuning

The mlr3tuning package internally calls benchmark() during tuning. If the tuner is capable of suggesting multiple configurations per iteration (such as random search, grid search, or hyperband), these configurations represent individual experiments, and the loop flattening of benchmark() is triggered. E.g., all resampling iterations of all hyperparameter configurations on a grid can be executed in parallel.

future::plan("multisession")

learner = lrn("classif.rpart")
learner$param_set$values$cp = to_tune(0.001, 0.1)
learner$param_set$values$minsplit = to_tune(1, 10)

instance = tune(
  tuner = tnr("random_search", batch_size = 5), # random search suggests 5 configurations per batch
  task = tsk("pima"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 10
)

The batch size of 5 and the 3-fold cross-validation give us 15 jobs per batch. Two batches are evaluated because of the limit of 10 evaluations in total.
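If you want to verify this, the archive of the tuning instance records every evaluated configuration. A minimal sketch, assuming the archive's n_evals field and its as.data.table() conversion:

# number of evaluated configurations (10, because of term_evals = 10)
instance$archive$n_evals

# all evaluated configurations with their classification error
as.data.table(instance$archive)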

Nested Resampling

Nested resampling consists of an outer and an inner resampling loop. To run it, an AutoTuner is passed to resample() or benchmark(). We can choose different parallelization backends for the outer and inner resampling loop, respectively, by passing a list of backends.

# Runs the outer loop in parallel and the inner loop sequentially
future::plan(list("multisession", "sequential"))

# Runs the outer loop sequentially and the inner loop in parallel
future::plan(list("sequential", "multisession"))

learner = lrn("classif.rpart")
learner$param_set$values$cp = to_tune(0.001, 0.1)
learner$param_set$values$minsplit = to_tune(1, 10)

rr = tune_nested(
  tuner = tnr("random_search", batch_size = 5), # random search suggests 5 configurations per batch
  task = tsk("pima"),
  learner = learner,
  inner_resampling = rsmp("cv", folds = 3),
  outer_resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 10
)

While nesting real parallelization backends is often unintended and causes unnecessary overhead, it is useful in some distributed computing setups. It can be achieved with future by forcing a fixed number of workers for each loop.

# Runs both loops in parallel
future::plan(list(future::tweak("multisession", workers = 2),
                  future::tweak("multisession", workers = 4)))

This example would run on 8 cores (= 2 * 4) on the local machine.
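To avoid oversubscribing your machine with such a nested plan, you can check how many cores are available beforehand, e.g. with future::availableCores():

# number of CPU cores available to the current R session
future::availableCores()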

Resources

The mlr3book includes a chapter on parallelization. The mlr3cheatsheets contain frequently used commands and workflows of mlr3.