Practical Tuning Series - Tuning and Parallel Processing

Run various jobs in mlr3 in parallel.

Authors

Marc Becker

Theresa Ullmann

Michel Lang

Bernd Bischl

Jakob Richter

Martin Binder

Published

March 12, 2021

Scope

This is the fourth part of the practical tuning series.

In this post, we show how to run various jobs in mlr3 in parallel. The goal is to map computational jobs (e.g., the evaluation of one hyperparameter configuration) to a pool of workers (usually physical CPU cores, sometimes remote computational nodes) to reduce the run time of tuning.

Prerequisites

We load the mlr3verse package which pulls in the most important packages for this example. Additionally, make sure you have installed the packages future and future.apply.

library(mlr3verse)

We decrease the verbosity of the logger to keep the output concise. The lgr package is used for logging in all mlr3 packages. The mlr3 logger prints the logging messages from the base package, whereas the bbotk logger is responsible for logging messages from the optimization packages (e.g., mlr3tuning).

set.seed(7832)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

Parallel Backend

The workers are specified by the parallel backend which orchestrates starting up, shutting down, and communication with the workers. On a single machine, multisession and multicore are common backends. The multisession backend spawns new background R processes. It is available on all platforms.

future::plan("multisession")

The multicore backend uses forked R processes, which allows the workers to access R objects in shared memory. This reduces the overhead since R objects are only copied in memory if they are modified. Unfortunately, forking processes is not supported on Windows or when running R from within RStudio.

future::plan("multicore")

Both backends support the workers argument, which specifies the number of workers (and thus cores) to use.
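
As a minimal sketch of the workers argument, the number of workers can be fixed when declaring the plan and checked afterwards with future::nbrOfWorkers(). The value 2 is an arbitrary choice for illustration, and the names old_plan and n_workers are hypothetical.

```r
library(future)

# Number of cores that future considers available on this machine
availableCores()

# Declare a multisession backend with two background R sessions
# (2 is an arbitrary value chosen for this illustration)
old_plan = plan("multisession", workers = 2)

# Number of workers of the currently active plan
n_workers = nbrOfWorkers()

# Restore the previous plan once the parallel work is done
plan(old_plan)
```

Storing the previous plan and restoring it afterwards keeps the change local, which can be convenient inside functions or scripts that should not alter the user's setup.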

Use the following snippet to run with the multicore backend whenever it is supported and to fall back to multisession otherwise.

if (future::supportsMulticore()) {
  future::plan(future::multicore)
} else {
  future::plan(future::multisession)
}

Resampling

The resample() and benchmark() functions in mlr3 can be executed in parallel. The parallelization is triggered by simply declaring a plan via future::plan().

future::plan("multisession")

task = tsk("pima")
learner = lrn("classif.rpart") # classification tree
resampling = rsmp("cv", folds = 3)

resample(task, learner, resampling)

── <ResampleResult> with 3 resampling iterations ───────────────────────────────────────────────────────────────────────
 task_id    learner_id resampling_id iteration     prediction_test warnings errors
    pima classif.rpart            cv         1 <PredictionClassif>        0      0
    pima classif.rpart            cv         2 <PredictionClassif>        0      0
    pima classif.rpart            cv         3 <PredictionClassif>        0      0

The 3-fold cross-validation gives us 3 jobs since each resampling iteration is an independent job that can be executed in parallel.
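
The job count can also be read off the resampling object itself. The following sketch assumes only that mlr3 is installed; the name n_jobs is hypothetical.

```r
library(mlr3)

resampling = rsmp("cv", folds = 3)

# Each resampling iteration is one job that can be sent to a worker
n_jobs = resampling$iters
```

With more workers than jobs, the surplus workers stay idle, so it rarely pays off to request more than resampling$iters workers for a single resample() call.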

The benchmark() function accepts a design of experiments as input where each experiment is defined as a combination of a task, a learner, and a resampling strategy. For each experiment, resampling is performed. The nested loop over experiments and resampling iterations is flattened so that all resampling iterations of all experiments can be executed in parallel.

future::plan("multisession")

tasks = list(tsk("pima"), tsk("iris"))
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

grid = benchmark_grid(tasks, learner, resampling)

benchmark(grid)

── <BenchmarkResult> of 6 rows with 2 resampling run ───────────────────────────────────────────────────────────────────
 nr task_id    learner_id resampling_id iters warnings errors
  1    pima classif.rpart            cv     3        0      0
  2    iris classif.rpart            cv     3        0      0

The 2 experiments and the 3-fold cross-validation result in 6 jobs which are executed in parallel.
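
As a rough check of this count, the number of jobs can be derived from the benchmark design before running it. This is a sketch that only inspects the grid; the names n_experiments and n_jobs are hypothetical.

```r
library(mlr3)

tasks = list(tsk("pima"), tsk("iris"))
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

grid = benchmark_grid(tasks, learner, resampling)

# One row per experiment (combination of task, learner, and resampling)
n_experiments = nrow(grid)

# Each experiment contributes one job per resampling iteration
n_jobs = sum(vapply(grid$resampling, function(r) r$iters, integer(1)))
```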

Tuning

The mlr3tuning package internally calls benchmark() during tuning. If the tuner is capable of suggesting multiple configurations per iteration (such as random search, grid search, or hyperband), these configurations represent individual experiments, and the loop flattening of benchmark() is triggered. For example, all resampling iterations of all hyperparameter configurations on a grid can be executed in parallel.

future::plan("multisession")

learner = lrn("classif.rpart")
learner$param_set$values$cp = to_tune(0.001, 0.1)
learner$param_set$values$minsplit = to_tune(1, 10)

instance = tune(
  tuner = tnr("random_search", batch_size = 5), # random search suggests 5 configurations per batch
  task = tsk("pima"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 10
)

The batch size of 5 and the 3-fold cross-validation give us 15 jobs per batch. Two such batches are run because the budget is limited to 10 evaluations in total.
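
The arithmetic behind these numbers can be written out in a few lines of base R. The sketch assumes that the evaluation budget is a multiple of the batch size; all variable names are chosen for illustration.

```r
batch_size = 5   # configurations proposed per batch
folds = 3        # resampling iterations per configuration
term_evals = 10  # total evaluation budget

# Jobs that can run in parallel within one batch
jobs_per_batch = batch_size * folds

# Batches needed to exhaust the budget
n_batches = ceiling(term_evals / batch_size)

# Jobs over the whole tuning run
total_jobs = jobs_per_batch * n_batches
```

Only the jobs within one batch can run concurrently, so 15 is the largest number of workers this setup can keep busy.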

Nested Resampling

Nested resampling results in two nested resampling loops. For this, an AutoTuner is passed to resample() or benchmark(). We can choose different parallelization backends for the inner and outer resampling loop, respectively. We just have to pass a list of backends.

# Option 1: runs the outer loop in parallel and the inner loop sequentially
future::plan(list("multisession", "sequential"))

# Option 2: runs the outer loop sequentially and the inner loop in parallel
# (a later call to plan() replaces the previous plan, so this one is in effect below)
future::plan(list("sequential", "multisession"))

learner = lrn("classif.rpart")
learner$param_set$values$cp = to_tune(0.001, 0.1)
learner$param_set$values$minsplit = to_tune(1, 10)

rr = tune_nested(
  tuner = tnr("random_search", batch_size = 5), # random search suggests 5 configurations per batch
  task = tsk("pima"),
  learner = learner,
  inner_resampling = rsmp("cv", folds = 3),
  outer_resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 10
)
Warning: SequentialFuture ('future_mapply-1') added, removed, or modified connections. A future expression must close
any opened connections and must not close connections it did not open. Details: 12 connection added ([index= 5,
description=<-localhost:11967, class=sockconn, mode=a+b, text=binary, opened=opened, can.read=yes, can.write=yes];
[index= 6, description=<-localhost:11967, class=sockconn, mode=a+b, text=binary, opened=opened, can.read=yes,
can.write=yes]; [index= 7, description=<-localhost:11967, class=sockconn, mode=a+b, text=binary, opened=opened,
can.read=yes, can.write=yes]; [index= 8, description=<-localhost:11967, class=sockconn, mode=a+b, text=binary,
opened=opened, can.read=yes, can.write=yes]; [index= 9, description=<-localhost:11967, class=sockconn, mode=a+b,
text=binary, opened=opened, can.read=yes, can.write=yes]; [index=10, description=<-localhost:11967, class=sockconn,
mode=a+b, text=binary, opened=opened, can.read=yes, can.write=yes]; [index=11, description=<-localhost:11967,
class=sockconn, mode=a+b, text=binary, opened=opened, can.read=yes, can.write=yes]; [index=12,
description=<-localhost:11967, class=sockconn, mode=a+b, text=binary, opened=opened, can.read=yes, can.write=yes];
[index=13, description=<-localhost:11967, class=sockconn, mode=a+b, text=binary, opened=opened, can.read=yes,
can.write=yes]; [index=14, description=<-localhost:11967, class=sockconn, mode=a+b, text=binary, opened=opened,
can.read=yes, can.write=yes]; [index=15, description=<-localhost:11967, class=sockconn, mode=a+b, text=binary,
opened=opened, can.read=yes, can.write=yes]; [index=16, description=<-localhost:11967, class=sockconn, mode=a+b,
text=binary, opened=opened, can.read=yes, can.write=yes]), 0 connection removed (<none>), 0 connection replaced
(<none>). See also help("future.options", package = "future") [future 'future_mapply-1'
(f8f4958a0ee670a5f4b85e9bd400ff97-40); on f8f4958a0ee670a5f4b85e9bd400ff97@5be2c2c9e0c1<31471>]
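
The AutoTuner route mentioned at the start of this section can be sketched as follows. It should be roughly equivalent to the tune_nested() call, under the assumption that auto_tuner() from mlr3tuning is used to build the AutoTuner; the limit of 2 outer workers is an arbitrary choice for illustration.

```r
library(mlr3verse)

# Outer loop in parallel on two workers, inner loop sequentially
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

learner = lrn("classif.rpart")
learner$param_set$values$cp = to_tune(0.001, 0.1)
learner$param_set$values$minsplit = to_tune(1, 10)

# The AutoTuner wraps the learner together with the inner resampling loop
at = auto_tuner(
  tuner = tnr("random_search", batch_size = 5),
  learner = learner,
  resampling = rsmp("cv", folds = 3), # inner resampling
  measure = msr("classif.ce"),
  term_evals = 10
)

# Passing the AutoTuner to resample() adds the outer resampling loop
rr = resample(tsk("pima"), at, rsmp("cv", folds = 3))

# Switch back to sequential execution
future::plan("sequential")
```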

While nesting real parallelization backends is often unintended and causes unnecessary overhead, it is useful in some distributed computing setups. It can be achieved with future by forcing a fixed number of workers for each loop.

# Runs both loops in parallel
future::plan(list(future::tweak("multisession", workers = 2),
                  future::tweak("multisession", workers = 4)))

This example would run on 8 cores (= 2 * 4) on the local machine.
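
Rather than hard-coding the worker counts, they could be derived from the number of available cores. The following is a sketch under the assumption that at most two outer workers are wanted; the split and all variable names are hypothetical.

```r
library(future)

# Cores that future considers available on this machine
total_cores = as.integer(availableCores())

# Hypothetical split: at most two outer workers,
# the remaining cores are divided among the inner workers
outer_workers = min(2L, total_cores)
inner_workers = max(1L, total_cores %/% outer_workers)

nested_plan = list(
  tweak("multisession", workers = outer_workers),
  tweak("multisession", workers = inner_workers)
)

# Activate with: plan(nested_plan)
```

The product of both worker counts never exceeds the number of available cores, which avoids oversubscribing the machine.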

Resources

The mlr3book includes a chapter on parallelization. The mlr3cheatsheets contain frequently used commands and workflows of mlr3.

Session Information

sessioninfo::session_info(info = "packages")
═ Session info ═══════════════════════════════════════════════════════════════════════════════════════════════════════
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package           * version    date (UTC) lib source
 backports           1.5.0      2024-05-23 [1] RSPM
 bbotk               1.8.1      2025-11-26 [1] RSPM
 checkmate           2.3.4      2026-02-03 [1] RSPM
 class               7.3-23     2025-01-01 [2] CRAN (R 4.5.2)
 cli                 3.6.5      2025-04-23 [1] RSPM
 cluster             2.1.8.1    2025-03-12 [2] CRAN (R 4.5.2)
 codetools           0.2-20     2024-03-31 [2] CRAN (R 4.5.2)
 crayon              1.5.3      2024-06-20 [1] RSPM
 data.table        * 1.18.2.1   2026-01-27 [1] RSPM
 DEoptimR            1.1-4      2025-07-27 [1] RSPM
 digest              0.6.39     2025-11-19 [1] RSPM
 diptest             0.77-2     2025-08-20 [1] RSPM
 dplyr               1.2.0      2026-02-03 [1] RSPM
 evaluate            1.0.5      2025-08-27 [1] RSPM
 farver              2.1.2      2024-05-13 [1] RSPM
 fastmap             1.2.0      2024-05-15 [1] RSPM
 flexmix             2.3-20     2025-02-28 [1] RSPM
 fpc                 2.2-14     2026-01-14 [1] RSPM
 future            * 1.69.0     2026-01-16 [1] RSPM
 future.apply        1.20.2     2026-02-20 [1] RSPM
 generics            0.1.4      2025-05-09 [1] RSPM
 ggplot2             4.0.2      2026-02-03 [1] RSPM
 globals             0.19.0     2026-02-02 [1] RSPM
 glue                1.8.0      2024-09-30 [1] RSPM
 gtable              0.3.6      2024-10-25 [1] RSPM
 htmltools           0.5.9      2025-12-04 [1] RSPM
 htmlwidgets         1.6.4      2023-12-06 [1] RSPM
 jsonlite            2.0.0      2025-03-27 [1] RSPM
 kernlab             0.9-33     2024-08-13 [1] RSPM
 knitr               1.51       2025-12-20 [1] RSPM
 lattice             0.22-7     2025-04-02 [2] CRAN (R 4.5.2)
 lgr                 0.5.2      2026-01-30 [1] RSPM
 lifecycle           1.0.5      2026-01-08 [1] RSPM
 listenv             0.10.0     2025-11-02 [1] RSPM
 magrittr            2.0.4      2025-09-12 [1] RSPM
 MASS                7.3-65     2025-02-28 [2] CRAN (R 4.5.2)
 Matrix              1.7-4      2025-08-28 [2] CRAN (R 4.5.2)
 mclust              6.1.2      2025-10-31 [1] RSPM
 mlr3              * 1.4.0      2026-02-19 [1] RSPM
 mlr3cluster         0.2.0      2026-02-04 [1] RSPM
 mlr3cmprsk          0.0.1      2026-02-27 [1] Github (mlr-org/mlr3cmprsk@5a04c29)
 mlr3data            0.9.0      2024-11-08 [1] RSPM
 mlr3extralearners   1.4.0      2026-01-26 [1] https://m~
 mlr3filters         0.9.0      2025-09-12 [1] RSPM
 mlr3fselect       * 1.5.0      2025-11-27 [1] RSPM
 mlr3hyperband       1.0.0      2025-07-10 [1] RSPM
 mlr3inferr          0.2.1      2025-11-26 [1] RSPM
 mlr3learners        0.14.0     2025-12-13 [1] RSPM
 mlr3mbo             0.3.3      2025-10-10 [1] RSPM
 mlr3measures        1.2.0      2025-11-25 [1] RSPM
 mlr3misc            0.21.0     2026-02-26 [1] RSPM
 mlr3pipelines       0.10.0     2025-11-07 [1] RSPM
 mlr3tuning          1.5.1      2025-12-14 [1] RSPM
 mlr3tuningspaces    0.6.0      2025-05-16 [1] RSPM
 mlr3verse         * 0.3.1      2025-01-14 [1] RSPM
 mlr3viz             0.11.0     2026-02-22 [1] RSPM
 mlr3website       * 0.0.0.9000 2026-02-27 [1] Github (mlr-org/mlr3website@f6e32a7)
 modeltools          0.2-24     2025-05-02 [1] RSPM
 nnet                7.3-20     2025-01-01 [2] CRAN (R 4.5.2)
 otel                0.2.0      2025-08-29 [1] RSPM
 palmerpenguins      0.1.1      2022-08-15 [1] RSPM
 paradox             1.0.1      2024-07-09 [1] RSPM
 parallelly          1.46.1     2026-01-08 [1] RSPM
 pillar              1.11.1     2025-09-17 [1] RSPM
 pkgconfig           2.0.3      2019-09-22 [1] RSPM
 prabclus            2.3-5      2026-01-14 [1] RSPM
 R6                  2.6.1      2025-02-15 [1] RSPM
 RColorBrewer        1.1-3      2022-04-03 [1] RSPM
 Rcpp                1.1.1      2026-01-10 [1] RSPM
 rlang               1.1.7      2026-01-09 [1] RSPM
 rmarkdown           2.30       2025-09-28 [1] RSPM
 robustbase          0.99-7     2026-02-05 [1] RSPM
 rpart               4.1.24     2025-01-07 [2] CRAN (R 4.5.2)
 S7                  0.2.1      2025-11-14 [1] RSPM
 scales              1.4.0      2025-04-24 [1] RSPM
 sessioninfo         1.2.3      2025-02-05 [1] RSPM
 spacefillr          0.4.0      2025-02-24 [1] RSPM
 stringi             1.8.7      2025-03-27 [1] RSPM
 survdistr           0.0.1      2026-02-27 [1] Github (mlr-org/survdistr@d7babd1)
 survival            3.8-3      2024-12-17 [2] CRAN (R 4.5.2)
 tibble              3.3.1      2026-01-11 [1] RSPM
 tidyselect          1.2.1      2024-03-11 [1] RSPM
 uuid                1.2-2      2026-01-23 [1] RSPM
 vctrs               0.7.1      2026-01-23 [1] RSPM
 withr               3.0.2      2024-10-28 [1] RSPM
 xfun                0.56       2026-01-18 [1] RSPM
 yaml                2.3.12     2025-12-10 [1] RSPM

 [1] /usr/local/lib/R/site-library
 [2] /usr/local/lib/R/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────