mlr3tuning - Runtime and Memory Benchmarks

Scope

This report analyzes the runtime and memory usage of the mlr3tuning package across different versions. The benchmarks cover the tune() and tune_nested() functions in both sequential and parallel mode and vary the training time of the models and the size of the dataset.

Given the extensive package ecosystem of mlr3, performance bottlenecks can occur at multiple stages. This report aims to help users determine whether the runtime of their workflows falls within expected ranges. If significant runtime or memory anomalies are observed, users are encouraged to report them by opening a GitHub issue.

Benchmarks are conducted on a high-performance cluster optimized for multi-core performance rather than single-core speed. Consequently, runtimes may be faster on a local machine.

Summary of Latest mlr3tuning Version

The benchmarks are comprehensive; therefore, we present a summary of the results for the latest mlr3tuning version. We measure the runtime and memory usage of a random search with 1000 resampling iterations on the spam dataset with 1000 and 10,000 instances. The nested resampling is conducted with 10 outer resampling iterations and uses the same random search for the inner resampling loop. The overhead introduced by tune() and tune_nested() should always be considered relative to the training time of the models. For models with longer training times, such as 1 second, the overhead is minimal. For models with a training time of 100 ms, the overhead is approximately 20%. For models with a training time of 10 ms, the overhead approximately doubles or triples the runtime. In cases where the training time is only 1 ms, the overhead results in the runtime being 15 to 25 times larger than the actual model training time. The memory usage of tune() and tune_nested() is between 370 MB and 670 MB. Running an empty R session consumes 131 MB of memory.
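For the latest version, these sequential numbers are consistent with a roughly constant overhead of about 14 ms per evaluation. The following back-of-the-envelope model is illustrative only and not part of the benchmark code; it approximately reproduces the K factors reported in the tables below:

overhead_factor = function(model_time_s, overhead_s = 0.014) {
  # runtime relative to pure model training time, assuming a constant
  # per-evaluation tuning overhead of ~14 ms (estimated from the tables)
  (model_time_s + overhead_s) / model_time_s
}

overhead_factor(1)      # ~1.0 for 1 s models
overhead_factor(0.1)    # ~1.1 for 100 ms models
overhead_factor(0.01)   # ~2.4 for 10 ms models
overhead_factor(0.001)  # ~15 for 1 ms models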

mlr3tuning utilizes the future package to parallelize over resampling iterations. However, running tune() and tune_nested() in parallel introduces overhead due to the initiation of worker processes. Therefore, we compare the runtime of parallel execution with that of sequential execution. For models with training times of 1 s, 100 ms, and 10 ms, running tune() in parallel reduces the runtime. For models with 1 ms training times, parallel execution offers no benefit and can even be slower than sequential execution. Memory usage increases significantly with the number of cores since each core initiates a separate R session. Utilizing 10 cores results in a total memory usage of around 1.5 GB. The tune_nested() function parallelizes over the outer resampling loop. For all training times, the parallel version is faster than the sequential version. The memory usage is around 3.6 GB.
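The parallel runtimes reported below behave similarly: they are roughly the total model training time divided by the number of workers plus an approximately constant parallelization overhead of about 14 s (starting worker sessions, exporting objects, and collecting results). Again, this is an illustrative estimate, not part of the benchmark code:

expected_parallel_runtime = function(total_model_time_s, workers = 10, overhead_s = 14) {
  # idealized wall-clock time: perfect division of the training time
  # across workers plus a constant parallelization overhead
  total_model_time_s / workers + overhead_s
}

expected_parallel_runtime(1000)  # ~114 s; measured ~110 s for 1 s models
expected_parallel_runtime(100)   # ~24 s; measured ~24 s for 100 ms models
expected_parallel_runtime(10)    # ~15 s; measured ~15 s for 10 ms models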

Tune

The runtime and memory usage of the tune() function are measured for different mlr3tuning versions. A random search is used with a batch size of 1000. The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.

task = tsk("spam")

learner = lrn("classif.sleep",
  sleep_train = model_time / 2,
  sleep_predict = model_time / 2,
  x = to_tune(0, 1))

tune(
  tune = tnr("random_search", batch_size = 1000),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 1000),
  store_benchmark_result = FALSE,
  store_models = FALSE
)
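The benchmark harness itself is not shown in this report. A minimal sketch of how a single runtime measurement could be taken with base R, assuming model_time has been set beforehand:

model_time = 0.1  # 100 ms per evaluation, in seconds

runtime = system.time(
  tune(
    tuner = tnr("random_search", batch_size = 1000),
    task = task,
    learner = learner,
    resampling = rsmp("holdout"),
    measures = msr("classif.ce"),
    terminator = trm("evals", n_evals = 1000),
    store_benchmark_result = FALSE,
    store_models = FALSE
  )
)[["elapsed"]]  # wall-clock seconds for the whole tuning run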

Model Time 1000 ms

Median runtime of tune() with models trained for 1000 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() with models trained for 1000 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1000 1000 1,000 1,000 1.0 519 705
0.18.0 0.7.2 0.14.1 0.11.0 1000 1000 1,000 1,000 1.0 518 722
0.19.0 0.7.2 0.16.1 0.11.1 1000 1000 1,000 1,000 1.0 562 729
0.19.1 0.7.3 0.17.0 0.11.1 1000 1000 1,000 1,000 1.0 511 549
0.19.2 0.7.3 0.17.0 0.11.1 1000 1000 1,000 1,000 1.0 511 549
0.20.0 0.8.0 0.19.0 0.11.1 1000 1000 1,000 1,000 1.0 424 534
1.0.0 1.0.0 0.20.0 1.0.1 1000 1000 1,000 1,000 1.0 434 544
1.0.1 1.1.0 0.20.2 1.0.1 1000 1000 1,000 1,000 1.0 434 470
1.0.2 1.1.0 0.21.0 1.0.1 1000 1000 1,000 1,000 1.0 362 565
1.1.0 1.2.0 0.21.1 1.0.1 1000 1000 1,000 1,000 1.0 362 564
1.2.0 1.3.0 0.21.1 1.0.1 1000 1000 1,000 1,000 1.0 364 564
1.2.1 1.4.0 0.22.0 1.0.1 1000 1000 1,000 1,000 1.0 363 564
1.3.0 1.5.0 0.22.1 1.0.1 1000 1000 1,000 1,000 1.0 364 566
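The K column is the ratio of the measured runtime to the total model training time. Taking the last row above as an example:

total_model_time = 1000 * 1.0          # 1000 evaluations x 1000 ms = 1000 s
median_runtime = 1000                  # s, mlr3tuning 1.3.0
K = median_runtime / total_model_time  # 1.0, i.e. no measurable overhead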

Model Time 100 ms

Median runtime of tune() with models trained for 100 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() with models trained for 100 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 100 100 140 150 1.4 519 705
0.18.0 0.7.2 0.14.1 0.11.0 100 100 140 140 1.4 518 722
0.19.0 0.7.2 0.16.1 0.11.1 100 100 140 140 1.4 562 729
0.19.1 0.7.3 0.17.0 0.11.1 100 100 120 120 1.2 511 549
0.19.2 0.7.3 0.17.0 0.11.1 100 100 120 120 1.2 511 549
0.20.0 0.8.0 0.19.0 0.11.1 100 100 110 120 1.1 424 534
1.0.0 1.0.0 0.20.0 1.0.1 100 100 110 120 1.1 434 544
1.0.1 1.1.0 0.20.2 1.0.1 100 100 120 120 1.2 434 470
1.0.2 1.1.0 0.21.0 1.0.1 100 100 120 120 1.2 362 565
1.1.0 1.2.0 0.21.1 1.0.1 100 100 110 120 1.1 362 564
1.2.0 1.3.0 0.21.1 1.0.1 100 100 110 120 1.1 364 564
1.2.1 1.4.0 0.22.0 1.0.1 100 100 110 120 1.1 363 564
1.3.0 1.5.0 0.22.1 1.0.1 100 100 110 120 1.1 364 566

Model Time 10 ms

Median runtime of tune() with models trained for 10 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() with models trained for 10 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 10 10 100 110 10 519 705
0.18.0 0.7.2 0.14.1 0.11.0 10 10 96 98 9.6 518 722
0.19.0 0.7.2 0.16.1 0.11.1 10 10 63 53 6.3 562 729
0.19.1 0.7.3 0.17.0 0.11.1 10 10 25 25 2.5 511 549
0.19.2 0.7.3 0.17.0 0.11.1 10 10 25 25 2.5 511 549
0.20.0 0.8.0 0.19.0 0.11.1 10 10 25 24 2.5 424 534
1.0.0 1.0.0 0.20.0 1.0.1 10 10 24 24 2.4 434 544
1.0.1 1.1.0 0.20.2 1.0.1 10 10 24 23 2.4 434 470
1.0.2 1.1.0 0.21.0 1.0.1 10 10 24 25 2.4 362 565
1.1.0 1.2.0 0.21.1 1.0.1 10 10 24 26 2.4 362 564
1.2.0 1.3.0 0.21.1 1.0.1 10 10 24 26 2.4 364 564
1.2.1 1.4.0 0.22.0 1.0.1 10 10 24 26 2.4 363 564
1.3.0 1.5.0 0.22.1 1.0.1 10 10 24 26 2.4 364 566

Model Time 1 ms

Median runtime of tune() with models trained for 1 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() with models trained for 1 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1 1 110 99 110 519 705
0.18.0 0.7.2 0.14.1 0.11.0 1 1 100 78 100 518 722
0.19.0 0.7.2 0.16.1 0.11.1 1 1 91 87 91 562 729
0.19.1 0.7.3 0.17.0 0.11.1 1 1 16 17 16 511 549
0.19.2 0.7.3 0.17.0 0.11.1 1 1 14 18 14 511 549
0.20.0 0.8.0 0.19.0 0.11.1 1 1 14 17 14 424 534
1.0.0 1.0.0 0.20.0 1.0.1 1 1 13 16 13 434 544
1.0.1 1.1.0 0.20.2 1.0.1 1 1 13 17 13 434 470
1.0.2 1.1.0 0.21.0 1.0.1 1 1 14 18 14 362 565
1.1.0 1.2.0 0.21.1 1.0.1 1 1 14 17 14 362 564
1.2.0 1.3.0 0.21.1 1.0.1 1 1 14 17 14 364 564
1.2.1 1.4.0 0.22.0 1.0.1 1 1 15 17 15 363 564
1.3.0 1.5.0 0.22.1 1.0.1 1 1 15 17 15 364 566

Memory

Memory usage of tune() depending on the mlr3tuning version. Error bars represent the median absolute deviation of the memory usage. The dashed line indicates the memory usage of an empty R session (131 MB).

Tune in Parallel

The runtime and memory usage of the tune() function are measured for different mlr3tuning versions. A random search is used with a batch size of 1000. The tuning is conducted in parallel on 10 cores with future::multisession. The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.

task = tsk("spam")

learner = lrn("classif.sleep",
  sleep_train = model_time / 2,
  sleep_predict = model_time / 2,
  x = to_tune(0, 1))

options("mlr3.exec_chunk_size" = 100)
future::plan("multisession", workers = 10)

tune(
  tune = tnr("random_search", batch_size = 1000),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 1000),
  store_benchmark_result = FALSE,
  store_models = FALSE
)
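The chunk size controls how many evaluations are shipped to a worker at once. With the settings above, each batch of 1000 evaluations is split into one chunk per worker:

batch_size = 1000
chunk_size = 100
# 10 chunks for 10 workers; the scheduling overhead is amortized over
# 100 evaluations per chunk instead of being paid per evaluation
n_chunks = batch_size / chunk_size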

Model Time 1000 ms

Median runtime of tune() on 10 cores with models trained for 1000 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() on 10 cores with models trained for 1000 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red median runtime indicates that the parallelized version took longer than the sequential run. A red background on the K value indicates a factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1000 1000 120 1,000 120 1.2 2,785 2,606
0.18.0 0.7.2 0.14.1 0.11.0 1000 1000 120 1,000 120 1.2 2,816 2,632
0.19.0 0.7.2 0.16.1 0.11.1 1000 1000 120 1,000 120 1.2 2,816 2,632
0.19.1 0.7.3 0.17.0 0.11.1 1000 1000 110 1,000 110 1.1 1,403 2,033
0.19.2 0.7.3 0.17.0 0.11.1 1000 1000 110 1,000 110 1.1 1,403 2,079
0.20.0 0.8.0 0.19.0 0.11.1 1000 1000 110 1,000 110 1.1 1,434 1,946
1.0.0 1.0.0 0.20.0 1.0.1 1000 1000 110 1,000 110 1.1 1,454 1,935
1.0.1 1.1.0 0.20.2 1.0.1 1000 1000 110 1,000 110 1.1 1,444 1,935
1.0.2 1.1.0 0.21.0 1.0.1 1000 1000 110 1,000 110 1.1 1,444 1,946
1.1.0 1.2.0 0.21.1 1.0.1 1000 1000 110 1,000 110 1.1 1,444 1,946
1.2.0 1.3.0 0.21.1 1.0.1 1000 1000 110 1,000 110 1.1 1,444 1,946
1.2.1 1.4.0 0.22.0 1.0.1 1000 1000 110 1,000 110 1.1 1,444 1,946
1.3.0 1.5.0 0.22.1 1.0.1 1000 1000 110 1,000 110 1.1 1,444 1,946

Model Time 100 ms

Median runtime of tune() on 10 cores with models trained for 100 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() on 10 cores with models trained for 100 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red median runtime indicates that the parallelized version took longer than the sequential run. A red background on the K value indicates a factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 100 100 530 140 310 53 2,785 2,606
0.18.0 0.7.2 0.14.1 0.11.0 100 100 32 140 220 3.2 2,816 2,632
0.19.0 0.7.2 0.16.1 0.11.1 100 100 33 140 120 3.3 2,816 2,632
0.19.1 0.7.3 0.17.0 0.11.1 100 100 24 120 24 2.4 1,403 2,033
0.19.2 0.7.3 0.17.0 0.11.1 100 100 24 120 180 2.4 1,403 2,079
0.20.0 0.8.0 0.19.0 0.11.1 100 100 24 110 23 2.4 1,434 1,946
1.0.0 1.0.0 0.20.0 1.0.1 100 100 24 110 24 2.4 1,454 1,935
1.0.1 1.1.0 0.20.2 1.0.1 100 100 23 120 23 2.3 1,444 1,935
1.0.2 1.1.0 0.21.0 1.0.1 100 100 25 120 24 2.5 1,444 1,946
1.1.0 1.2.0 0.21.1 1.0.1 100 100 24 110 24 2.4 1,444 1,946
1.2.0 1.3.0 0.21.1 1.0.1 100 100 24 110 24 2.4 1,444 1,946
1.2.1 1.4.0 0.22.0 1.0.1 100 100 24 110 24 2.4 1,444 1,946
1.3.0 1.5.0 0.22.1 1.0.1 100 100 24 110 23 2.4 1,444 1,946

Model Time 10 ms

Median runtime of tune() on 10 cores with models trained for 10 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() on 10 cores with models trained for 10 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red median runtime indicates that the parallelized version took longer than the sequential run. A red background on the K value indicates a factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 10 10 180 100 160 180 2,785 2,606
0.18.0 0.7.2 0.14.1 0.11.0 10 10 22 96 85 22 2,816 2,632
0.19.0 0.7.2 0.16.1 0.11.1 10 10 23 63 100 23 2,816 2,632
0.19.1 0.7.3 0.17.0 0.11.1 10 10 14 25 14 14 1,403 2,033
0.19.2 0.7.3 0.17.0 0.11.1 10 10 15 25 48 15 1,403 2,079
0.20.0 0.8.0 0.19.0 0.11.1 10 10 16 25 15 16 1,434 1,946
1.0.0 1.0.0 0.20.0 1.0.1 10 10 15 24 15 15 1,454 1,935
1.0.1 1.1.0 0.20.2 1.0.1 10 10 15 24 15 15 1,444 1,935
1.0.2 1.1.0 0.21.0 1.0.1 10 10 16 24 16 16 1,444 1,946
1.1.0 1.2.0 0.21.1 1.0.1 10 10 16 24 15 16 1,444 1,946
1.2.0 1.3.0 0.21.1 1.0.1 10 10 15 24 16 15 1,444 1,946
1.2.1 1.4.0 0.22.0 1.0.1 10 10 15 24 15 15 1,444 1,946
1.3.0 1.5.0 0.22.1 1.0.1 10 10 15 24 15 15 1,444 1,946

Model Time 1 ms

Median runtime of tune() on 10 cores with models trained for 1 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune() on 10 cores with models trained for 1 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red median runtime indicates that the parallelized version took longer than the sequential run. A red background on the K value indicates a factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1 1 140 110 81 1,400 2,785 2,606
0.18.0 0.7.2 0.14.1 0.11.0 1 1 46 100 74 460 2,816 2,632
0.19.0 0.7.2 0.16.1 0.11.1 1 1 29 91 64 290 2,816 2,632
0.19.1 0.7.3 0.17.0 0.11.1 1 1 14 16 14 140 1,403 2,033
0.19.2 0.7.3 0.17.0 0.11.1 1 1 13 14 14 130 1,403 2,079
0.20.0 0.8.0 0.19.0 0.11.1 1 1 15 14 15 150 1,434 1,946
1.0.0 1.0.0 0.20.0 1.0.1 1 1 14 13 15 140 1,454 1,935
1.0.1 1.1.0 0.20.2 1.0.1 1 1 13 13 15 130 1,444 1,935
1.0.2 1.1.0 0.21.0 1.0.1 1 1 15 14 15 150 1,444 1,946
1.1.0 1.2.0 0.21.1 1.0.1 1 1 15 14 15 150 1,444 1,946
1.2.0 1.3.0 0.21.1 1.0.1 1 1 15 14 15 150 1,444 1,946
1.2.1 1.4.0 0.22.0 1.0.1 1 1 18 15 14 180 1,444 1,946
1.3.0 1.5.0 0.22.1 1.0.1 1 1 15 15 14 150 1,444 1,946

Memory

Memory usage of tune() depending on the mlr3tuning version and the number of resampling iterations. Error bars represent the median absolute deviation of the memory usage.

Nested Tuning

The runtime and memory usage of the tune_nested() function are measured for different mlr3tuning versions. The outer resampling has 10 iterations, and the inner random search evaluates 1000 configurations in total. The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.

task = tsk("spam")

learner = lrn("classif.sleep",
  sleep_train = model_time / 2,
  sleep_predict = model_time / 2,
  x = to_tune(0, 1))

tune_nested(
  tuner = tnr("random_search", batch_size = 1000),
  task = task,
  learner = learner,
  inner_resampling = rsmp("holdout"),
  outer_resampling = rsmp("subsampling", repeats = 10),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 1000),
  store_tune_instance = FALSE,
  store_benchmark_result = FALSE,
  store_models = FALSE
)
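The Total Model Time column in the tables below follows directly from this design: each of the 10 outer iterations runs a full inner random search with 1000 evaluations (the single final model fit per outer iteration adds comparatively little and is ignored in this estimate):

outer_iterations = 10
inner_evaluations = 1000
model_time = 1.0  # s per model, i.e. the 1000 ms setting

total_fits = outer_iterations * inner_evaluations  # 10,000 model fits
total_model_time = total_fits * model_time         # 10,000 s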

Model Time 1000 ms

Median runtime of tune_nested() with models trained for 1000 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() with models trained for 1000 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1000 10000 10,000 10,000 1.0 678 779
0.18.0 0.7.2 0.14.1 0.11.0 1000 10000 10,000 10,000 1.0 679 695
0.19.0 0.7.2 0.16.1 0.11.1 1000 10000 10,000 10,000 1.0 709 756
0.19.1 0.7.3 0.17.0 0.11.1 1000 10000 10,000 10,000 1.0 548 648
0.19.2 0.7.3 0.17.0 0.11.1 1000 10000 10,000 10,000 1.0 549 655
0.20.0 0.8.0 0.19.0 0.11.1 1000 10000 10,000 10,000 1.0 629 593
1.0.0 1.0.0 0.20.0 1.0.1 1000 10000 10,000 10,000 1.0 626 549
1.0.1 1.1.0 0.20.2 1.0.1 1000 10000 10,000 10,000 1.0 619 582
1.0.2 1.1.0 0.21.0 1.0.1 1000 10000 10,000 10,000 1.0 666 573
1.1.0 1.2.0 0.21.1 1.0.1 1000 10000 10,000 10,000 1.0 666 573
1.2.0 1.3.0 0.21.1 1.0.1 1000 10000 10,000 10,000 1.0 692 559
1.2.1 1.4.0 0.22.0 1.0.1 1000 10000 10,000 10,000 1.0 681 574
1.3.0 1.5.0 0.22.1 1.0.1 1000 10000 10,000 10,000 1.0 669 572

Model Time 100 ms

Median runtime of tune_nested() with models trained for 100 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() with models trained for 100 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 100 1000 1,800 1,800 1.8 678 779
0.18.0 0.7.2 0.14.1 0.11.0 100 1000 1,900 2,100 1.9 679 695
0.19.0 0.7.2 0.16.1 0.11.1 100 1000 2,000 1,800 2.0 709 756
0.19.1 0.7.3 0.17.0 0.11.1 100 1000 1,200 1,300 1.2 548 648
0.19.2 0.7.3 0.17.0 0.11.1 100 1000 1,200 1,200 1.2 549 655
0.20.0 0.8.0 0.19.0 0.11.1 100 1000 1,200 1,200 1.2 629 593
1.0.0 1.0.0 0.20.0 1.0.1 100 1000 1,200 1,200 1.2 626 549
1.0.1 1.1.0 0.20.2 1.0.1 100 1000 1,200 1,200 1.2 619 582
1.0.2 1.1.0 0.21.0 1.0.1 100 1000 1,200 1,200 1.2 666 573
1.1.0 1.2.0 0.21.1 1.0.1 100 1000 1,200 1,200 1.2 666 573
1.2.0 1.3.0 0.21.1 1.0.1 100 1000 1,200 1,200 1.2 692 559
1.2.1 1.4.0 0.22.0 1.0.1 100 1000 1,200 1,300 1.2 681 574
1.3.0 1.5.0 0.22.1 1.0.1 100 1000 1,200 1,200 1.2 669 572

Model Time 10 ms

Median runtime of tune_nested() with models trained for 10 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() with models trained for 10 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 10 100 1,600 2,200 16 678 779
0.18.0 0.7.2 0.14.1 0.11.0 10 100 1,500 2,200 15 679 695
0.19.0 0.7.2 0.16.1 0.11.1 10 100 2,100 1,200 21 709 756
0.19.1 0.7.3 0.17.0 0.11.1 10 100 330 370 3.3 548 648
0.19.2 0.7.3 0.17.0 0.11.1 10 100 350 370 3.5 549 655
0.20.0 0.8.0 0.19.0 0.11.1 10 100 320 360 3.2 629 593
1.0.0 1.0.0 0.20.0 1.0.1 10 100 310 340 3.1 626 549
1.0.1 1.1.0 0.20.2 1.0.1 10 100 320 330 3.2 619 582
1.0.2 1.1.0 0.21.0 1.0.1 10 100 330 340 3.3 666 573
1.1.0 1.2.0 0.21.1 1.0.1 10 100 320 340 3.2 666 573
1.2.0 1.3.0 0.21.1 1.0.1 10 100 320 360 3.2 692 559
1.2.1 1.4.0 0.22.0 1.0.1 10 100 340 350 3.4 681 574
1.3.0 1.5.0 0.22.1 1.0.1 10 100 320 350 3.2 669 572

Model Time 1 ms

Median runtime of tune_nested() with models trained for 1 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() with models trained for 1 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time. A red background indicates that the runtime is more than 3 times the total training time of the models. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1 10 980 1,700 98 678 779
0.18.0 0.7.2 0.14.1 0.11.0 1 10 1,800 2,300 180 679 695
0.19.0 0.7.2 0.16.1 0.11.1 1 10 1,200 1,700 120 709 756
0.19.1 0.7.3 0.17.0 0.11.1 1 10 240 280 24 548 648
0.19.2 0.7.3 0.17.0 0.11.1 1 10 230 270 23 549 655
0.20.0 0.8.0 0.19.0 0.11.1 1 10 240 260 24 629 593
1.0.0 1.0.0 0.20.0 1.0.1 1 10 230 250 23 626 549
1.0.1 1.1.0 0.20.2 1.0.1 1 10 230 260 23 619 582
1.0.2 1.1.0 0.21.0 1.0.1 1 10 250 250 25 666 573
1.1.0 1.2.0 0.21.1 1.0.1 1 10 240 250 24 666 573
1.2.0 1.3.0 0.21.1 1.0.1 1 10 250 250 25 692 559
1.2.1 1.4.0 0.22.0 1.0.1 1 10 240 280 24 681 574
1.3.0 1.5.0 0.22.1 1.0.1 1 10 250 260 25 669 572

Memory

Memory usage of tune_nested() depending on the mlr3tuning version and the number of resampling iterations. Error bars represent the median absolute deviation of the memory usage. The dashed line indicates the memory usage of an empty R session (131 MB).

Nested Tuning in Parallel

The runtime and memory usage of the tune_nested() function are measured for different mlr3tuning versions. The outer resampling has 10 iterations, and the inner random search evaluates 1000 configurations in total. The outer resampling is run in parallel on 10 cores with future::multisession. The models are trained for different amounts of time (1 ms, 10 ms, 100 ms, and 1000 ms) on the spam dataset with 1000 and 10,000 instances.

task = tsk("spam")

learner = lrn("classif.sleep",
  sleep_train = model_time / 2,
  sleep_predict = model_time / 2,
  x = to_tune(0, 1))

future::plan("multisession", workers = 10)

tune_nested(
  tuner = tnr("random_search", batch_size = 1000),
  task = task,
  learner = learner,
  inner_resampling = rsmp("holdout"),
  outer_resampling = rsmp("subsampling", repeats = 10),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 1000),
  store_tune_instance = FALSE,
  store_benchmark_result = FALSE,
  store_models = FALSE
)
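Because the 10 outer iterations are distributed across 10 workers, each worker runs one complete inner tuning, so the expected runtime is roughly the sequential runtime divided by the number of workers. This idealized estimate ignores worker startup:

sequential_runtime = 10000  # s, sequential tune_nested() with 1 s models
workers = 10
expected_runtime = sequential_runtime / workers  # 1,000 s, as measured below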

Model Time 1000 ms

Median runtime of tune_nested() on 10 cores with models trained for 1000 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() on 10 cores with models trained for 1000 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red background indicates a K factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1000 10000 1,000 10,000 1,000 1.0 5,028 5,335
0.18.0 0.7.2 0.14.1 0.11.0 1000 10000 1,000 10,000 1,000 1.0 4,895 5,325
0.19.0 0.7.2 0.16.1 0.11.1 1000 10000 1,000 10,000 1,000 1.0 4,864 5,315
0.19.1 0.7.3 0.17.0 0.11.1 1000 10000 1,000 10,000 1,000 1.0 3,789 4,178
0.19.2 0.7.3 0.17.0 0.11.1 1000 10000 1,000 10,000 1,000 1.0 3,789 4,275
0.20.0 0.8.0 0.19.0 0.11.1 1000 10000 1,000 10,000 1,000 1.0 3,543 3,860
1.0.0 1.0.0 0.20.0 1.0.1 1000 10000 1,000 10,000 1,000 1.0 3,512 3,820
1.0.1 1.1.0 0.20.2 1.0.1 1000 10000 1,000 10,000 1,000 1.0 3,564 3,850
1.0.2 1.1.0 0.21.0 1.0.1 1000 10000 1,000 10,000 1,000 1.0 3,604 3,901
1.1.0 1.2.0 0.21.1 1.0.1 1000 10000 1,000 10,000 1,000 1.0 3,604 3,891
1.2.0 1.3.0 0.21.1 1.0.1 1000 10000 1,000 10,000 1,000 1.0 3,625 3,922
1.2.1 1.4.0 0.22.0 1.0.1 1000 10000 1,000 10,000 1,000 1.0 3,604 3,912
1.3.0 1.5.0 0.22.1 1.0.1 1000 10000 1,000 10,000 1,000 1.0 3,615 3,922

Model Time 100 ms

Median runtime of tune_nested() on 10 cores with models trained for 100 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() on 10 cores with models trained for 100 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red background indicates a K factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 100 1000 180 1,800 180 1.8 5,028 5,335
0.18.0 0.7.2 0.14.1 0.11.0 100 1000 190 1,900 210 1.9 4,895 5,325
0.19.0 0.7.2 0.16.1 0.11.1 100 1000 200 2,000 180 2.0 4,864 5,315
0.19.1 0.7.3 0.17.0 0.11.1 100 1000 120 1,200 130 1.2 3,789 4,178
0.19.2 0.7.3 0.17.0 0.11.1 100 1000 120 1,200 120 1.2 3,789 4,275
0.20.0 0.8.0 0.19.0 0.11.1 100 1000 120 1,200 120 1.2 3,543 3,860
1.0.0 1.0.0 0.20.0 1.0.1 100 1000 120 1,200 120 1.2 3,512 3,820
1.0.1 1.1.0 0.20.2 1.0.1 100 1000 120 1,200 120 1.2 3,564 3,850
1.0.2 1.1.0 0.21.0 1.0.1 100 1000 120 1,200 120 1.2 3,604 3,901
1.1.0 1.2.0 0.21.1 1.0.1 100 1000 120 1,200 120 1.2 3,604 3,891
1.2.0 1.3.0 0.21.1 1.0.1 100 1000 120 1,200 120 1.2 3,625 3,922
1.2.1 1.4.0 0.22.0 1.0.1 100 1000 120 1,200 130 1.2 3,604 3,912
1.3.0 1.5.0 0.22.1 1.0.1 100 1000 120 1,200 120 1.2 3,615 3,922

Model Time 10 ms

Median runtime of tune_nested() on 10 cores with models trained for 10 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() on 10 cores with models trained for 10 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red background indicates a K factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 10 100 160 1,600 220 16 5,028 5,335
0.18.0 0.7.2 0.14.1 0.11.0 10 100 150 1,500 220 15 4,895 5,325
0.19.0 0.7.2 0.16.1 0.11.1 10 100 210 2,100 120 21 4,864 5,315
0.19.1 0.7.3 0.17.0 0.11.1 10 100 33 330 37 3.3 3,789 4,178
0.19.2 0.7.3 0.17.0 0.11.1 10 100 35 350 37 3.5 3,789 4,275
0.20.0 0.8.0 0.19.0 0.11.1 10 100 32 320 36 3.2 3,543 3,860
1.0.0 1.0.0 0.20.0 1.0.1 10 100 31 310 34 3.1 3,512 3,820
1.0.1 1.1.0 0.20.2 1.0.1 10 100 32 320 33 3.2 3,564 3,850
1.0.2 1.1.0 0.21.0 1.0.1 10 100 33 330 34 3.3 3,604 3,901
1.1.0 1.2.0 0.21.1 1.0.1 10 100 32 320 34 3.2 3,604 3,891
1.2.0 1.3.0 0.21.1 1.0.1 10 100 32 320 36 3.2 3,625 3,922
1.2.1 1.4.0 0.22.0 1.0.1 10 100 34 340 35 3.4 3,604 3,912
1.3.0 1.5.0 0.22.1 1.0.1 10 100 32 320 35 3.2 3,615 3,922

Model Time 1 ms

Median runtime of tune_nested() on 10 cores with models trained for 1 ms depending on the mlr3tuning version. The dashed line indicates the total training time of the models divided by 10. Error bars represent the median absolute deviation of the runtime.
Runtime and memory usage of tune_nested() on 10 cores with models trained for 1 ms depending on the mlr3tuning version. The K factor shows how many times longer the runtime is than the total model training time divided by the number of cores. A red background indicates a K factor larger than 3. The table includes runtime and memory usage for tasks of size 1000 and 10,000.
mlr3tuning Version bbotk Version mlr3 Version paradox Version Model Time [ms] Total Model Time [s] Median Runtime [s] Median Runtime Sequential [s] Median Runtime 10,000 [s] K Median Memory [MB] Median Memory 10,000 [MB]
0.17.2 0.7.2 0.14.1 0.11.0 1 10 98 980 170 98 5,028 5,335
0.18.0 0.7.2 0.14.1 0.11.0 1 10 180 1,800 230 180 4,895 5,325
0.19.0 0.7.2 0.16.1 0.11.1 1 10 120 1,200 170 120 4,864 5,315
0.19.1 0.7.3 0.17.0 0.11.1 1 10 24 240 28 24 3,789 4,178
0.19.2 0.7.3 0.17.0 0.11.1 1 10 23 230 27 23 3,789 4,275
0.20.0 0.8.0 0.19.0 0.11.1 1 10 24 240 26 24 3,543 3,860
1.0.0 1.0.0 0.20.0 1.0.1 1 10 23 230 25 23 3,512 3,820
1.0.1 1.1.0 0.20.2 1.0.1 1 10 23 230 26 23 3,564 3,850
1.0.2 1.1.0 0.21.0 1.0.1 1 10 25 250 25 25 3,604 3,901
1.1.0 1.2.0 0.21.1 1.0.1 1 10 24 240 25 24 3,604 3,891
1.2.0 1.3.0 0.21.1 1.0.1 1 10 25 250 25 25 3,625 3,922
1.2.1 1.4.0 0.22.0 1.0.1 1 10 24 240 28 24 3,604 3,912
1.3.0 1.5.0 0.22.1 1.0.1 1 10 25 250 26 25 3,615 3,922

Memory

Memory usage of tune_nested() depending on the mlr3tuning version. Error bars represent the median absolute deviation of the memory usage.