mlr3tuning - Runtime and Memory Benchmarks

Scope

This report analyzes the runtime and memory usage of mlr3tuning across versions. It evaluates tune() and tune_nested() in sequential and parallel modes. Given the size of the mlr3 ecosystem, performance bottlenecks can arise at multiple stages. This report helps users judge whether observed runtimes and memory footprints are within expected ranges. Substantial anomalies should be reported via a GitHub issue. Benchmarks are executed on a high‑performance cluster optimized for multi‑core throughput rather than single‑core speed. Consequently, runtimes may be faster on a modern local machine.

Summary of Latest mlr3tuning Version

The benchmarks are comprehensive, so we summarize the results for the latest mlr3tuning version. We measure the runtime and memory usage of a random search with 1,000 resampling iterations on the spam dataset with 1,000 and 10,000 instances. Nested resampling runs 10 outer resampling iterations and the same random search in the inner loop.

The overhead introduced by tune() and tune_nested() must be interpreted relative to the model training time. For a training time of 1 s, the overhead is minimal. For 100 ms, the overhead adds approximately 20% to the total runtime. For 10 ms, the total runtime is roughly two to three times the bare training time. For 1 ms, the total runtime is about 15 to 25 times the bare training time. Memory usage of tune() and tune_nested() ranges between 370 MB and 670 MB; for comparison, an empty R session consumes 131 MB.

mlr3tuning parallelizes over the resampling iterations using the future package (see the sketch below). Parallel execution adds overhead from worker initialization, so we compare parallel and sequential runtimes. For all training times, the parallel version of tune() reduces the total runtime. Memory usage increases with the number of cores because each worker is a separate R session; using 10 cores requires around 1.5 GB. tune_nested() parallelizes over the outer resampling loop. Across all training times, the parallel version is faster than the sequential one, with a total memory usage of approximately 3.6 GB.
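As a minimal sketch of how parallelization is enabled (the worker count of 10 mirrors the benchmark setup), a future backend is registered before calling tune(); mlr3 then distributes the resampling iterations across the workers:

# Each of the 10 workers is a separate R session, which is why
# memory usage grows with the number of cores.
future::plan("multisession", workers = 10)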

Tune

The runtime and memory usage of tune() are measured across mlr3tuning versions. A random search is used with a batch size of 1,000. Models are trained on the spam dataset with 1,000 and 10,000 instances.

task = tsk("spam")

learner = lrn("classif.rpart",
  cp = to_tune(0, 1))

tune(
  tuner = tnr("random_search", batch_size = 1000),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 1000),
  store_benchmark_result = FALSE,
  store_models = FALSE
)
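The snippet above runs on the full spam task, which has 4,601 rows. As a hypothetical sketch (the benchmark's actual subsetting code is not shown), the 1,000-instance variant could be constructed by filtering the task to a random subset of row ids; the 10,000-instance variant would additionally require sampling with replacement:

# Hypothetical subsetting: keep 1,000 randomly chosen rows.
task$filter(sample(task$nrow, 1000))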
Runtime and memory usage of tune() by mlr3tuning version and task size. The k factors indicate how many times larger the total runtime is than the summed training time of the models; the subscripts denote the reference training time per model in milliseconds, so k100 corresponds to 100 ms. A green background highlights cases where the total runtime is less than three times the model training time. The pk factors report the speedup of parallel over sequential execution; a pk factor is omitted when parallel execution is slower than sequential execution.
| mlr3tuning Version | Task Size | Overhead, s | k1000 | k100 | k10 | k1 | Memory, MB | pk1 | pk10 | pk100 | pk1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.4.0 | 1000 | 19 | 1.0 | 1.2 | 2.9 | 20 | 381 | 2.2 | 2.9 | 6.2 | 9.3 |
| 1.3.0 | 1000 | 18 | 1.0 | 1.2 | 2.8 | 19 | 379 | 2.1 | 2.8 | 6.1 | 9.3 |
| 1.2.1 | 1000 | 19 | 1.0 | 1.2 | 2.9 | 20 | 378 | 2.1 | 2.8 | 6.1 | 9.3 |
| 1.4.0 | 10000 | 22 | 1.0 | 1.2 | 3.2 | 23 | 414 | 2.3 | 2.9 | 6.2 | 9.3 |
| 1.3.0 | 10000 | 21 | 1.0 | 1.2 | 3.1 | 22 | 417 | 2.2 | 2.9 | 6.2 | 9.3 |
| 1.2.1 | 10000 | 21 | 1.0 | 1.2 | 3.1 | 22 | 416 | 2.3 | 2.9 | 6.2 | 9.3 |
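Assuming the k factors are defined as the total runtime divided by the summed training time of all 1,000 models, they can be reproduced from the overhead column. For example, for version 1.4.0 with 1,000 observations:

# Total runtime = overhead + 1,000 evaluations * training time.
overhead = 19                        # seconds
train_time = c(1, 0.1, 0.01, 0.001)  # reference times for k1000, k100, k10, k1
k = (overhead + 1000 * train_time) / (1000 * train_time)
round(k, 1)
#> [1]  1.0  1.2  2.9 20.0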

Nested Tuning

The runtime and memory usage of tune_nested() are measured across mlr3tuning versions. The outer resampling performs 10 iterations, and the inner random search evaluates 1,000 hyperparameter configurations. Models are trained on the spam dataset with 1,000 and 10,000 instances.

task = tsk("spam")

learner = lrn("classif.rpart",
  cp = to_tune(0, 1))

tune_nested(
  tuner = tnr("random_search", batch_size = 1000),
  task = task,
  learner = learner,
  inner_resampling = rsmp("holdout"),
  outer_resampling = rsmp("subsampling", repeats = 10),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 1000),
  store_tuning_instance = FALSE,
  store_benchmark_result = FALSE,
  store_models = FALSE
)
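tune_nested() parallelizes over the outer resampling loop. A minimal sketch, assuming the future package's nested-plan syntax, distributes the 10 outer iterations over 10 workers while each inner random search runs sequentially:

# Outer loop on 10 workers, inner tuning sequential on each worker.
future::plan(list(
  future::tweak("multisession", workers = 10),
  "sequential"
))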
Runtime and memory usage of tune_nested() by mlr3tuning version and task size. The k factors indicate how many times larger the total runtime is than the summed training time of the models; the subscripts denote the reference training time per model in milliseconds, so k100 corresponds to 100 ms. A green background highlights cases where the total runtime is less than three times the model training time. The pk factors report the speedup of parallel over sequential execution; a pk factor is omitted when parallel execution is slower than sequential execution.
| mlr3tuning Version | Task Size | Overhead, s | k1000 | k100 | k10 | k1 | Memory, MB | pk1 | pk10 | pk100 | pk1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.4.0 | 1000 | 22 | 1.0 | 1.2 | 3.2 | 23 | 344 | 7.8 | 8.3 | 9.5 | 9.9 |
| 1.3.0 | 1000 | 21 | 1.0 | 1.2 | 3.1 | 22 | 343 | 8.9 | 9.2 | 9.8 | 10 |
| 1.2.1 | 1000 | 21 | 1.0 | 1.2 | 3.1 | 22 | 340 | 1.2 | 1.6 | 4.3 | 8.7 |
| 1.4.0 | 10000 | 23 | 1.0 | 1.2 | 3.3 | 24 | 342 | 9.1 | 9.3 | 9.8 | 10 |
| 1.3.0 | 10000 | 23 | 1.0 | 1.2 | 3.3 | 24 | 377 | 7.7 | 8.3 | 9.5 | 9.9 |
| 1.2.1 | 10000 | 22 | 1.0 | 1.2 | 3.2 | 23 | 372 | 8.6 | 9.0 | 9.7 | 10 |