Time constraints in the mlr3 ecosystem

Set time limits for learners, tuning and nested resampling.

Published: December 21, 2023

Scope

Setting time limits is an important consideration when tuning unreliable or unstable learning algorithms and when working on shared computing resources. The mlr3 ecosystem provides several mechanisms for setting time constraints for individual learners, tuning processes, and nested resampling.

Learner

This section demonstrates how to impose time constraints using a support vector machine (SVM) as an illustrative example.

library(mlr3verse)

learner = lrn("classif.svm")

Timeouts on the $train() and $predict() methods are essential for managing learners that may run indefinitely. The limits are set independently for the training and prediction stages; training usually takes considerably longer than prediction. Certain learners, like k-nearest neighbors, lack a distinct training phase and only require a timeout during prediction. For the SVM's training, we set a 10-second limit.

learner$timeout = c(train = 10, predict = Inf)

To effectively terminate the process if necessary, it’s important to run the training and prediction within a separate R process. The callr package is recommended for this encapsulation, as it tends to be more reliable than the evaluate package, especially for terminating externally compiled code.

learner$encapsulate = c(train = "callr", predict = "callr")

Note that using callr increases the runtime due to the overhead of starting an R process. Additionally, it’s advisable to specify a fallback learner, such as "classif.featureless", to provide baseline predictions in case the primary learner is terminated.

learner$fallback = lrn("classif.featureless")
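
As a quick check, we can, for example, train the constrained learner on the sonar task, which also serves as the example task in the tuning sections below. Training runs in a separate callr session; if it exceeded the 10-second limit, the session would be killed, the error logged, and the fallback learner would step in at prediction time.

# train inside a callr session; a timeout is logged instead of raised
learner$train(tsk("sonar"))

# errors recorded during training, e.g. a reached time limit
learner$errors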

These time constraints are now integrated into the training, resampling, and benchmarking processes. For more information on encapsulation and fallback learners, see the mlr3book. The next section will focus on setting time limits for the entire tuning process.

Tuning

When working with high-performance computing clusters, jobs are often bound by strict time constraints. Exceeding these limits results in the job being terminated and the loss of any results generated. Therefore, it’s important to ensure that the tuning process is designed to adhere to these time constraints.

The trm("run_time") terminator controls the duration of the tuning process. Keep in mind that the terminator can only check whether the time limit has been reached between batches, so the limit must be set lower than the walltime of the cluster job. How much lower depends on the runtime, or time limit, of the individual learners: the last batch must still be able to finish before the job is killed.

terminator = trm("run_time", secs = 60)
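
The tuning instance below also requires a search space for the SVM, which we have not defined yet. A minimal sketch of one possible search space, tuning cost and gamma on a log scale (the ranges here are illustrative assumptions):

# illustrative search space: tune cost and gamma of a radial-kernel SVM
learner$param_set$set_values(
  type = "C-classification",
  kernel = "radial",
  cost = to_tune(1e-5, 1e5, logscale = TRUE),
  gamma = to_tune(1e-5, 1e5, logscale = TRUE)
)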

instance = ti(
  task = tsk("sonar"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = terminator
)

With these settings, the tuning runs for roughly 60 seconds, stopping after the first batch that finishes beyond that limit, while each individual learner is terminated after at most 10 seconds of training. This keeps the tuning process within the constraints imposed by the high-performance computing cluster.
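
To actually run the tuning, a tuner is applied to the instance. A minimal sketch using random search; the batch_size of 10 is an arbitrary choice, and smaller batches mean the terminator is checked more frequently, keeping the total runtime closer to the 60-second budget.

tuner = tnr("random_search", batch_size = 10)
tuner$optimize(instance)

# best configuration found within the time budget
instance$result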

Nested Resampling

When using nested resampling, time constraints become more complex because they are applied at several levels. As before, the time limit for an individual learner during tuning is set with $timeout, and the time limit for the tuning process inside each auto tuner is controlled with trm("run_time"). Note that once the auto tuner enters its final phase, fitting the model on the outer training set and predicting on the outer test set, the terminator no longer applies. The time limit previously set on the learner is also temporarily deactivated, allowing the auto tuner to complete this task uninterrupted. However, a separate time limit can be assigned to each auto tuner using its own $timeout. This limit covers not only the tuning phase but also the time required for fitting the final model and making predictions on the outer test set.

The best way to show this is with an example. We set the time limit for an individual learner to 10 seconds.

learner$timeout = c(train = 10, predict = Inf)
learner$encapsulate = c(train = "callr", predict = "callr")
learner$fallback = lrn("classif.featureless")

Next, we give each auto tuner 60 seconds to finish the tuning process.

terminator = trm("run_time", secs = 60)

Furthermore, we impose a 100-second limit on training each auto tuner and a 20-second limit on its predictions, i.e. 120 seconds per outer iteration in total. The training budget covers both the roughly 60-second tuning phase and the final model fit on the outer training set, while the prediction budget covers the predictions on the outer test set.

at = auto_tuner(
  tuner = tnr("random_search"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = terminator
)

at$timeout = c(train = 100, predict = 20)
at$encapsulate = c(train = "callr", predict = "callr")
at$fallback = lrn("classif.featureless")

In total, the nested resampling is designed to finish within 10 minutes when the outer folds run sequentially (120 seconds per auto tuner times 5 outer folds).

rr = resample(tsk("sonar"), at, rsmp("cv", folds = 5))
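
Once the resampling has finished, it is worth checking whether any of the time limits were hit. A short sketch: $errors collects the messages logged by the encapsulated learners, and $aggregate() reports the performance on the outer folds.

# performance on the outer folds
rr$aggregate(msr("classif.ce"))

# errors logged by the encapsulated learners, e.g. timeouts
rr$errors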

Conclusion

We have looked at setting time constraints on several levels of the mlr3 ecosystem. From individual learners to nested resampling, managing time limits effectively makes machine learning workflows more reliable and keeps them within the practical constraints of shared computing resources. By using trm("run_time") for the tuning process and setting $timeout for individual learners and auto tuners, we ensure that jobs finish within their allotted time. For more information, see also the error handling section in the mlr3book.