Advanced Resampling with Custom Measures: Solution

Use stratified resampling to evaluate the german_credit task and block resampling to evaluate the BreastCancer data set. Define custom measures in mlr3 and use them to evaluate a model on the mtcars task.


Goal

After this exercise, you should be able to control the resampling process in mlr3 to account for specific properties of the data, such as class imbalance in classification settings or grouped observations. Further, you will have learned how to construct and use custom measures for performance evaluation within mlr3.

Prerequisites

We load the most important packages and use a fixed seed for reproducibility.

library(mlr3verse)
library(mlbench)
library(data.table)
set.seed(7832)

1 Stratified Resampling

In classification tasks, the class distribution of the target should be similar in each train/test split, which is achieved by stratification. This is particularly useful for imbalanced classes and small data sets.
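To make the idea concrete, here is a minimal base R sketch of stratified fold assignment that does not use mlr3: within each class, the observations are shuffled and dealt round-robin into the folds, so every fold receives the same share of each class (up to rounding). The toy target y, the number of folds k and the seed are hypothetical choices for illustration; this is not mlr3's internal implementation.

```r
# Hypothetical imbalanced target: 80 "good" and 20 "bad" observations
y = factor(rep(c("good", "bad"), times = c(80, 20)), levels = c("good", "bad"))
k = 4
set.seed(1)

fold = integer(length(y))
for (cls in levels(y)) {
  idx = which(y == cls)                         # rows belonging to this class
  idx = idx[sample.int(length(idx))]            # shuffle them within the class
  fold[idx] = rep_len(seq_len(k), length(idx))  # deal them round-robin into the k folds
}

# share of "bad" per fold equals the overall share of 0.2
tapply(y == "bad", fold, mean)
```

Without the per-class loop, a purely random split could by chance put noticeably more or fewer "bad" cases into some folds.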

Stratification can also be performed with respect to explanatory categorical variables to ensure that all subgroups are represented in all training and test sets.

In mlr3, each task has a field $col_roles, a named list that assigns columns to the roles they play at different stages of the machine learning process. At the very least, $col_roles records which variables are used as features and which as the target, but further roles exist, and a variable may even serve multiple roles. For example, task$col_roles$stratum specifies the variables used for stratification. In this exercise, we illustrate this using the german_credit data:

task_gc = tsk("german_credit")
task_gc$col_roles
$feature
 [1] "age"                     "amount"                  "credit_history"          "duration"               
 [5] "employment_duration"     "foreign_worker"          "housing"                 "installment_rate"       
 [9] "job"                     "number_credits"          "other_debtors"           "other_installment_plans"
[13] "people_liable"           "personal_status_sex"     "present_residence"       "property"               
[17] "purpose"                 "savings"                 "status"                  "telephone"              

$target
[1] "credit_risk"

$name
character(0)

$order
character(0)

$stratum
character(0)

$group
character(0)

$weight
character(0)

$offset
character(0)

$always_included
character(0)

1.1 Set stratification variable

Modify the task_gc object such that the target variable credit_risk is used for stratification.

Hint 1
task_gc$col_roles$... = "credit_risk"
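A minimal solution following the hint: assign the target to the stratum role. Afterwards, the task's $strata field summarizes the strata, with one row per class of credit_risk and its number of observations N. The task construction is repeated so that the block runs on its own.

```r
library(mlr3verse)

task_gc = tsk("german_credit")
# use the target credit_risk as stratification variable
task_gc$col_roles$stratum = "credit_risk"

# one row per stratum (class), with its size N and row ids
task_gc$strata
```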

1.2 Create resampling procedure

Next, specify a 3-fold cross validation and instantiate the resampling on the task.
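One possible way to set up and instantiate the resampling, sketched under the assumption that task_gc is stratified as in 1.1 (repeated here so the block is self-contained). Instantiation fixes the assignment of rows to folds, which is then stored in cv3$instance.

```r
library(mlr3verse)
set.seed(7832)

# stratified task from 1.1
task_gc = tsk("german_credit")
task_gc$col_roles$stratum = "credit_risk"

# 3-fold cross validation, instantiated on the task
cv3 = rsmp("cv", folds = 3)
cv3$instantiate(task_gc)

# fold assignment: one row per row id with its fold number
head(cv3$instance)
```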

1.3 Sanity check

As a sanity check, the target class distribution should be similar within each CV fold. Compute and check the target class distribution in the form of a ratio within each fold.

Hint 1
First, merge the data with the corresponding CV fold. Second, aggregate within each fold.
Hint 2
dt <- merge(cv3$..., transform(..., row_id = seq_len(...)), by = ...)
aggregate(..., data = ..., FUN = function(x) ...)
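A sketch of one possible solution. Instead of transform() and aggregate() from the hints, it builds the row-id table from task_gc$truth() and aggregates with data.table; the names dt and fold_ratios are our own. Since german_credit contains 700 good and 300 bad credit risks, each fold's good-to-bad ratio should be close to 7/3 (about 2.33).

```r
library(mlr3verse)
library(data.table)
set.seed(7832)

task_gc = tsk("german_credit")
task_gc$col_roles$stratum = "credit_risk"
cv3 = rsmp("cv", folds = 3)
cv3$instantiate(task_gc)

# row ids together with the target, merged with the fold assignment
dt = merge(cv3$instance,
  data.table(row_id = task_gc$row_ids, credit_risk = task_gc$truth()),
  by = "row_id")

# ratio of good to bad credit risks within each fold
fold_ratios = dt[, .(ratio = sum(credit_risk == "good") / sum(credit_risk == "bad")), by = fold]
fold_ratios
```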

2 Block Resampling

An additional concern when specifying resampling is respecting the natural grouping of the data. Blocking refers to the situation where subsets of observations belong together and must not be separated during resampling. Hence, for one train/test set pair the entire block is either in the training set or in the test set.
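The key idea of blocking can be illustrated in a few lines of base R: folds are assigned to the unique group labels rather than to individual rows, and each row then inherits the fold of its group. The group labels in ids are hypothetical, and the sketch conveys the principle rather than reproducing mlr3's implementation.

```r
# Hypothetical group labels: rows sharing a label belong to the same block
ids = c("a", "a", "b", "c", "c", "c", "d", "e", "e", "f")
k = 3
set.seed(1)

# assign folds to the unique groups, not to the individual rows
groups = unique(ids)
fold_of_group = setNames(sample(rep_len(seq_len(k), length(groups))), groups)

# every row inherits the fold of its group
fold = unname(fold_of_group[ids])

# each group ends up in exactly one fold
n_folds_per_group = tapply(fold, ids, function(f) length(unique(f)))
stopifnot(all(n_folds_per_group == 1))
data.frame(id = ids, fold = fold)
```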

In the following example, we will consider the BreastCancer data set from the mlbench package:

data(BreastCancer, package = "mlbench")
task_bc = as_task_classif(BreastCancer, target = "Class", positive = "malignant")

In this data set, several observations have the same Id (sample code number), which implies these are samples taken from the same patient at different times.

2.1 Count groups

Let’s count how many Ids actually occur more than once.
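A short sketch for the count using base R's table() on the Id column: it tabulates how often each Id occurs, from which we derive the number of Ids with more than one observation and the number of rows they account for. The variable names are our own.

```r
library(mlbench)
data(BreastCancer, package = "mlbench")

# number of observations per Id
id_counts = table(BreastCancer$Id)

# Ids occurring more than once, and the number of rows they account for
n_repeated_ids = sum(id_counts > 1)
n_rows_repeated = sum(id_counts[id_counts > 1])
c(repeated_ids = n_repeated_ids, rows = n_rows_repeated)
```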

The model trained on this data set will be used to predict the cancer status of new patients. Hence, we have to make sure that each Id occurs in exactly one fold, so that all observations with the same Id are used either for training or for evaluating the model, but never for both. Otherwise, the model would partly be evaluated on patients it has already seen during training, and k-fold cross validation would yield optimistically biased performance estimates. This can be achieved by block cross validation.

2.2 Set up block resampling

Similar to stratified resampling, block resampling uses task$col_roles$group to specify the name of the grouping variable, which has to be a column of the task data. Now, set the column Id as the grouping variable in the task object. Since Id is merely an identifier, it should also be removed from the feature set.
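A minimal sketch of the solution: Id is first removed from the features and then assigned the group role. Doing the removal first is our own precaution so that Id never holds both roles at once.

```r
library(mlr3verse)
library(mlbench)
data(BreastCancer, package = "mlbench")
task_bc = as_task_classif(BreastCancer, target = "Class", positive = "malignant")

# Id only identifies samples, so remove it from the features ...
task_bc$col_roles$feature = setdiff(task_bc$col_roles$feature, "Id")
# ... and use it as grouping (blocking) variable instead
task_bc$col_roles$group = "Id"

task_bc$col_roles$group
```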

2.3 Instantiate resampling

Next, set up a 5-fold CV and instantiate it on the task.
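Setting up and instantiating the 5-fold CV works exactly as in the stratified case; the task setup from 2.2 is repeated so the block runs on its own. Because whole Ids rather than single rows are assigned to folds, the number of rows per test fold may differ slightly between folds.

```r
library(mlr3verse)
library(mlbench)
set.seed(7832)

data(BreastCancer, package = "mlbench")
task_bc = as_task_classif(BreastCancer, target = "Class", positive = "malignant")
task_bc$col_roles$feature = setdiff(task_bc$col_roles$feature, "Id")
task_bc$col_roles$group = "Id"

# 5-fold CV, instantiated on the grouped task
cv5 = rsmp("cv", folds = 5)
cv5$instantiate(task_bc)

# number of rows in each test fold
fold_sizes = sapply(seq_len(cv5$iters), function(i) length(cv5$test_set(i)))
fold_sizes
```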

2.4 Sanity check

If the specified blocking groups are respected, each Id appears in exactly one fold. To check whether blocking was successful, count in how many different folds each Id appears and print the Ids that appear in more than one fold.
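One way to perform this check, sketched with data.table: task_bc$groups maps each row id to its Id (in a column named group), and cv5$test_set(i) returns the row ids in test fold i, so the check does not rely on the internal layout of cv5$instance. Merging both and counting distinct folds per Id leaves exactly those Ids that were split across folds. The names id_map, folds, dt and multi_fold_ids are our own.

```r
library(mlr3verse)
library(mlbench)
library(data.table)
set.seed(7832)

data(BreastCancer, package = "mlbench")
task_bc = as_task_classif(BreastCancer, target = "Class", positive = "malignant")
task_bc$col_roles$feature = setdiff(task_bc$col_roles$feature, "Id")
task_bc$col_roles$group = "Id"
cv5 = rsmp("cv", folds = 5)
cv5$instantiate(task_bc)

# row id -> Id (column "group") and row id -> test fold
id_map = task_bc$groups
folds = rbindlist(lapply(seq_len(cv5$iters), function(i) {
  data.table(row_id = cv5$test_set(i), fold = i)
}))
dt = merge(id_map, folds, by = "row_id")

# Ids that appear in more than one fold (expected: none)
multi_fold_ids = dt[, .(n_folds = uniqueN(fold)), by = group][n_folds > 1]
multi_fold_ids
```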

As expected, the table is empty, as no Id is present in more than one fold.

3 Custom Performance Measures

Many domain applications require performance measures that are not available in mlr3 (you can inspect all available measures by calling as.data.table(mlr_measures)). Luckily, you can design your own measures for evaluating model performance. To do so, we create a new R6 class that inherits either from MeasureRegr (for a regression measure) or from MeasureClassif (for a classification measure).

Let’s see how this works in practice. Consider a regression measure that scores each prediction as 1 if the absolute difference between the true and the predicted value is less than one standard deviation of the truth, and as 0 otherwise, and then averages these scores. Formally, \(f(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(|y_i - \hat{y}_i| < \sigma_y)\), where \(\sigma_y\) is the standard deviation of the truth and \(\mathbb{I}\) is the indicator function. In this case, we need the following code to construct a corresponding measure class:

MeasureRegrThresholdAcc = R6::R6Class("MeasureRegrThresholdAcc",
  inherit = mlr3::MeasureRegr, # regression measure
  public = list(
    initialize = function() { # initialize class
      super$initialize(
        id = "thresh_acc", # unique ID
        packages = character(), # no package dependencies
        properties = character(), # no special properties
        predict_type = "response", # measures response prediction
        range = c(0, 1), # values lie in [0, 1]
        minimize = FALSE # larger values are better
      )
    }
  ),

  private = list(
    # define score as private method
    .score = function(prediction, ...) {
      # define loss
      threshold_acc = function(truth, response) {
        mean(ifelse(abs(truth - response) < sd(truth), 1, 0))
      }
      # call loss function
      threshold_acc(prediction$truth, prediction$response)
    }
  )
)
  1. In the first two lines we name the class, here MeasureRegrThresholdAcc, and then state this is a regression measure that inherits from MeasureRegr.
  2. We initialize the class by stating its unique ID is "thresh_acc", that it does not require any external packages (packages = character()) and that it has no special properties (properties = character()).
  3. We then pass specific details of the measure: it evaluates "response" type predictions (predict_type = "response"), its values lie in [0, 1] (range = c(0, 1)), and larger values are better, i.e., the measure is maximized (minimize = FALSE).
  4. Finally, we define the score itself as a private method called .score, in which we apply the helper function threshold_acc to the true and predicted values stored in the prediction. Private methods of an R6 class are only accessible from inside the class (as private$.score(...)); users do not call .score directly but, for example, prediction$score(measure), which invokes the private .score method internally. The method is “private” as it is not intended to be visible to the end user.
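Stripped of the R6 machinery, the private .score method computes the following. The toy values of truth and response are hypothetical and chosen so that the result is easy to verify by hand: the sample standard deviation of truth is sqrt(10), about 3.16.

```r
# the same helper as inside .score: share of predictions within one SD of the truth
threshold_acc = function(truth, response) {
  mean(ifelse(abs(truth - response) < sd(truth), 1, 0))
}

truth = c(10, 12, 14, 16, 18)     # sd(truth) = sqrt(10), about 3.16
response = c(11, 15, 14, 20, 18)  # absolute errors: 1, 3, 0, 4, 0
threshold_acc(truth, response)    # 4 of 5 errors are below sd(truth): 0.8
```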

Once you have defined your custom measure, you can add it to the mlr_measures dictionary like this:

mlr3::mlr_measures$add("regr.thresh_acc", MeasureRegrThresholdAcc)

3.1 MSE-MAE

Define your own risk measure for regression that, for each observation, takes the maximum of the squared and the absolute error and then averages over all observations: \(f(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^n \max((y_i - \hat{y}_i)^2,|y_i - \hat{y}_i|)\). Use the code skeleton supplied above.

Hint 1:

You need to change at least 7 lines of the code chunk containing the MeasureRegrThresholdAcc class definition.
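The substantive change relative to the skeleton is the formula inside .score. As a first step, it can be written and checked as a plain R function; the name mse_mae and the toy values below are our own choices.

```r
# per observation: the larger of squared and absolute error; then the mean
mse_mae = function(truth, response) {
  err = truth - response
  mean(pmax(err^2, abs(err)))
}

truth = c(1, 2, 3)
response = c(1.5, 4, 3)
mse_mae(truth, response)  # (0.5 + 4 + 0) / 3 = 1.5
```

The absolute error dominates for errors smaller than 1 in absolute value, and the squared error for larger ones.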

3.2 Evaluate a custom measure

Add your custom measure to the mlr_measures dictionary and use it to evaluate the following model prediction:

tsk_mtcars = tsk("mtcars")
split = partition(tsk_mtcars)
lrn_ranger = lrn("regr.ranger")$train(tsk_mtcars, split$train)
prediction = lrn_ranger$predict(tsk_mtcars, split$test)
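A sketch of one possible solution: the class below adapts the MeasureRegrThresholdAcc skeleton (new class name and ID, range [0, Inf), minimize = TRUE, and the MSE-MAE formula in .score), registers it under the key regr.mse_mae and scores the prediction together with the built-in regr.mse and regr.mae. The names MeasureRegrMSEMAE and regr.mse_mae are our own choices, and the setup from above is repeated so the block runs on its own.

```r
library(mlr3verse)
set.seed(7832)

# custom regression measure: mean over observations of max(squared error, absolute error)
MeasureRegrMSEMAE = R6::R6Class("MeasureRegrMSEMAE",
  inherit = mlr3::MeasureRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "regr.mse_mae",        # unique ID
        packages = character(),     # no package dependencies
        properties = character(),   # no special properties
        predict_type = "response",  # evaluates response predictions
        range = c(0, Inf),          # a risk: values in [0, Inf)
        minimize = TRUE             # smaller values are better
      )
    }
  ),
  private = list(
    .score = function(prediction, ...) {
      mse_mae = function(truth, response) {
        mean(pmax((truth - response)^2, abs(truth - response)))
      }
      mse_mae(prediction$truth, prediction$response)
    }
  )
)

# register the measure in the mlr_measures dictionary
mlr3::mlr_measures$add("regr.mse_mae", MeasureRegrMSEMAE)

tsk_mtcars = tsk("mtcars")
split = partition(tsk_mtcars)
lrn_ranger = lrn("regr.ranger")$train(tsk_mtcars, split$train)
prediction = lrn_ranger$predict(tsk_mtcars, split$test)

# custom measure next to the built-in MSE and MAE
scores = prediction$score(msrs(c("regr.mse_mae", "regr.mse", "regr.mae")))
scores
```

Because every summand is at least as large as both the squared and the absolute error, the custom score can never be smaller than regr.mse or regr.mae on the same prediction.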

Summary

  • Stratified resampling keeps the class distribution of the target (or of selected categorical features) similar across CV folds, so that each fold represents the data well.
  • Block resampling reduces bias in generalization error estimates by ensuring that observations from the same group end up in the same fold.
  • Some domain applications require custom performance measures. In mlr3, you can define them by creating a new R6 class that inherits from MeasureRegr or MeasureClassif.