Goal
After this exercise, you should be able to control the resampling process when using mlr3 in order to account for data specificities, such as class imbalances in classification settings or grouping phenomena. Further, you will have learned how to construct and utilize custom measures for performance evaluation within mlr3.
Prerequisites
We load the most important packages and use a fixed seed for reproducibility.
library(mlr3verse)
library(mlbench)
library(data.table)
set.seed(7832)
1 Stratified Resampling
In classification tasks, the target class distribution should be similar in each train/test split, which is achieved by stratification. This is particularly useful in the case of imbalanced classes and small data sets.
Stratification can also be performed with respect to explanatory categorical variables to ensure that all subgroups are represented in all training and test sets.
In mlr3, each task has a slot $col_roles. This slot shows the general roles certain features will have throughout different stages of the machine learning process. At the very least, the $col_roles slot shows which variables are used as feature and as target. However, the $col_roles slot can be more diverse, and some variables might even serve multiple roles. For example, task$col_roles$stratum specifies the variable used for stratification. In this exercise, we will illustrate this using the german_credit data:
= tsk("german_credit")
task_gc $col_roles task_gc
$feature
[1] "age" "amount" "credit_history" "duration"
[5] "employment_duration" "foreign_worker" "housing" "installment_rate"
[9] "job" "number_credits" "other_debtors" "other_installment_plans"
[13] "people_liable" "personal_status_sex" "present_residence" "property"
[17] "purpose" "savings" "status" "telephone"
$target
[1] "credit_risk"
$name
character(0)
$order
character(0)
$stratum
character(0)
$group
character(0)
$weight
character(0)
$offset
character(0)
$always_included
character(0)
1.1 Set stratification variable
Modify the task_gc object such that the target variable credit_risk is used for stratification.
Hint 1
$col_roles$... = "credit_risk" task_gc
1.2 Create resampling procedure
Next, specify a 3-fold cross-validation and instantiate the resampling on the task.
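A possible sketch using rsmp(); the object name cv3 is our choice and matches the hint in the next step:
cv3 = rsmp("cv", folds = 3)
cv3$instantiate(task_gc)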
1.3 Sanity check
As a sanity check, the target class distribution should be similar within each CV fold. Compute and check the target class distribution in the form of a ratio within each fold.
Hint 1
First, merge the data with the corresponding CV fold. Second, aggregate for each fold.
Hint 2
dt <- merge(cv3$..., transform(..., row_id = seq_len(...)), by = ...)
aggregate(..., data = ..., FUN = function(x) ...)
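One way to fill in the hint, assuming the task_gc and cv3 objects from the previous steps; cv3$instance maps each row_id to its fold, and reporting the ratio of "good" credit risks is our choice:
dt <- merge(cv3$instance, transform(task_gc$data(), row_id = seq_len(task_gc$nrow)), by = "row_id")
aggregate(credit_risk ~ fold, data = dt, FUN = function(x) mean(x == "good"))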
2 Block Resampling
An additional concern when specifying resampling is respecting the natural grouping of the data. Blocking refers to the situation where subsets of observations belong together and must not be separated during resampling. Hence, for one train/test set pair the entire block is either in the training set or in the test set.
In the following example, we will consider the BreastCancer data set from the mlbench package:
data(BreastCancer, package = "mlbench")
task_bc = as_task_classif(BreastCancer, target = "Class", positive = "malignant")
In this data set, several observations have the same Id (sample code number), which implies that these are samples taken from the same patient at different times.
2.1 Count groups
Let's count how many observations actually have the same Id more than once.
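A quick sketch using base R on the raw BreastCancer data (column names as in mlbench):
# number of distinct Ids that occur more than once
id_counts = table(BreastCancer$Id)
sum(id_counts > 1)
# total number of observations affected by duplicated Ids
sum(id_counts[id_counts > 1])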
The model trained on this data set will be used to predict the cancer status of new patients. Hence, we have to make sure that each Id occurs in exactly one fold, so that all observations with the same Id are used either for training or for evaluating the model. This way, we get less biased performance estimates via k-fold cross-validation. This can be achieved by block cross-validation.
2.2 Set up block resampling
Similarly to stratified resampling, block resampling uses task$col_roles$group to specify the name of a grouping variable included in the feature set. Now, set the column Id as the grouping variable in the task object.
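A minimal sketch, assuming the task_bc object from above; dropping Id from the feature set afterwards is our addition, so that the identifier is not also used as a predictor:
task_bc$col_roles$group = "Id"
# optional: remove Id from the features so it is not used for prediction
task_bc$col_roles$feature = setdiff(task_bc$col_roles$feature, "Id")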
2.3 Instantiate resampling
Next, set up a 5-fold CV and instantiate it on the task.
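Analogous to the 3-fold case above, a sketch (the object name cv5 is our choice):
cv5 = rsmp("cv", folds = 5)
cv5$instantiate(task_bc)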
2.4 Sanity check
If the specified blocking groups are respected, each Id appears in exactly one fold. To inspect whether blocking was successful when generating the folds, count how often each Id appears in a specific fold and print the Ids that appear in more than one fold.
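One way to perform this check, using only documented resampling accessors (cv5$test_set(i) returns the row ids of fold i) and assuming the row order of task_bc matches BreastCancer; the helper names dt and fold_dt are our choice:
# map each observation to its Id and its test fold
dt = data.table(row_id = task_bc$row_ids, Id = BreastCancer$Id)
fold_dt = rbindlist(lapply(seq_len(cv5$iters), function(i) {
  data.table(row_id = cv5$test_set(i), fold = i)
}))
dt = merge(dt, fold_dt, by = "row_id")
# Ids that appear in more than one fold (should be none)
dt[, .(n_folds = uniqueN(fold)), by = Id][n_folds > 1]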
As expected, the table is empty, as there are no Ids present in more than one fold.
3 Custom Performance Measures
Many domain applications require custom measures for performance evaluation that are not supported in mlr3. You can inspect all available measures by calling as.data.table(mlr_measures). Luckily, you can design your own measures for evaluating model performance. To do so, we simply create a new R6 class that inherits either from MeasureRegr (for a regression measure) or MeasureClassif (for a classification measure). Let's see how this works in practice. Consider a regression measure that scores a prediction as 1 if the difference between the true and predicted values is less than one standard deviation of the truth, and as 0 otherwise. In mathematical terms, this is defined as \(f(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(|y_i - \hat{y}_i| < \sigma_y)\), where \(\sigma_y\) is the standard deviation of the truth and \(\mathbb{I}\) is the indicator function. In this case, we need the following code to construct a corresponding measure class:
= R6::R6Class("MeasureRegrThresholdAcc",
MeasureRegrThresholdAcc inherit = mlr3::MeasureRegr, # regression measure
public = list(
initialize = function() { # initialize class
$initialize(
superid = "thresh_acc", # unique ID
packages = character(), # no package dependencies
properties = character(), # no special properties
predict_type = "response", # measures response prediction
range = c(0, 1), # results in values between (0, 1)
minimize = FALSE # larger values are better
)
}
),
private = list(
# define score as private method
.score = function(prediction, ...) {
# define loss
= function(truth, response) {
threshold_acc mean(ifelse(abs(truth - response) < sd(truth), 1, 0))
}# call loss function
threshold_acc(prediction$truth, prediction$response)
}
) )
- In the first two lines we name the class, here MeasureRegrThresholdAcc, and then state that this is a regression measure that inherits from MeasureRegr.
- We initialize the class by stating its unique ID is "thresh_acc", that it does not require any external packages (packages = character()), and that it has no special properties (properties = character()).
- We then pass specific details of the loss function, which are: it measures the quality of a "response" type prediction, its values range between (0, 1), and larger values are better (minimize = FALSE).
- Finally, we define the score itself as a private method called .score, where we pass the predictions to the loss function we defined just above. The private method is a function assigned to the R6 class MeasureRegrThresholdAcc, such that one can (internally) call object$.score(prediction, ...) for an object of class MeasureRegrThresholdAcc. The method is "private" as it is not intended to be visible to the end user.
Once you have defined your custom measure, you can add it to the mlr_measures dictionary like this:
mlr3::mlr_measures$add("regr.thresh_acc", MeasureRegrThresholdAcc)
3.1 MSE-MAE
Define your own risk measure for regression, the maximum of MSE and MAE: \(f(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^n \max((y_i - \hat{y}_i)^2, |y_i - \hat{y}_i|)\), using the code skeleton supplied above.
Hint 1:
You need to change the code chunk containing the MeasureRegrThresholdAcc class definition in at least 7 lines.
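One possible solution sketch, following the skeleton above; the class name MeasureRegrMseMae and the ID "mse_mae" are our choices:
MeasureRegrMseMae = R6::R6Class("MeasureRegrMseMae",
  inherit = mlr3::MeasureRegr, # regression measure
  public = list(
    initialize = function() {
      super$initialize(
        id = "mse_mae", # unique ID (our choice)
        packages = character(), # no package dependencies
        properties = character(), # no special properties
        predict_type = "response", # measures response prediction
        range = c(0, Inf), # loss is non-negative and unbounded above
        minimize = TRUE # smaller values are better
      )
    }
  ),
  private = list(
    .score = function(prediction, ...) {
      # observation-wise maximum of squared and absolute error, then averaged
      mse_mae = function(truth, response) {
        mean(pmax((truth - response)^2, abs(truth - response)))
      }
      mse_mae(prediction$truth, prediction$response)
    }
  )
)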
3.2 Evaluate a custom measure
Add your custom measure to the mlr_measures dictionary and use it to evaluate the following model prediction:
= tsk("mtcars")
tsk_mtcars = partition(tsk_mtcars)
split = lrn("regr.ranger")$train(tsk_mtcars, split$train)
lrn_ranger = lrn_ranger$predict(tsk_mtcars, split$test) prediction
Summary
- Stratified resampling helps with balancing classes and features within CV folds, to ensure each fold represents the data well enough.
- Block resampling reduces bias in generalization error estimates by ensuring that observations from the same group end up in the same fold.
- Custom domain applications require custom performance measures. In mlr3, you can define custom measures by creating a new R6 class.