Calibration with mlr3

Learn the basics of mlr3 for supervised learning, assess whether a model is well-calibrated, and calibrate it with mlr3calibration.

Goal

Our goal for this exercise sheet is to learn the basics of model calibration for supervised classification with mlr3calibration. In a calibrated model, the predicted probability for an input feature vector can be interpreted as the true likelihood of the outcome belonging to the positive class, meaning that among all instances assigned a predicted probability of \(p\), a proportion of approximately \(p\) will actually belong to the positive class.

Required packages

We will use mlr3 for machine learning, and mlr3calibration specifically for calibration:

if (!require("mlr3calibration")) {
  remotes::install_github("AdriGl117/mlr3calibration")
}
library(mlr3calibration)
library(mlr3verse)

set.seed(12345)

Data: predicting cell segmentation quality

The modeldata package contains a data set called cells. It was initially distributed by Hill and Haney (2007), who showed how to create models that predict the quality of the image analysis of cells. The outcome has two levels: "PS" (poorly segmented images) or "WS" (well-segmented images). There are 56 image features that can be used to build a classifier.

Let’s load the data and remove an unwanted column:

library(modeldata)
data(cells, package = "modeldata")
cells$case <- NULL

1 Calibrate a model with Platt scaling

We will apply Platt scaling to calibrate a model trained on the cells data. Platt scaling is a post-processing calibration method that fits a logistic regression model to the outputs of an uncalibrated classifier, transforming raw scores into calibrated probabilities.
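To build intuition for what Platt scaling does before we use mlr3calibration, here is a minimal base-R sketch. The objects scores_train, y_train, and scores_test are hypothetical placeholders for raw classifier scores and binary labels; the idea is simply to fit a logistic regression of the labels on the scores and use it to map new scores to calibrated probabilities.

# Minimal illustration of Platt scaling (not part of mlr3calibration):
# fit a logistic regression of the true labels on the raw classifier scores ...
platt_fit <- glm(y_train ~ scores_train, family = binomial())
# ... and map new raw scores to calibrated probabilities
calibrated <- predict(platt_fit,
                      newdata = data.frame(scores_train = scores_test),
                      type = "response")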

1.1 Creating a train-test split and tasks

First, define a task object for the cells data set. Then, create a simple train-test split on the task to reserve test data for performance evaluation later on. As a result, there should be a task_train and a task_test object.

Hint 1:

You can use partition() on a given task object to create a simple train-test split.
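A possible solution sketch for this step, assuming the target column of cells is class (the positive class is left at its default):

# create a classification task from the cells data
task = as_task_classif(cells, target = "class")
# simple train-test split (row indices), then two separate task objects
split = partition(task, ratio = 0.8)
task_train = task$clone()$filter(split$train)
task_test = task$clone()$filter(split$test)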

1.2 Assess model calibration

Train an XGBoost model on the training data. To do so, initialize an XGBoost learner with predict_type = "prob". Then, set learner$id <- "Uncalibrated Learner" for later reference. Train the learner on the correct task. Then, assess whether the model is calibrated with calibrationplot(). The calibration plot shows the relationship between the predicted probabilities and the true outcomes. The plot is divided into bins, and within each bin, the mean predicted probability and the mean observed outcome are calculated. The calibration plot can be smoothed by setting smooth = TRUE.

Hint 1:

calibrationplot() requires a list of learners, even if that list contains only one learner.
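A sketch of this step; the exact signature of calibrationplot() may differ slightly, so treat the call below (list of trained learners plus the test task) as an assumption and check its help page:

# uncalibrated XGBoost learner with probability predictions
learner = lrn("classif.xgboost", predict_type = "prob")
learner$id <- "Uncalibrated Learner"
learner$train(task_train)
# calibration plot on the held-out test task (a list is required even for one learner)
calibrationplot(list(learner), task_test, smooth = TRUE)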

1.3 Calibration strategy

In mlr3calibration, calibrating a learner requires a base learner (which fits the model that is calibrated afterwards), a resampling strategy, and a calibration method (Platt, Beta, or Isotonic). Initialize 1) another XGBoost base learner, 2) a 5-fold CV resampling object, and 3) a calibration strategy. The calibration strategy in mlr3calibration is implemented as a PipeOpCalibration object. It requires the base learner (learner), the calibration method (method), and the resampling strategy (rsmp) as arguments to be initialized. Since we want to use the calibration strategy as a learner, we wrap the pipeline operator in as_learner(). After that, set learner_cal$id <- "Platt Calibrated Learner" for later reference.

Hint 1:
learner_uncal = ...
rsmp = ...
learner_cal = as_learner(PipeOpCalibration$new(...))
learner_cal$id <- "Platt Calibrated Learner"
Hint 2:

Check the documentation of PipeOpCalibration with ??PipeOpCalibration.
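Filling in the skeleton from Hint 1, a sketch under the assumption that Platt scaling is selected via method = "platt" (verify the exact identifier in the PipeOpCalibration documentation):

learner_uncal = lrn("classif.xgboost", predict_type = "prob")
rsmp_cv = rsmp("cv", folds = 5)
# wrap the calibration pipeline operator so it can be used like a learner
learner_cal = as_learner(PipeOpCalibration$new(
  learner = learner_uncal,
  method = "platt",   # assumed identifier for Platt scaling
  rsmp = rsmp_cv
))
learner_cal$id <- "Platt Calibrated Learner"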

1.4 Calibrate learner

The calibrated learner can be trained on a task like any other learner. Train it on task_train. Afterwards, create the calibration plot again, comparing the uncalibrated XGBoost model with the Platt-scaled XGBoost model.
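A sketch of training and comparing both models, again assuming calibrationplot() accepts a list of trained learners and a task:

learner_cal$train(task_train)
# compare the uncalibrated and the Platt-calibrated model on the test task
calibrationplot(list(learner, learner_cal), task_test, smooth = TRUE)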

2 Calibration measures

mlr3calibration provides measures specifically for assessing model calibration: the Expected Calibration Error (ECE) and the Integrated Calibration Index (ICI). The ECE is a weighted average of the absolute differences between the mean predicted probability and the observed frequency of the positive class within each bin. The ICI is a weighted average of the absolute differences between the calibration curve and the diagonal line of perfect calibration. Compute the ECE for both models. The calibration measures are implemented like other measures in mlr3: you need to 1) predict on the test data and then 2) score the predictions, specifying the correct calibration measure.

Hint 1:

Check ??mlr3calibration::ece on how to initialize the ECE measure within $score().
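A sketch of how this could look. The constructor ece$new() is an assumption based on the object referenced in Hint 1; check that help page for the exact way to initialize the measure.

pred_uncal = learner$predict(task_test)
pred_cal = learner_cal$predict(task_test)
# score both predictions with the ECE measure (constructor assumed to be ece$new())
pred_uncal$score(ece$new())
pred_cal$score(ece$new())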

3 Tuning and Pipelines

PipeOpCalibration can be treated like any other PipeOp object. Therefore, we can use it within more complex tuning and pipeline constructs. There are many sensible options. For example, we could pass a tuned base learner to the calibrator or tune the base learner within the calibrator. Similarly, we can include a calibrated learner in a pipeline or choose to calibrate the entire pipeline. Let's try connecting a feature filter to a calibrator. Construct a pipeline that 1) filters the 10 most relevant features according to their information gain, 2) then fits a random forest, and 3) calibrates this pipeline with beta calibration using 5-fold CV. Express this calibrated pipeline as a learner, train it on the training task, and plot the calibration plot with the Platt-scaled and beta-calibrated models.

Hint 1:

You may use this skeleton code for the required steps.

po_filter = po(...)
pipeline = as_learner(... %>>% ...)
pipeline_cal = as_learner(PipeOpCalibration$new(...))
pipeline_cal$id <- "Beta Calibrated Learner"
pipeline_cal$train(...)
calibrationplot(...)
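One way to fill in this skeleton: the filter and random forest parts use standard mlr3pipelines/mlr3filters syntax, while the method = "beta" identifier is an assumption to be checked against the PipeOpCalibration documentation.

# keep the 10 features with the highest information gain
po_filter = po("filter", filter = flt("information_gain"), filter.nfeat = 10)
# filter followed by a random forest, expressed as a learner
pipeline = as_learner(po_filter %>>% lrn("classif.ranger", predict_type = "prob"))
# calibrate the whole pipeline with beta calibration and 5-fold CV
pipeline_cal = as_learner(PipeOpCalibration$new(
  learner = pipeline,
  method = "beta",    # assumed identifier for beta calibration
  rsmp = rsmp("cv", folds = 5)
))
pipeline_cal$id <- "Beta Calibrated Learner"
pipeline_cal$train(task_train)
calibrationplot(list(learner_cal, pipeline_cal), task_test, smooth = TRUE)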