library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(paradox)
Transforming the target variable can often lead to predictive improvements and is a widely used tool. Typical transformations are the \(\log\) transformation of the target, which aims at minimizing (right) skewness, and the more flexible Box-Cox and Yeo-Johnson transformations, which pursue a similar goal.
One option to perform, e.g., a \(\log\) transformation would be to manually transform the target prior to training a Learner (and also before predicting from it) and then manually invert this transformation via \(\exp\) after predicting. This is quite cumbersome, especially if the transformation and its inverse require information about both the training and prediction data.
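For concreteness, here is a minimal sketch of this manual workflow (the names task_log and learner_log are our own, not part of the post):
dat = tsk("mtcars")$data()
dat$mpg = log(dat$mpg)  # manually transform the target
task_log = as_task_regr(dat, target = "mpg", id = "mtcars_log")
learner_log = lrn("regr.lm")$train(task_log)
exp(learner_log$predict(task_log)$response)  # manually invert via exp()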
In this post, we show how to do various kinds of target transformations using mlr3pipelines and explain the design of the target transformation and inversion PipeOps.
You will:
- learn how to do simple target transformations using PipeOpTargetMutate,
- be introduced to the abstract base class to implement custom target transformations, PipeOpTargetTrafo,
- implement a custom target transformation PipeOp, PipeOpTargetTrafoBoxCox.
As a prerequisite, you should be quite familiar with mlr3pipelines, i.e., know about the $state field of PipeOps, input and output channels, as well as Graphs. We will start with a PipeOp for simple target transformations, PipeOpTargetMutate.
We load the most important packages for this example. We initialize the random number generator with a fixed seed for reproducibility, and decrease the verbosity of the logger to keep the output concise.
set.seed(7832)
lgr::get_logger("mlr3")$set_threshold("warn")
In all sections we will use the mtcars regression task with mpg being a numerical, positive target:
task = tsk("mtcars")
summary(task$data(cols = task$target_names))
mpg
Min. :10.40
1st Qu.:15.43
Median :19.20
Mean :20.09
3rd Qu.:22.80
Max. :33.90
Moreover, as a Learner we will use an ordinary linear regression learner:
learner_lm = lrn("regr.lm")
Simple Target Transformations
The term simple refers to transformations that are given by a function of the target, relying on no other arguments (constants are of course allowed). The most prominent example is the \(\log\) transformation, which we can later invert by applying the \(\exp\) transformation.
If you are only interested in doing such a transformation and you do not have the time to read more of this post, simply use the following syntactic sugar:
g_ppl = ppl("targettrafo", graph = learner_lm)
g_ppl$param_set$values$targetmutate.trafo = function(x) log(x)
g_ppl$param_set$values$targetmutate.inverter = function(x) list(response = exp(x$response))
This constructs a Graph that will \(\log\) transform the target prior to training the linear regression learner (or predicting from it) and \(\exp\) transform the predictions afterwards. Note that you can supply any other Learner or even a whole Graph as the graph argument.
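For instance (illustrative variations of our own, using the standard po("scale") operator and the %>>% operator), a square-root transformation pair and a whole preprocessing Graph:
g_sqrt = ppl("targettrafo", graph = learner_lm)
g_sqrt$param_set$values$targetmutate.trafo = function(x) sqrt(x)
g_sqrt$param_set$values$targetmutate.inverter = function(x) list(response = x$response^2)
g_scaled = ppl("targettrafo", graph = po("scale") %>>% lrn("regr.lm"))  # scale features first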
Now, we will go into more detail about how this actually works:
We can perform a \(\log\) transformation of our numerical, positive target, mpg, using PipeOpTargetMutate (by default, ppl("targettrafo") uses this target transformation PipeOp):
trafo = po("targetmutate", param_vals = list(trafo = function(x) log(x)))
We have to specify the trafo parameter as a function of x (which will internally be evaluated to be the target of the Task): trafo = function(x) log(x). In principle, this is all that is needed to transform the target prior to training a Learner (or predicting from it), i.e., if we now train this PipeOp, we see that the target is transformed as specified:
trafo$train(list(task))$output$data(cols = task$target_names)
mpg
1: 3.044522
2: 3.044522
3: 3.126761
4: 3.063391
5: 2.928524
6: 2.895912
7: 2.660260
8: 3.194583
9: 3.126761
10: 2.954910
11: 2.879198
12: 2.797281
13: 2.850707
14: 2.721295
15: 2.341806
16: 2.341806
17: 2.687847
18: 3.478158
19: 3.414443
20: 3.523415
21: 3.068053
22: 2.740840
23: 2.721295
24: 2.587764
25: 2.954910
26: 3.306887
27: 3.258097
28: 3.414443
29: 2.760010
30: 2.980619
31: 2.708050
32: 3.063391
mpg
After having predicted from the Learner we could then proceed to apply the inverse transformation function in a similar manner. However, in mlr3pipelines, we decided to go with a more unified design of handling target transformations: all target transformation PipeOps also require the inverse transformation function of the target to be specified. Therefore, in PipeOpTargetMutate, the parameter inverter also has to be correctly specified:
trafo$param_set$values$inverter = function(x) list(response = exp(x$response))
Internally, this function will be applied to the data.table downstream of a Prediction object without the $row_id and $truth columns, and we specify that the $response column should be transformed. Note that applying the inverse transformation will typically only be done to the $response column, because transforming standard errors or probabilities is often not straightforward.
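To make the shape of this input concrete, here is the inverter applied by hand to a toy prediction table (illustrative only; the response column mirrors what the Prediction provides):
inverter = function(x) list(response = exp(x$response))
inverter(data.table::data.table(response = log(c(21.0, 22.8))))  # recovers 21.0 and 22.8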
To actually carry out the inverse transformation function after predicting from the Learner, we then rely on PipeOpTargetInvert. PipeOpTargetInvert has an empty ParamSet and its sole purpose is to apply the inverse transformation function after having predicted from a Learner (note that this whole design of target transformations may seem somewhat over-engineered at first glance; however, we will learn of its advantages when we later move to the advanced section).
PipeOpTargetInvert has two input channels named "fun" and "prediction". During training, both take NULL as input (because this is what a Learner’s "output" output and PipeOpTargetMutate’s "fun" output will return during training). During prediction, the "prediction" input takes a Prediction, and the "fun" input takes the "fun" output from PipeOpTargetMutate (you may have noticed already that PipeOpTargetMutate actually has two outputs, "fun" and "output", with "fun" returning NULL during training and a function during prediction, while "output" always returns the transformed input Task). We can see this if we look at:
trafo$output
name train predict
1: fun NULL function
2: output Task Task
trafo$predict(list(task))
$fun
function(inputs) {
assert_list(inputs, len = 1L, types = "Prediction")
list(private$.invert(inputs[[1L]], predict_phase_state))
}
<bytecode: 0x555be761be00>
<environment: 0x555be76215b0>
$output
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: -
* Features (10):
- dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
We will talk more about such technical details in the advanced section. For now, to finally construct our target transformation pipeline, we build a Graph:
g = Graph$new()
g$add_pipeop(trafo)
g$add_pipeop(learner_lm)
g$add_pipeop(po("targetinvert"))
Manually connecting the edges is quite cumbersome. First we connect the "output" output of "targetmutate" to the "input" input of "regr.lm":
g$add_edge(src_id = "targetmutate", dst_id = "regr.lm",
  src_channel = 2, dst_channel = 1)
Then we connect the "output" output of "regr.lm" to the "prediction" input of "targetinvert":
g$add_edge(src_id = "regr.lm", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 2)
Finally, we connect the "fun" output of "targetmutate" to the "fun" input of "targetinvert":
g$add_edge(src_id = "targetmutate", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 1)
This graph (which is conceptually the same graph as constructed via the ppl("targettrafo") syntactic sugar above) looks like the following:
g$plot(html = FALSE)
We can then finally call $train() and $predict() (prior to this we wrap the Graph in a GraphLearner):
gl = GraphLearner$new(g)
gl$train(task)
gl$state
$model
$model$targetmutate
list()
$model$regr.lm
$model$regr.lm$model
Call:
stats::lm(formula = task$formula(), data = task$data())
Coefficients:
(Intercept) am carb cyl disp drat gear hp qsec
2.776e+00 4.738e-02 -2.012e-02 7.657e-03 4.989e-05 2.220e-02 5.925e-02 -8.964e-04 3.077e-02
vs wt
-2.874e-03 -1.723e-01
$model$regr.lm$log
Empty data.table (0 rows and 3 cols): stage,class,msg
$model$regr.lm$train_time
[1] 0.012
$model$regr.lm$param_vals
named list()
$model$regr.lm$task_hash
[1] "6ca8c90cdf732078"
$model$regr.lm$data_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$model$regr.lm$task_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$model$regr.lm$mlr3_version
[1] '0.16.1'
$model$regr.lm$train_task
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: -
* Features (10):
- dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
$model$targetinvert
list()
$log
Empty data.table (0 rows and 3 cols): stage,class,msg
$train_time
[1] 0.098
$param_vals
$param_vals$targetmutate.trafo
function(x) log(x)
<bytecode: 0x555be6fb60f0>
$param_vals$targetmutate.inverter
function(x) list(response = exp(x$response))
$task_hash
[1] "58a137d2055e8406"
$data_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$task_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$mlr3_version
[1] '0.16.1'
$train_task
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: -
* Features (10):
- dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
gl$predict(task)
<PredictionRegr> for 32 observations:
row_ids truth response
1 21.0 21.67976
2 21.0 21.10831
3 22.8 25.73690
---
30 19.7 19.58533
31 15.0 14.11015
32 21.4 23.11105
and contrast this with $train() and $predict() of the naive linear regression learner (also look at the estimated coefficients of the linear regression contained in $state$model):
learner_lm$train(task)
learner_lm$state
$model
Call:
stats::lm(formula = task$formula(), data = task$data())
Coefficients:
(Intercept) am carb cyl disp drat gear hp qsec
12.30337 2.52023 -0.19942 -0.11144 0.01334 0.78711 0.65541 -0.02148 0.82104
vs wt
0.31776 -3.71530
$log
Empty data.table (0 rows and 3 cols): stage,class,msg
$train_time
[1] 0.004
$param_vals
named list()
$task_hash
[1] "58a137d2055e8406"
$data_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$task_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$mlr3_version
[1] '0.16.1'
$train_task
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: -
* Features (10):
- dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
learner_lm$predict(task)
<PredictionRegr> for 32 observations:
row_ids truth response
1 21.0 22.59951
2 21.0 22.11189
3 22.8 26.25064
---
30 19.7 19.69383
31 15.0 13.94112
32 21.4 24.36827
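As a quick sanity check (in-sample and therefore only illustrative; a proper cross-validated comparison follows at the end of this post), we can score both predictions:
gl$predict(task)$score(msr("regr.mse"))  # log-transformed pipeline
learner_lm$predict(task)$score(msr("regr.mse"))  # naive linear regression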
You should continue reading if you are interested in more advanced target transformations, i.e., where the transformation and inverse transformation require information about both the training and prediction data. First we will introduce the abstract base class for doing target transformations, PipeOpTargetTrafo, from which PipeOpTargetMutate inherits.
Abstract Base Class: PipeOpTargetTrafo
No matter how “complicated” the actual target transformation and inverse transformation may be, applying the inverse transformation function after having predicted from a Learner will always be done via PipeOpTargetInvert (as already outlined above, PipeOpTargetInvert has an empty ParamSet and its sole purpose is to apply the inverse transformation function after having predicted from a Learner). All Graphs for doing target transformations will therefore look similar to the simple one above, i.e., a target transformation PipeOp followed by some Learner or a whole Graph, followed by PipeOpTargetInvert. Therefore, using ppl("targettrafo") to construct such Graphs is highly recommended.
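Inspecting the PipeOp ids of a freshly constructed graph shows this three-step structure (a small illustration):
ppl("targettrafo", graph = lrn("regr.lm"))$ids()  # the trafo PipeOp, the learner, and "targetinvert"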
To allow for more advanced target transformations, we now have a closer look at the abstract base class, PipeOpTargetTrafo:
PipeOpTargetTrafo has one input channel, named "input", taking a Task both during training and prediction. Its two output channels are named "fun" and "output". During training "fun" returns NULL and during prediction "fun" returns a function that will be used by PipeOpTargetInvert to perform the inverse target transformation on PipeOpTargetInvert’s "prediction" input. "output" returns the modified input Task both during training and prediction.
Subclasses can overload up to four functions:
- .get_state() takes the input Task and returns a list() which will internally be used to set the $state. Typically it is sensible to make use of the $state during .transform() and .train_invert(). The base implementation returns list() and should be overloaded if setting the state is desired.
- .transform() takes the input Task and returns a modified Task (i.e., the Task with the transformed target). This is the main function for doing the actual target transformation. Note that .get_state() is evaluated a single time during training right before .transform() and therefore, you can rely on the $state that has been set. To update the input Task with respect to the transformed target, subclasses should make use of the convert_task() function and drop the original target from the Task. .transform() also accepts a phase argument that will receive "train" during training and "predict" during prediction. This can be used to enable different behavior during training and prediction. .transform() should always be overloaded by subclasses.
- .train_invert() takes the input Task and returns a predict_phase_state object. This can be anything. Note that .train_invert() should not modify the input Task. The base implementation returns a list with a single argument, the $truth column of the input Task, and should be overloaded if a more training-phase-dependent state is desired.
- .invert() takes a Prediction and a predict_phase_state object as inputs and returns a Prediction. This is the main function for specifying the actual inverse target transformation that will later be carried out by PipeOpTargetInvert. Internally, a private helper function, .invert_help(), will construct the function that will be returned by the "fun" output of PipeOpTargetTrafo, so that PipeOpTargetInvert can later simply dispatch this inverse target transformation on its "prediction" input.
The supposed workflow of a class inherited from PipeOpTargetTrafo is given in the following figure:
To solidify our understanding we will design a new target transformation PipeOp in the next section: PipeOpTargetTrafoBoxCox.
Hands on: PipeOpTargetTrafoBoxCox
library(R6)
The Box-Cox transformation of a target \(y_{i}\) is given as:
\[y_{i}(\lambda) = \begin{cases} \frac{y_{i}^{\lambda} - 1}{\lambda} & \text{if}~\lambda \neq 0; \\ \log(y_{i}) & \text{if}~\lambda = 0 \end{cases}\]
mlr3pipelines already supports the Box-Cox transformation for numerical, positive features, see ?PipeOpBoxCox.
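For contrast, that feature-level PipeOp is used like any other preprocessing step; here restricted to the strictly positive feature disp (an illustrative aside, since Box-Cox requires positive values):
po("boxcox", affect_columns = selector_name("disp"))$train(list(task))[[1L]]$head()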
Here we will design a PipeOp to apply the Box-Cox transformation as a target transformation. The \(\lambda\) parameter of the transformation is estimated during training and used for both the training and prediction transformation. After predicting from a Learner we will as always apply the inverse transformation function. To do the actual transformation we will use bestNormalize::boxcox().
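To get a feel for bestNormalize::boxcox() on its own, outside any PipeOp (an illustrative sketch):
bc = bestNormalize::boxcox(task$data(cols = "mpg")[[1L]])
bc$lambda  # the estimated transformation parameter
head(predict(bc))  # transformed target values
head(predict(bc, newdata = predict(bc), inverse = TRUE))  # round trip recovers mpg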
First, we inherit from PipeOpTargetTrafo and overload the initialize() function:
PipeOpTargetTrafoBoxCox = R6Class("PipeOpTargetTrafoBoxCox",
  inherit = PipeOpTargetTrafo,
  public = list(
    initialize = function(id = "targettrafoboxcox", param_vals = list()) {
      param_set = ps(
        standardize = p_lgl(default = TRUE, tags = c("train", "boxcox")),
        eps = p_dbl(default = 0.001, lower = 0, tags = c("train", "boxcox")),
        lower = p_dbl(default = -1L, tags = c("train", "boxcox")),
        upper = p_dbl(default = 2L, tags = c("train", "boxcox"))
      )
      super$initialize(id = id, param_set = param_set, param_vals = param_vals,
        packages = "bestNormalize", task_type_in = "TaskRegr",
        task_type_out = "TaskRegr")
    }
  ),
  private = list(
    .get_state = function(task) {
      ...
    },
    .transform = function(task, phase) {
      ...
    },
    .train_invert = function(task) {
      ...
    },
    .invert = function(prediction, predict_phase_state) {
      ...
    }
  )
)
As parameters, we allow "standardize" (whether to center and scale the transformed values to attempt a standard normal distribution), "eps" (tolerance parameter to identify if the \(\lambda\) parameter is equal to zero), "lower" (lower value for the estimation of the \(\lambda\) parameter) and "upper" (upper value for the estimation of the \(\lambda\) parameter). Note that we set task_type_in = "TaskRegr" and task_type_out = "TaskRegr" to specify that this PipeOp only works for regression Tasks.
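For instance, non-default parameter values can be supplied at construction (a usage sketch; training only works once the four functions below are filled in):
po_bc = PipeOpTargetTrafoBoxCox$new(param_vals = list(standardize = FALSE))
po_bc$param_set$values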
Second, we overload the four functions as mentioned above.
We start with .get_state(). We extract the target and apply the Box-Cox transformation to it. This yields an object of class "boxcox" which we wrap in a list() and set as the $state (setting bc$x.t = NULL and bc$x = NULL saves some memory, because we do not need the transformed original data and the original data later):
.get_state = function(task) {
  target = task$data(cols = task$target_names)[[1L]]
  bc = mlr3misc::invoke(bestNormalize::boxcox, target,
    .args = self$param_set$get_values(tags = "boxcox"))
  bc$x.t = NULL
  bc$x = NULL
  list(bc = bc)
},
Next, we tackle .transform(). This is quite straightforward, because objects of class "boxcox" have their own predict method, which we can use here to carry out the actual Box-Cox transformation based on the learned \(\lambda\) parameter as stored in the "boxcox" object in the $state (both during training and prediction). We then rename the target, add it to the task and finally update the task with respect to this new target:
.transform = function(task, phase) {
  target = task$data(cols = task$target_names)[[1L]]
  new_target = as.data.table(predict(self$state$bc, newdata = target))
  colnames(new_target) = paste0(task$target_names, ".bc")
  task$cbind(new_target)
  convert_task(task, target = colnames(new_target),
    drop_original_target = TRUE)
},
Time to overload .train_invert(). This is even more straightforward, because the predict method for objects of class "boxcox" directly allows for inverting the transformation via setting the argument inverse = TRUE. Therefore, we only need the "boxcox" object stored in the $state along with the $truth column of the input Task (remember that this list will later be available as the predict_phase_state object):
.train_invert = function(task) {
  list(truth = task$truth(), bc = self$state$bc)
},
Finally, we overload .invert(). We extract the truth from the predict_phase_state and the response from the Prediction. We then apply the inverse Box-Cox transformation to the response based on the \(\lambda\) parameter and the mean and standard deviation learned during training, relying on the predict_phase_state object. Finally, we construct a new Prediction object:
.invert = function(prediction, predict_phase_state) {
  truth = predict_phase_state$truth
  response = predict(predict_phase_state$bc, newdata = prediction$response,
    inverse = TRUE)
  PredictionRegr$new(row_ids = prediction$row_ids, truth = truth,
    response = response)
}
Note that this PipeOp is ill-equipped to handle the case of predict_type = "se", i.e., we always only return a response prediction (as outlined above, this is the case for most target transformations, because transforming standard errors or probabilities of a prediction is often not straightforward). We could of course check whether the predict_type is set to "se" and, if this is the case, return NA as the standard errors.
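A sketch of such a check (our own variation, not part of the implementation above) could look like this:
.invert = function(prediction, predict_phase_state) {
  response = predict(predict_phase_state$bc, newdata = prediction$response,
    inverse = TRUE)
  # hypothetical: return NA standard errors whenever "se" predictions are present
  se = if ("se" %in% prediction$predict_types) rep(NA_real_, length(response)) else NULL
  PredictionRegr$new(row_ids = prediction$row_ids,
    truth = predict_phase_state$truth, response = response, se = se)
}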
To construct our final target transformation Graph with our linear regression learner, we again simply make use of ppl("targettrafo"):
g_bc = ppl("targettrafo", graph = learner_lm,
  trafo_pipeop = PipeOpTargetTrafoBoxCox$new())
The following plot should already look quite familiar:
g_bc$plot(html = FALSE)
Finally we $train() and $predict() on the task (again, we wrap the Graph in a GraphLearner):
gl_bc = GraphLearner$new(g_bc)
gl_bc$train(task)
gl_bc$state
$model
$model$regr.lm
$model$regr.lm$model
Call:
stats::lm(formula = task$formula(), data = task$data())
Coefficients:
(Intercept) am carb cyl disp drat gear hp qsec
-0.6272999 0.1670950 -0.0663126 0.0237529 0.0002376 0.0759944 0.1963335 -0.0030367 0.1043210
vs wt
-0.0080166 -0.5800635
$model$regr.lm$log
Empty data.table (0 rows and 3 cols): stage,class,msg
$model$regr.lm$train_time
[1] 0.007
$model$regr.lm$param_vals
named list()
$model$regr.lm$task_hash
[1] "612ab4e0ad596159"
$model$regr.lm$data_prototype
Empty data.table (0 rows and 11 cols): mpg.bc,am,carb,cyl,disp,drat...
$model$regr.lm$task_prototype
Empty data.table (0 rows and 11 cols): mpg.bc,am,carb,cyl,disp,drat...
$model$regr.lm$mlr3_version
[1] '0.16.1'
$model$regr.lm$train_task
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg.bc
* Properties: -
* Features (10):
- dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
$model$targettrafoboxcox
$model$targettrafoboxcox$bc
Standardized Box Cox Transformation with 32 nonmissing obs.:
Estimated statistics:
- lambda = 0.02955701
- mean (before standardization) = 3.092016
- sd (before standardization) = 0.324959
$model$targetinvert
list()
$log
Empty data.table (0 rows and 3 cols): stage,class,msg
$train_time
[1] 0.075
$param_vals
named list()
$task_hash
[1] "58a137d2055e8406"
$data_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$task_prototype
Empty data.table (0 rows and 11 cols): mpg,am,carb,cyl,disp,drat...
$mlr3_version
[1] '0.16.1'
$train_task
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: -
* Features (10):
- dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
gl_bc$predict(task)
<PredictionRegr> for 32 observations:
row_ids truth response
1 21.0 21.70854
2 21.0 21.13946
3 22.8 25.75242
---
30 19.7 19.58934
31 15.0 14.10658
32 21.4 23.15263
We could now proceed to benchmark our different target transformations:
bg = benchmark_grid(list(task), learners = list(learner_lm, gl, gl_bc),
  resamplings = list(rsmp("cv", folds = 10)))
bmr = benchmark(bg)
bmr$aggregate(msr("regr.mse"))
nr task_id learner_id resampling_id iters regr.mse
1: 1 mtcars regr.lm cv 10 11.866071
2: 2 mtcars targetmutate.regr.lm.targetinvert cv 10 7.793303
3: 3 mtcars targettrafoboxcox.regr.lm.targetinvert cv 10 8.230192
Hidden columns: resample_result