library(mlr3verse)
task = tsk("german_credit")
task$positive = "good"
Goal
Our goal for this exercise sheet is to understand how to apply and work with XGBoost. The XGBoost algorithm has a large range of hyperparameters. We learn specifically how to tune these hyperparameters to optimize our XGBoost model for the task at hand.
German Credit Dataset
As in previous exercises, we use the German credit dataset provided by Prof. Dr. Hans Hofmann of the University of Hamburg in 1994. Using XGBoost, we want to classify people as a good or bad credit risk based on 20 personal, demographic, and financial features. The dataset is available at the UCI repository as the Statlog (German Credit Data) Data Set.
Preprocessing
To apply the XGBoost algorithm to the credit dataset, categorical features need to be converted into numeric features, e.g., using one-hot encoding. We use a factor encoding PipeOp from mlr3pipelines to do so.
First, we set up a classification task (see the code chunk at the top of this sheet). Next, we can initialize a factor encoding and apply it to the task at hand:
poe = po("encode")
task = poe$train(list(task))[[1]]
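As a quick optional check (not part of the original exercise), we can inspect the feature types of the transformed task; after one-hot encoding, no factor features should remain:

# Sanity check: all features of the encoded task should be numeric or integer
task$feature_types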
1 XGBoost Learner
1.1 Initialize an XGBoost Learner
Initialize an XGBoost mlr3 learner with 100 iterations. Make sure that you have installed the xgboost R package.
Details on iterations:
The number of iterations must always be chosen by the user, since the hyperparameter has no proper default value in mlr3. "No proper default value" means that mlr3 has an adjusted default of 1 iteration to avoid errors when constructing the learner. A single iteration is, in general, not a good default, since it conducts only a single boosting step.
There is a trade-off between underfitting (not enough iterations) and overfitting (too many iterations). Therefore, it is always better to tune such a hyperparameter. In this exercise, we chose 100 iterations because we believe it is an upper bound for the number of iterations. We will later conduct early stopping to avoid overfitting.
Hint 1:
The number of iterations can be specified via the nrounds hyperparameter of the classif.xgboost learner; set this hyperparameter to 100.
Hint 2:
xgboost_lrn = lrn(..., nrounds = ...)
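For reference, one possible way to fill in this hint (assuming the xgboost package is installed so that the classif.xgboost learner is available):

# One possible solution sketch
xgboost_lrn = lrn("classif.xgboost", nrounds = 100L)
xgboost_lrn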
1.2 Performance Assessment using Cross-validation
Use 5-fold cross-validation to estimate the generalization error of the XGBoost learner with 100 boosting iterations on the one-hot-encoded credit task. Measure the performance of the learner using the classification error. Set a seed to make your results reproducible (e.g., set.seed(8002L)).
Hint 1:
Specifically, you need to conduct three steps:
- Specify a Resampling object using rsmp().
- Use this object together with the task and learner specified above as an input to the resample() function.
- Measure the performance with the $aggregate() method of the resulting ResampleResult object.
Hint 2:
set.seed(8002L)
resampling = rsmp("cv", ...)
rr = resample(task = ..., learner = ..., resampling = ...)
rr$aggregate()
2 Hyperparameters
2.1 Overview of Hyperparameters
Apart from the number of iterations (nrounds), the XGBoost learner has several other hyperparameters which were kept at their default values in the previous exercise. Extract an overview of all hyperparameters from the initialized XGBoost learner (previous exercise) as well as their default values.
Given the extracted hyperparameter list above and the help page of xgboost (?xgboost), answer the following questions:
- Does the learner rely on a tree or a linear booster by default?
- Do more hyperparameters exist for the tree or the linear booster?
- What do max_depth and eta mean and what are their default values?
- Does a larger value for eta imply a larger value for nrounds?
Hint 1:
The hyperparameters and their default values can be extracted via the $param_set field of the XGBoost learner. Alternatively, you can consult the help page of LearnerClassifXgboost.
Hint 2:
You can answer all questions concerning defaults with the output of $param_set. A description of the hyperparameters can be found on the xgboost help page (?xgboost). The help page also answers the last question concerning the connection between eta and nrounds.
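As a sketch, a few ways to get this overview from the learner created earlier; the $default field of a paradox ParamSet contains only the parameters that have explicit defaults:

xgboost_lrn$param_set                  # ids, classes, ranges, and defaults of all hyperparameters
xgboost_lrn$param_set$default          # named list of the documented default values
as.data.table(xgboost_lrn$param_set)   # the same overview as a data.table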
2.2 Tune Hyperparameters
Tune the tree depth of the xgboost learner on the German credit data using random search
- with a search space for max_depth between 1 and 8 and for eta between 0.2 and 0.4,
- with 20 evaluations as the termination criterion,
- with the classification error msr("classif.ce") as the performance measure,
- with 3-fold CV as the resampling strategy.
Set a seed for reproducibility (e.g., set.seed(8002L)).
Hint 1:
Specifically, you should conduct the following steps:
- Set up a search space with ps(), consisting of a p_int() for max_depth and a p_dbl() for eta.
- Set up the classification error as the tuning measure with msr().
- Initialize cross-validation as the resampling strategy using rsmp().
- Set up 20 evaluations as the termination criterion using trm().
- Initialize a TuningInstanceSingleCrit object using ti(), with the objects produced in steps 1-4 as well as the task and learner as input.
- Define random search as the tuner object using tnr().
- Call the $optimize() method of the tuner object with the initialized TuningInstanceSingleCrit as input.
Hint 2:
set.seed(8002L)
search_space = ps(
  max_depth = ...(1L, 8L),
  eta = ...(0.2, 0.4)
)
measure = msr("classif....")
resampling = rsmp("cv", folds = ...)
terminator = trm("evals", n_evals = ...)

instance_random = ti(
  task = ...,
  learner = ...,
  measure = ...,
  resampling = ...,
  terminator = ...,
  search_space = ...
)

tuner_random = tnr(...)
tuner_random$optimize(instance_random)
2.3 Inspect the Best Performing Setup
Which tree depth was the best performing one?
Hint 1:
Inspect the tuned instance (of class TuningInstanceSingleCrit; it was the input to $optimize()). Look, for example, at the $result field.
Hint 2:
instance_random$result
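Beyond $result, the archive of the tuning instance stores every evaluated configuration, which is useful for checking how sensitive the performance is to the tree depth (a suggestion beyond the original hint):

instance_random$result                  # best configuration and its estimated classification error
as.data.table(instance_random$archive)  # all 20 evaluated configurations with their performance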
3 Early Stopping
3.1 Set up an XGBoost Learner with Early Stopping
Now that we have derived the best hyperparameter values for the maximum depth and eta, we could train our final model. To avoid overfitting, we conduct early stopping, meaning that the algorithm stops as soon as the performance does not improve for a given number of rounds. The performance used for stopping should be assessed on a validation data set.
Set up an XGBoost learner with the following hyperparameters:
- max_depth and eta set to the best configuration according to the previous tuning task.
- nrounds set to 100L.
- The number of early stopping rounds set to 5 (this could be tuned as well, but we simplify things here) in order to stop early if there was no improvement in the previous 5 iterations.
library("xgboost")
set.seed(2001L)
Hint 1:
Specify the hyperparameters within lrn(). The number of early stopping rounds can be specified with early_stopping_rounds.
Hint 2:
lrn("...", nrounds = ...,
  max_depth = instance_random$result$...,
  eta = instance_random$result$...,
  early_stopping_rounds = ...
)
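A possible filled-in version; the name xgboost_lrn2 is just an illustrative choice, and whether early_stopping_rounds can be set like this depends on your version of mlr3learners (newer versions additionally require an explicit validation set, see also the note in the nested resampling exercise):

xgboost_lrn2 = lrn("classif.xgboost",
  nrounds = 100L,
  max_depth = instance_random$result$max_depth,
  eta = instance_random$result$eta,
  early_stopping_rounds = 5L  # stop if no improvement in 5 consecutive iterations
)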
3.2 Training on Credit Data
Train the XGBoost learner from the previous exercise on the credit data set. How many iterations were conducted before the boosting algorithm stopped?
Hint 1:
By calling $train(), a model is trained which can be accessed via $model. This model has a field $niter with the number of conducted iterations.
Hint 2:
xgboost$train(...)
xgboost$...$niter
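Filled in, using the illustrative learner name xgboost_lrn2 from the sketch above instead of xgboost:

set.seed(2001L)
xgboost_lrn2$train(task)
xgboost_lrn2$model$niter  # number of boosting iterations actually performed before stopping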
4 Extra: Nested Resampling
To receive an unbiased performance estimate when tuning hyperparameters, conduct nested resampling with
- 3-fold cross-validation for the outer and inner resampling loop.
- a search space for max_depth between 1 and 8 and eta between 0.2 and 0.4,
- random search with 20 evaluations,
- the classification error msr("classif.ce") as performance measure.
Extract the performance estimate on the outer resampling folds.
Hint 1:
Specifically, you need to conduct the following steps:
- Set up an XGBoost learner.
- Initialize a search space for max_depth and eta using ps().
- Initialize an AutoTuner with the XGBoost learner from the first step as an input. The AutoTuner reflects the inner resampling loop. It should be set up with 3-fold CV, random search with 20 evaluations, and the classification error as the performance measure.
- Specify a Resampling object using rsmp().
- Use this object with the credit task and the AutoTuner as an input to resample().
- Extract the results via $aggregate().
Important: Early stopping requires a validation set. But AutoTuner uses internal resampling instead of splitting the data manually, and does not provide a "validate" set to the learner by default. That is why we should not use early stopping here.
Hint 2:
xgboost_lrn3 = lrn(...)

tune_ps = ps(
  max_depth = p_int(..., ...),
  eta = p_dbl(..., ...)
)

at = auto_tuner(learner = xgboost_lrn3,
  resampling = rsmp("cv", folds = ...),
  search_space = ...,
  measure = msr("..."),
  terminator = trm("evals", n_evals = ...),
  tuner = tnr("..."))

resampling = rsmp("...", folds = ...)

set.seed(8002L)
nestrr = resample(task = ..., learner = ..., resampling = resampling)

nestrr$aggregate()
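A possible filled-in version of this hint; all arguments of auto_tuner() are named so the sketch does not depend on the positional order of a particular mlr3tuning version, and the names xgboost_lrn3, tune_ps, at, and nestrr are just the placeholders used above:

xgboost_lrn3 = lrn("classif.xgboost", nrounds = 100L)

tune_ps = ps(
  max_depth = p_int(1L, 8L),
  eta = p_dbl(0.2, 0.4)
)

# AutoTuner = learner + inner tuning loop (3-fold CV, random search, 20 evaluations)
at = auto_tuner(
  learner = xgboost_lrn3,
  resampling = rsmp("cv", folds = 3L),
  search_space = tune_ps,
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 20L),
  tuner = tnr("random_search")
)

# Outer resampling loop
resampling = rsmp("cv", folds = 3L)

set.seed(8002L)
nestrr = resample(task = task, learner = at, resampling = resampling)

nestrr$aggregate()  # unbiased performance estimate from the outer folds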
Summary
In this exercise sheet, we learned how to apply an XGBoost learner to the credit data set. By using resampling, we estimated its performance. XGBoost has a lot of hyperparameters, and we only had a closer look at two of them. We also saw how early stopping can be used, which should help to avoid overfitting of the XGBoost model.
Interestingly, we obtained the best results when we used 100 iterations without tuning or early stopping. However, the performance differences were quite small; if we set a different seed, we might see a different ranking. Furthermore, we could extend our tuning search space so that more hyperparameters are considered, to increase the overall performance of the learner for the task at hand. Of course, this also requires more budget for the tuning (e.g., more evaluations of random search).