```
set.seed(7832)
library(mlr3verse)
```

# Scope

In this post, we use early stopping to reduce overfitting when training an XGBoost model. We start with a short recap of early stopping and overfitting. After that, we use the early stopping mechanism of XGBoost and train a model on the Spam Classification data set. Finally, we show how to simultaneously tune hyperparameters and use early stopping. The reader should be familiar with tuning in the mlr3 ecosystem.

# Early Stopping

Early stopping is a technique used to reduce overfitting when fitting a model in an iterative process. Overfitting occurs when a model fits the training data increasingly well while its performance on unseen data gets worse: the training error keeps decreasing, but the test error increases. With early stopping, the performance is monitored on a separate test set, and training stops once the performance has not improved for a specified number of iterations.
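The stopping rule can be sketched in a few lines of base R. This is a minimal illustration, assuming the validation errors of all iterations are already available as a numeric vector; the helper `early_stop()` is hypothetical and not part of mlr3 or XGBoost, which check the criterion during training instead.

```r
# Hypothetical helper: given per-iteration validation errors and a patience
# (number of rounds without improvement), return the iteration at which
# training would stop and the best iteration seen up to that point.
early_stop = function(errors, patience) {
  best_iter = 1
  for (i in seq_along(errors)) {
    # a strict improvement moves the best iteration and resets the patience
    if (errors[i] < errors[best_iter]) {
      best_iter = i
    }
    # stop once `patience` rounds have passed without improvement
    if (i - best_iter >= patience) {
      return(list(stop_iter = i, best_iter = best_iter))
    }
  }
  # patience never ran out: training ends at the last iteration
  list(stop_iter = length(errors), best_iter = best_iter)
}

# validation error falls until iteration 4 and rises afterwards
errors = c(0.30, 0.25, 0.22, 0.20, 0.21, 0.23, 0.24, 0.26)
early_stop(errors, patience = 3)  # stops at iteration 7, best iteration 4
```

The model kept after stopping is typically the one from the best iteration, not the stopping iteration, which is why the two are returned separately.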

# XGBoost with Early Stopping

We initialize the random number generator with a fixed seed for reproducibility. The mlr3verse package provides all functions required for this example.

When training an XGBoost model, we can use early stopping to find the optimal number of boosting rounds. The `partition()` function splits the observations of the task into two disjoint sets. We use 80% of the observations to train the model and the remaining 20% as the test set to monitor the performance. To hold the test observations out of training, we assign them the `"test"` row role.

```
task = tsk("spam")
split = partition(task, ratio = 0.8)
task$set_row_roles(split$test, "test")
```
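The idea of a disjoint 80/20 split can be illustrated on plain row ids in base R. This is a simplified, unstratified sketch on a hypothetical set of 100 rows, not how `partition()` is implemented (it may, for example, stratify by the target). Because it draws random numbers, run it in a fresh session; otherwise it changes the random number stream and the results shown below no longer reproduce.

```r
# Simplified, unstratified holdout split on hypothetical row ids
n = 100
row_ids = seq_len(n)

# draw 80% of the rows for training without replacement
train_ids = sample(row_ids, size = round(0.8 * n))

# all remaining rows form the test set, so the two sets are disjoint
test_ids = setdiff(row_ids, train_ids)

length(train_ids)  # 80
length(test_ids)   # 20
```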

The `early_stopping_set` parameter controls which set is used to monitor the performance. Additionally, we define the number of rounds within which the performance must improve with `early_stopping_rounds` and the maximum number of boosting rounds with `nrounds`. In this example, training stops when the classification error on the test set has not decreased for 100 rounds or when 1000 rounds are reached.

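```
learner = lrn("classif.xgboost",
  nrounds = 1000,               # maximum number of boosting rounds
  early_stopping_rounds = 100,  # stop after 100 rounds without improvement
  early_stopping_set = "test",  # monitor the performance on the test rows
  eval_metric = "error"         # classification error
)
```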

We train the learner with early stopping.

`learner$train(task)`

The `$evaluation_log` of the model stores the performance scores on the training and test sets. Figure 1 shows that the classification error on the training set decreases, whereas the error on the test set increases after 20 rounds. The grey vertical line marks the best iteration.


```
library(ggplot2)
library(data.table)

# reshape the evaluation log from wide (one column per set) to long format
data = melt(
  learner$model$evaluation_log,
  id.vars = "iter",
  variable.name = "set",
  value.name = "error"
)

# training and test error per round; the grey line marks the best iteration
ggplot(data, aes(x = iter, y = error, group = set)) +
  geom_line(aes(color = set)) +
  geom_vline(xintercept = learner$model$best_iteration, color = "grey") +
  scale_colour_manual(values = c("#f8766d", "#00b0f6"), labels = c("Train", "Test")) +
  labs(x = "Rounds", y = "Classification Error", color = "Set") +
  theme_minimal()
```

The slot `$best_iteration` contains the optimal number of boosting rounds.

`learner$model$best_iteration`

`[1] 20`

Note that `learner$predict()` uses the model from the last iteration, not the best one. The next section shows how to fit a model with the optimal number of boosting rounds and hyperparameter configuration.

# Tuning

In this section, we tune the hyperparameters of an XGBoost model and find the optimal number of boosting rounds in one go. For this, we need the early stopping callback, which handles early stopping during the tuning process. While tuning, the performance of a hyperparameter configuration is evaluated with a resampling strategy, for example, 3-fold cross-validation. In each resampling iteration, a new XGBoost model is trained, and early stopping finds the optimal number of boosting rounds for that model. With 3-fold cross-validation, this yields three different optimal numbers of boosting rounds for one hyperparameter configuration. The callback picks the maximum of the three values and writes it to the archive. It uses the maximum because the final model is fitted on the complete data set, which is larger than each training fold. Now let's start with a practical example.

First, we load the XGBoost learner and set the early stopping parameters.

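```
learner = lrn("classif.xgboost",
  nrounds = 1000,               # upper bound for the number of boosting rounds
  early_stopping_rounds = 100,  # stop after 100 rounds without improvement
  early_stopping_set = "test"   # monitor the performance on the test rows
)
```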

Next, we load a predefined tuning space from the mlr3tuningspaces package. The tuning space includes the most commonly tuned parameters of XGBoost.

```
tuning_space = lts("classif.xgboost.default")
as.data.table(tuning_space)
```

```
id lower upper logscale
1: eta 1e-04 1 TRUE
2: nrounds 1e+00 5000 FALSE
3: max_depth 1e+00 20 FALSE
4: colsample_bytree 1e-01 1 FALSE
5: colsample_bylevel 1e-01 1 FALSE
6: lambda 1e-03 1000 TRUE
7: alpha 1e-03 1000 TRUE
8: subsample 1e-01 1 FALSE
```

We augment the learner with the tuning space.

`learner = lts(learner)`

The default tuning space contains the `nrounds` hyperparameter. We overwrite it with a fixed upper bound, since early stopping determines the actual number of boosting rounds.

`learner$param_set$set_values(nrounds = 1000)`

We evaluate four random hyperparameter configurations in batches of two.

```
instance = tune(
  method = "random_search",
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 4,
  batch_size = 2,
  callbacks = clbk("mlr3tuning.early_stopping")
)
```

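We can see that the optimal number of boosting rounds (`max_nrounds`) strongly depends on the other hyperparameters: the two configurations with the smallest learning rate `eta` reach the upper bound of 1000 rounds. Note that the archive shows `eta`, `lambda`, and `alpha` on the log scale, as specified by the tuning space.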

`as.data.table(instance$archive)[, list(batch_nr, max_nrounds, eta, max_depth, colsample_bytree, colsample_bylevel, lambda, alpha, subsample)]`

```
batch_nr max_nrounds eta max_depth colsample_bytree colsample_bylevel lambda alpha subsample
1: 1 273 -1.04605873 1 0.8918211 0.1841578 -6.2828642 -2.748495 0.6890264
2: 1 93 -0.01516098 19 0.5108089 0.2405859 -0.8666842 4.442711 0.5464676
3: 2 1000 -8.46723302 13 0.8662932 0.5460656 -5.7251541 -3.850319 0.2734089
4: 2 1000 -7.04702376 8 0.6054186 0.5921445 -4.8507050 -2.466443 0.5887968
```
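As a quick check of the log scale, the tuning space marks `eta`, `lambda`, and `alpha` with `logscale = TRUE`, so exponentiating their archive values from the first row should give the values of the best configuration shown below. The snippet only uses numbers copied from the archive.

```r
# log-scale values of the first configuration, copied from the archive above
log_params = c(eta = -1.04605873, lambda = -6.2828642, alpha = -2.748495)

# back-transform to the scale the learner receives
exp(log_params)  # eta ≈ 0.3513, lambda ≈ 0.00187, alpha ≈ 0.0640
```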

In the best hyperparameter configuration, the value of `nrounds` is replaced by `max_nrounds`, and early stopping is deactivated.

`instance$result_learner_param_vals`

```
$nrounds
[1] 273
$nthread
[1] 1
$verbose
[1] 0
$early_stopping_set
[1] "none"
$eta
[1] 0.3513197
$max_depth
[1] 1
$colsample_bytree
[1] 0.8918211
$colsample_bylevel
[1] 0.1841578
$lambda
[1] 0.001868043
$alpha
[1] 0.06402412
$subsample
[1] 0.6890264
```
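The change from the tuning-time parameter values to the final ones above can be mimicked with base R's `modifyList()`. This is only a sketch of the visible effect on a subset of the parameters, using `max_nrounds = 273` from the archive; it is not how the early stopping callback is implemented.

```r
# subset of the parameter values used while tuning the best configuration
tuning_vals = list(
  nrounds = 1000,
  early_stopping_rounds = 100,
  early_stopping_set = "test",
  max_depth = 1
)
max_nrounds = 273  # value from the archive

# use max_nrounds, drop early_stopping_rounds (a NULL entry removes it)
# and switch off early stopping
final_vals = modifyList(tuning_vals, list(
  nrounds = max_nrounds,
  early_stopping_rounds = NULL,
  early_stopping_set = "none"
))
str(final_vals)
```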

Finally, we fit the final model on the complete data set.

```
learner = lrn("classif.xgboost")
learner$param_set$values = instance$result_learner_param_vals
learner$train(task)
```

The trained model can now be used to make predictions on new data.

We can also use the `AutoTuner` to get a tuned XGBoost model. Note that early stopping is deactivated when the final model is fitted.