Feature Selection on the Titanic Data Set

mlr3fselect optimization feature selection nested resampling titanic data set classification

Short introduction to feature selection with mlr3fselect.

Marc Becker
01-08-2021

Introduction

In this tutorial, we introduce the mlr3fselect package by comparing feature selection methods on the Titanic disaster data set. The objective of feature selection is to enhance the interpretability of models, speed up the learning process and increase the predictive performance.

We load the mlr3verse package, which pulls in the most important packages for this example.

library(mlr3verse)

We initialize the random number generator with a fixed seed for reproducibility and decrease the verbosity of the mlr3 and bbotk loggers to keep the output concise.

set.seed(7832)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

Titanic Data Set

The Titanic data set contains data on Titanic passengers, including whether they survived when the Titanic sank. After preprocessing, 891 passengers with known survival status remain. Our goal is to predict the survival of the Titanic passengers.

After loading the data set from the mlr3data package, we impute the missing age values with the median age of the passengers, set missing embarked values to "S" (Southampton), remove the character features and drop the passengers with unknown survival status. We could engineer new features from the character features, but we want to focus on feature selection in this tutorial.

In addition to the survived column, the reduced data set contains the following attributes for each passenger:

Feature    Description
age        Age
sex        Sex
sib_sp     Number of siblings / spouses aboard
parch      Number of parents / children aboard
fare       Amount paid for the ticket
pclass     Passenger class
embarked   Port of embarkation

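library(mlr3data)

data("titanic", package = "mlr3data")
# impute missing ages with the median age
titanic$age[is.na(titanic$age)] = median(titanic$age, na.rm = TRUE)
# assign missing ports of embarkation to Southampton ("S")
titanic$embarked[is.na(titanic$embarked)] = "S"
# drop the character features
titanic$ticket = NULL
titanic$name = NULL
titanic$cabin = NULL
# keep only passengers with known survival status
titanic = titanic[!is.na(titanic$survived),]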

We construct a binary classification task.

task = as_task_classif(titanic, target = "survived", positive = "yes")

Model

We use the logistic regression learner provided by the mlr3learners package.

library(mlr3learners)

learner = lrn("classif.log_reg")

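To evaluate the predictive performance, we use 3-fold cross-validation and the classification error as the performance measure. We instantiate the resampling so that all feature sets are evaluated on the same train/test splits.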

resampling = rsmp("cv", folds = 3)
measure = msr("classif.ce")

resampling$instantiate(task)
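Instantiating the resampling fixes the fold assignment once, so every feature set is scored on exactly the same splits and differences in the error come from the features, not from the splits. The base-R sketch below mimics this idea on simulated data; the data-generating model, the columns x1 to x4 and the helper cv_error() are hypothetical, and glm() stands in for classif.log_reg. It illustrates the concept and is not how mlr3 implements resampling.

```r
# Simulated binary classification data (hypothetical): only x1 and x2 carry signal
set.seed(7832)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- factor(ifelse(runif(n) < plogis(1.5 * dat$x1 - dat$x2), "yes", "no"),
  levels = c("no", "yes"))

# "Instantiate" a 3-fold CV once: every row gets a fixed fold id
folds <- sample(rep(1:3, length.out = n))

# Hypothetical helper: classification error of a logistic regression restricted
# to `features`, averaged over the fixed folds
cv_error <- function(features, data, folds) {
  f <- reformulate(features, response = "y")
  errs <- sapply(sort(unique(folds)), function(k) {
    fit <- glm(f, data = data[folds != k, ], family = binomial())
    p_yes <- predict(fit, newdata = data[folds == k, ], type = "response")
    pred <- ifelse(p_yes > 0.5, "yes", "no")
    mean(pred != as.character(data$y[folds == k]))
  })
  mean(errs)
}

# Both feature sets are compared on identical splits
cv_error(c("x1", "x2"), dat, folds)
cv_error("x3", dat, folds)
```

Because the folds are drawn only once, calling cv_error() twice with the same feature set returns the same value, which is what makes the comparison between feature sets fair.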

Classes

The FSelectInstanceSingleCrit class specifies a general feature selection scenario. It contains an ObjectiveFSelect object that encodes the black-box objective function optimized by the feature selection algorithm. The evaluated feature sets are stored in an ArchiveFSelect object, which provides a method for querying the best-performing feature set.

The Terminator classes determine when to stop the feature selection. In this example we choose a terminator that stops the feature selection after 10 seconds. The sugar functions trm() and trms() can be used to retrieve terminators from the mlr_terminators dictionary.

terminator = trm("run_time", secs = 10)
FSelectInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measure = measure,
  terminator = terminator)
<FSelectInstanceSingleCrit>
* State:  Not optimized
* Objective: <ObjectiveFSelect:classif.log_reg_on_titanic>
* Search Space:
<ParamSet>
         id    class lower upper nlevels        default value
1:      age ParamLgl    NA    NA       2 <NoDefault[3]>      
2: embarked ParamLgl    NA    NA       2 <NoDefault[3]>      
3:     fare ParamLgl    NA    NA       2 <NoDefault[3]>      
4:    parch ParamLgl    NA    NA       2 <NoDefault[3]>      
5:   pclass ParamLgl    NA    NA       2 <NoDefault[3]>      
6:      sex ParamLgl    NA    NA       2 <NoDefault[3]>      
7:   sib_sp ParamLgl    NA    NA       2 <NoDefault[3]>      
* Terminator: <TerminatorRunTime>
* Terminated: FALSE
* Archive:
<ArchiveFSelect>
Null data.table (0 rows and 0 cols)
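Conceptually, a terminator is a predicate that the optimizer checks before it proposes the next batch of feature sets. The base-R sketch below models this idea; make_term_evals(), make_term_run_time() and is_terminated() are hypothetical stand-ins for trm("evals") and trm("run_time"), not the bbotk API, and the archive is represented as a plain data.frame with one row per evaluated feature set. Stopping as soon as any terminator fires is similar in spirit to bbotk's combo terminator.

```r
# Hypothetical terminator factories: each returns a function that takes the
# archive (one row per evaluated feature set) and returns TRUE when to stop
make_term_evals <- function(n_evals) {
  force(n_evals)
  function(archive) nrow(archive) >= n_evals
}

make_term_run_time <- function(secs) {
  start <- Sys.time()  # the clock starts when the terminator is created
  function(archive) {
    as.numeric(difftime(Sys.time(), start, units = "secs")) >= secs
  }
}

# Stop as soon as any of the given terminators says so
is_terminated <- function(terminators, archive) {
  any(vapply(terminators, function(term) term(archive), logical(1)))
}

archive <- data.frame(ce = c(0.38, 0.33, 0.22))
is_terminated(list(make_term_evals(10), make_term_run_time(10)), archive)  # FALSE
```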

The FSelector subclasses describe the feature selection strategy. The sugar function fs() can be used to retrieve feature selection algorithms from the mlr_fselectors dictionary.

mlr_fselectors
<DictionaryFSelect> with 7 stored values
Keys: design_points, exhaustive_search, genetic_search, random_search, rfe, sequential,
  shadow_variable_search

Random search randomly draws feature sets and evaluates them in batches. We retrieve the FSelectorRandomSearch class with the fs() sugar function and choose TerminatorEvals with n_evals = 10, which stops the feature selection after 10 feature sets have been evaluated. With batch_size = 5, these 10 evaluations are split into two batches of five feature sets.

terminator = trm("evals", n_evals = 10)
instance = FSelectInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measure = measure,
  terminator = terminator)
fselector = fs("random_search", batch_size = 5)

The feature selection is started by passing the FSelectInstanceSingleCrit object to the $optimize() method of FSelectorRandomSearch, which generates the feature sets. These feature sets are internally passed to the $eval_batch() method of FSelectInstanceSingleCrit, which evaluates them with the objective function and stores the results in the archive. This general interaction between the objects of mlr3fselect stays the same for all feature selection methods; only the way new feature sets are generated depends on the chosen FSelector subclass.

fselector$optimize(instance)
    age embarked fare parch pclass  sex sib_sp                         features classif.ce
1: TRUE    FALSE TRUE  TRUE   TRUE TRUE   TRUE age,fare,parch,pclass,sex,sib_sp  0.2020202

The ArchiveFSelect stores a data.table::data.table() containing the evaluated feature sets and their estimated predictive performances.

as.data.table(instance$archive)
age   embarked fare  parch pclass sex   sib_sp classif.ce timestamp           batch_nr
TRUE  TRUE     TRUE  TRUE  TRUE   TRUE  TRUE   0.2031425  2022-02-28 08:57:17 1
TRUE  FALSE    FALSE FALSE FALSE  FALSE TRUE   0.3838384  2022-02-28 08:57:17 1
FALSE FALSE    FALSE TRUE  FALSE  FALSE TRUE   0.3804714  2022-02-28 08:57:17 1
FALSE FALSE    TRUE  FALSE FALSE  FALSE FALSE  0.3288440  2022-02-28 08:57:17 1
FALSE FALSE    TRUE  FALSE FALSE  TRUE  FALSE  0.2188552  2022-02-28 08:57:17 1
FALSE FALSE    FALSE FALSE TRUE   FALSE FALSE  0.3209877  2022-02-28 08:57:19 2
TRUE  FALSE    FALSE FALSE FALSE  FALSE TRUE   0.3838384  2022-02-28 08:57:19 2
TRUE  FALSE    TRUE  TRUE  TRUE   TRUE  TRUE   0.2020202  2022-02-28 08:57:19 2
TRUE  TRUE     TRUE  TRUE  TRUE   TRUE  TRUE   0.2031425  2022-02-28 08:57:19 2
TRUE  FALSE    TRUE  TRUE  FALSE  FALSE FALSE  0.3389450  2022-02-28 08:57:19 2

The associated resampling iterations are stored in a BenchmarkResult, which we access with

instance$archive$benchmark_result
<BenchmarkResult> of 30 rows with 10 resampling runs
 nr task_id             learner_id resampling_id iters warnings errors
  1 titanic select.classif.log_reg            cv     3        0      0
  2 titanic select.classif.log_reg            cv     3        0      0
  3 titanic select.classif.log_reg            cv     3        0      0
  4 titanic select.classif.log_reg            cv     3        0      0
  5 titanic select.classif.log_reg            cv     3        0      0
  6 titanic select.classif.log_reg            cv     3        0      0
  7 titanic select.classif.log_reg            cv     3        0      0
  8 titanic select.classif.log_reg            cv     3        0      0
  9 titanic select.classif.log_reg            cv     3        0      0
 10 titanic select.classif.log_reg            cv     3        0      0

We retrieve the best performing feature set with

instance$result
    age embarked fare parch pclass  sex sib_sp                         features classif.ce
1: TRUE    FALSE TRUE  TRUE   TRUE TRUE   TRUE age,fare,parch,pclass,sex,sib_sp  0.2020202
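The optimize / eval_batch / archive loop described above can be sketched in plain R. This is a minimal sketch under stated assumptions, not the mlr3fselect internals: the data are simulated with hypothetical features x1 to x4, glm() replaces classif.log_reg, each feature is included by a fair coin flip (FSelectorRandomSearch may sample subsets differently), and draw_mask(), eval_batch() and cv_error() are hypothetical helpers. As in the archive above, the same feature set can be drawn more than once.

```r
# Simulated data and fixed 3-fold CV splits (hypothetical)
set.seed(7832)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- factor(ifelse(runif(n) < plogis(1.5 * dat$x1 - dat$x2), "yes", "no"),
  levels = c("no", "yes"))
features <- c("x1", "x2", "x3", "x4")
folds <- sample(rep(1:3, length.out = n))

# Objective: CV classification error of a logistic regression on a feature subset
cv_error <- function(selected) {
  f <- reformulate(selected, response = "y")
  mean(sapply(1:3, function(k) {
    fit <- glm(f, data = dat[folds != k, ], family = binomial())
    p_yes <- predict(fit, newdata = dat[folds == k, ], type = "response")
    mean(ifelse(p_yes > 0.5, "yes", "no") != as.character(dat$y[folds == k]))
  }))
}

# Draw one random, non-empty feature subset as a named logical mask
draw_mask <- function() {
  mask <- runif(length(features)) < 0.5
  if (!any(mask)) mask[sample.int(length(features), 1)] <- TRUE
  setNames(mask, features)
}

# Evaluate a batch of masks (one per row) and return the new archive rows
eval_batch <- function(masks, batch_nr) {
  ce <- apply(masks, 1, function(m) cv_error(features[m]))
  data.frame(masks, classif.ce = ce, batch_nr = batch_nr)
}

n_evals <- 10    # plays the role of trm("evals", n_evals = 10)
batch_size <- 5  # plays the role of fs("random_search", batch_size = 5)
archive <- NULL
batch_nr <- 0
while (is.null(archive) || nrow(archive) < n_evals) {
  batch_nr <- batch_nr + 1
  masks <- t(replicate(batch_size, draw_mask()))  # batch_size x features
  archive <- rbind(archive, eval_batch(masks, batch_nr))
}

# Query the best feature set, similar to instance$result
best <- archive[which.min(archive$classif.ce), ]
features[unlist(best[features])]
```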

Sequential forward selection

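We try sequential forward selection, which starts with single features and, in each iteration, adds the feature that improves the predictive performance the most. We choose TerminatorStagnation, which stops the feature selection if the predictive performance has not improved within the last iters = 5 iterations.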

terminator = trm("stagnation", iters = 5)
instance = FSelectInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measure = measure,
  terminator = terminator)

fselector = fs("sequential")
fselector$optimize(instance)
     age embarked  fare parch pclass  sex sib_sp                features classif.ce
1: FALSE    FALSE FALSE  TRUE   TRUE TRUE   TRUE parch,pclass,sex,sib_sp  0.1964085

The FSelectorSequential object has a special method for displaying the optimization path of the sequential feature selection.

fselector$optimization_path(instance)
    age embarked  fare parch pclass   sex sib_sp classif.ce batch_nr
1: TRUE    FALSE FALSE FALSE  FALSE FALSE  FALSE  0.3838384        1
2: TRUE    FALSE FALSE FALSE  FALSE  TRUE  FALSE  0.2132435        2
3: TRUE    FALSE FALSE FALSE  FALSE  TRUE   TRUE  0.2087542        3
4: TRUE    FALSE FALSE FALSE   TRUE  TRUE   TRUE  0.2143659        4
5: TRUE    FALSE FALSE  TRUE   TRUE  TRUE   TRUE  0.2065095        5
6: TRUE    FALSE  TRUE  TRUE   TRUE  TRUE   TRUE  0.2020202        6
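The greedy logic of sequential forward selection fits in a few lines: start from the empty set, add the single feature that lowers the error the most, and stop when no addition helps. The base-R sketch below assumes simulated data, glm() as the logistic regression and a simplified stop-on-no-improvement rule in place of TerminatorStagnation; sfs() and cv_error() are hypothetical helpers, not the FSelectorSequential implementation. Its path table plays the role of $optimization_path().

```r
# Simulated data and fixed 3-fold CV splits (hypothetical)
set.seed(7832)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- factor(ifelse(runif(n) < plogis(1.5 * dat$x1 - dat$x2), "yes", "no"),
  levels = c("no", "yes"))
folds <- sample(rep(1:3, length.out = n))

cv_error <- function(selected) {
  f <- reformulate(selected, response = "y")
  mean(sapply(1:3, function(k) {
    fit <- glm(f, data = dat[folds != k, ], family = binomial())
    p_yes <- predict(fit, newdata = dat[folds == k, ], type = "response")
    mean(ifelse(p_yes > 0.5, "yes", "no") != as.character(dat$y[folds == k]))
  }))
}

# Greedy forward selection; returns the selected set and the optimization path
sfs <- function(features, objective) {
  selected <- character(0)
  best_ce <- Inf
  path <- data.frame(step = integer(0), features = character(0), ce = numeric(0))
  repeat {
    candidates <- setdiff(features, selected)
    if (length(candidates) == 0) break
    # one "batch": every remaining feature added to the current set
    ce <- vapply(candidates, function(f) objective(c(selected, f)), numeric(1))
    if (min(ce) >= best_ce) break  # no candidate improves the error: stop
    best_ce <- min(ce)
    selected <- c(selected, candidates[which.min(ce)])
    path <- rbind(path, data.frame(step = length(selected),
      features = paste(selected, collapse = ","), ce = best_ce))
  }
  list(selected = selected, ce = best_ce, path = path)
}

res <- sfs(c("x1", "x2", "x3", "x4"), cv_error)
res$path
```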

Recursive feature elimination

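Recursive feature elimination utilizes the $importance() method of learners. In each iteration, the feature(s) with the lowest importance scores are dropped. Since the logistic regression learner provides no importance scores, we switch to a random forest (classif.ranger) with impurity importance and store the models (store_models = TRUE) so that their importance scores can be queried. We choose the non-recursive algorithm (recursive = FALSE), which calculates the feature importance once on the complete feature set. The recursive version (recursive = TRUE) recomputes the feature importance on the reduced feature set in every iteration.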

learner = lrn("classif.ranger", importance = "impurity")
terminator = trm("none")
instance = FSelectInstanceSingleCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measure = measure,
  terminator = terminator,
  store_models = TRUE)

fselector = fs("rfe", recursive = FALSE)
fselector$optimize(instance)
    age embarked fare parch pclass  sex sib_sp                               features classif.ce
1: TRUE     TRUE TRUE  TRUE   TRUE TRUE   TRUE age,embarked,fare,parch,pclass,sex,...  0.1694725

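We inspect the archive without the bookkeeping columns. The importance column contains the importance scores of the features in each set, sorted in decreasing order. Since we chose the non-recursive variant, the smaller feature sets reuse the scores computed on the complete feature set.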

as.data.table(instance$archive, exclude_columns = c("runtime_learners", "timestamp", "batch_nr", "resample_result", "uhash"))
age   embarked fare  parch pclass sex   sib_sp classif.ce importance
TRUE  TRUE     TRUE  TRUE  TRUE   TRUE  TRUE   0.1694725  68.711046, 45.345443, 37.314977, 22.890378, 11.733529, 9.171937, 8.422415
TRUE  FALSE    TRUE  FALSE FALSE  TRUE  FALSE  0.2143659  68.71105, 45.34544, 37.31498
FALSE FALSE    FALSE FALSE FALSE  TRUE  FALSE  0.2132435  68.71105
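The non-recursive variant can be sketched in base R: rank the features once on the complete feature set, then evaluate ever smaller top-k subsets of that fixed ranking. Assumptions: simulated data, absolute z-statistics of a glm() fit as a stand-in importance score (the example above uses ranger's impurity importance), and floor(k / 2) features kept per iteration, a hypothetical rule that would give the 7, 3, 1 subset sizes seen in the archive above for seven features. cv_error() is a hypothetical helper.

```r
# Simulated data and fixed 3-fold CV splits (hypothetical)
set.seed(7832)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- factor(ifelse(runif(n) < plogis(1.5 * dat$x1 - dat$x2), "yes", "no"),
  levels = c("no", "yes"))
features <- c("x1", "x2", "x3", "x4")
folds <- sample(rep(1:3, length.out = n))

cv_error <- function(selected) {
  f <- reformulate(selected, response = "y")
  mean(sapply(1:3, function(k) {
    fit <- glm(f, data = dat[folds != k, ], family = binomial())
    p_yes <- predict(fit, newdata = dat[folds == k, ], type = "response")
    mean(ifelse(p_yes > 0.5, "yes", "no") != as.character(dat$y[folds == k]))
  }))
}

# Importance computed ONCE on the complete feature set (non-recursive):
# absolute z-statistics of a logistic regression as a stand-in score
fit <- glm(y ~ ., data = dat, family = binomial())
importance <- abs(coef(summary(fit))[features, "z value"])
ranking <- names(sort(importance, decreasing = TRUE))

# Subset sizes: halve the number of features in every iteration (4, 2, 1)
sizes <- integer(0)
k <- length(features)
while (k >= 1) {
  sizes <- c(sizes, k)
  k <- k %/% 2
}

# Evaluate the top-s features for every size; the ranking is never recomputed
archive <- data.frame(
  n_features = sizes,
  features = vapply(sizes, function(s) paste(ranking[seq_len(s)], collapse = ","), character(1)),
  classif.ce = vapply(sizes, function(s) cv_error(ranking[seq_len(s)]), numeric(1))
)
archive[which.min(archive$classif.ce), ]
```

The recursive variant would refit the model and recompute the importance inside the loop, on the reduced feature set only.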

Nested resampling

It is a common mistake to report the predictive performance estimated on the resampling sets during the feature selection as the performance that can be expected from the combined feature selection and model training. Because the same test sets are evaluated repeatedly, information about them leaks into the choice of the feature set, which leads to overfitting and overly optimistic performance estimates. Nested resampling uses an outer and an inner resampling to separate the feature selection from the performance estimation of the model. We can use the AutoFSelector class to run nested resampling. The AutoFSelector combines a given Learner and a feature selection method into a Learner with internal automatic feature selection. The inner resampling loop that determines the best feature set is run internally each time the AutoFSelector is trained.

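resampling_inner = rsmp("cv", folds = 5)
measure = msr("classif.ce")

# `learner` still holds the ranger learner and `terminator` still holds
# trm("none") from the recursive feature elimination section
at = AutoFSelector$new(
  learner = learner,
  resampling = resampling_inner,
  measure = measure,
  terminator = terminator,
  fselector = fs("sequential"),
  store_models = TRUE)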

We put the AutoFSelector into a resample() call to get the outer resampling loop.

resampling_outer = rsmp("cv", folds = 3)

rr = resample(task, at, resampling_outer, store_models = TRUE)

The aggregated performance of all outer resampling iterations is a nearly unbiased estimate of the predictive performance we can expect from the random forest learner (classif.ranger) combined with sequential forward selection.

rr$aggregate()
classif.ce 
 0.1840629 
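The division of labour between the two loops can be sketched without mlr3: the inner cross-validation only ever sees the outer training set, and each outer test set is used exactly once to score the selected feature set. Assumptions: simulated data, glm() as the learner, greedy forward selection in the inner loop and 3-fold CV in both loops; holdout_error(), cv_error() and sfs() are hypothetical helpers, not AutoFSelector internals.

```r
# Simulated data (hypothetical)
set.seed(7832)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- factor(ifelse(runif(n) < plogis(1.5 * dat$x1 - dat$x2), "yes", "no"),
  levels = c("no", "yes"))
features <- c("x1", "x2", "x3", "x4")

# Fit on `train`, return the classification error on `test`
holdout_error <- function(selected, train, test) {
  fit <- glm(reformulate(selected, response = "y"), data = train, family = binomial())
  p_yes <- predict(fit, newdata = test, type = "response")
  mean(ifelse(p_yes > 0.5, "yes", "no") != as.character(test$y))
}

# K-fold CV error on `data` with fixed fold ids
cv_error <- function(selected, data, folds) {
  mean(sapply(unique(folds), function(k)
    holdout_error(selected, data[folds != k, ], data[folds == k, ])))
}

# Greedy forward selection driven by the inner CV error
sfs <- function(data, folds) {
  selected <- character(0)
  best_ce <- Inf
  repeat {
    candidates <- setdiff(features, selected)
    if (length(candidates) == 0) break
    ce <- vapply(candidates, function(f) cv_error(c(selected, f), data, folds), numeric(1))
    if (min(ce) >= best_ce) break
    best_ce <- min(ce)
    selected <- c(selected, candidates[which.min(ce)])
  }
  list(selected = selected, inner_ce = best_ce)
}

outer_folds <- sample(rep(1:3, length.out = n))
results <- do.call(rbind, lapply(1:3, function(i) {
  train <- dat[outer_folds != i, ]
  test <- dat[outer_folds == i, ]
  # the inner CV is built on the outer training set only
  inner_folds <- sample(rep(1:3, length.out = nrow(train)))
  sel <- sfs(train, inner_folds)
  # refit on the whole outer training set, score once on the untouched test set
  data.frame(iteration = i, features = paste(sel$selected, collapse = ","),
    inner_ce = sel$inner_ce, outer_ce = holdout_error(sel$selected, train, test))
}))

results
mean(results$outer_ce)  # counterpart of rr$aggregate()
```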

We check whether the feature sets selected in the inner resampling are stable, i.e. whether they differ only slightly between the outer training sets. Usually, we aim for similar feature sets across all outer training sets. In this example, we might observe unstable selections because the small data set and the low number of resampling iterations introduce considerable randomness.

extract_inner_fselect_results(rr)

iteration age   embarked fare  parch pclass sex   sib_sp classif.ce features                             task_id learner_id               resampling_id
1         TRUE  TRUE     TRUE  FALSE TRUE   TRUE  TRUE   0.1599202  age,embarked,fare,pclass,sex,sib_sp  titanic classif.ranger.fselector cv
2         TRUE  TRUE     FALSE FALSE TRUE   TRUE  TRUE   0.1497650  age,embarked,pclass,sex,sib_sp       titanic classif.ranger.fselector cv
3         TRUE  TRUE     FALSE TRUE  FALSE  TRUE  TRUE   0.1902293  age,embarked,parch,sex,sib_sp        titanic classif.ranger.fselector cv
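One way to put a number on stability, not part of the original analysis, is the pairwise Jaccard index |A ∩ B| / |A ∪ B| of the selected feature sets: 1 means identical sets, 0 means disjoint ones. The sketch below hard-codes the three feature sets from the table above; jaccard() is a hypothetical helper written in base R.

```r
# Feature sets selected in the three outer iterations (copied from the table above)
selected <- list(
  iter1 = c("age", "embarked", "fare", "pclass", "sex", "sib_sp"),
  iter2 = c("age", "embarked", "pclass", "sex", "sib_sp"),
  iter3 = c("age", "embarked", "parch", "sex", "sib_sp")
)

# Jaccard index: size of the intersection divided by size of the union
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

pairs <- combn(names(selected), 2)  # one column per pair of iterations
sims <- apply(pairs, 2, function(p) jaccard(selected[[p[1]]], selected[[p[2]]]))
names(sims) <- apply(pairs, 2, paste, collapse = "-")

sims        # iter1-iter2 = 5/6, iter1-iter3 = 4/7, iter2-iter3 = 2/3
mean(sims)  # 29/42, about 0.69
```

All three sets share age, embarked, sex and sib_sp; the disagreement is limited to fare, pclass and parch.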

Next, we want to compare the predictive performances estimated on the outer resampling to the inner resampling. Significantly lower predictive performances on the outer resampling indicate that the models with the optimized feature sets overfit the data.

rr$score()
iteration task_id learner_id               resampling_id classif.ce
1         titanic classif.ranger.fselector cv            0.1649832
2         titanic classif.ranger.fselector cv            0.2289562
3         titanic classif.ranger.fselector cv            0.1582492
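Putting the two tables side by side makes the comparison concrete. The snippet below copies the inner estimates (from the inner feature selection results) and the outer estimates (from rr$score()) shown above and computes the per-iteration gap; the mean of the outer errors reproduces the value returned by rr$aggregate().

```r
# Values copied from the tables above
inner_ce <- c(0.1599202, 0.1497650, 0.1902293)  # inner CV estimate of the selected set
outer_ce <- c(0.1649832, 0.2289562, 0.1582492)  # rr$score()

gap <- outer_ce - inner_ce  # positive: the inner estimate was optimistic
round(gap, 4)               # 0.0051 0.0792 -0.0320
round(mean(outer_ce), 7)    # 0.1840629, equal to rr$aggregate()
round(mean(inner_ce), 7)    # 0.1666382
```

Iteration 2 shows the largest gap: its inner estimate of 0.150 was clearly optimistic compared with the outer error of 0.229.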

The archives of the AutoFSelectors give us all evaluated feature sets with the associated predictive performances.

extract_inner_fselect_archives(rr)

iteration age embarked fare parch pclass sex sib_sp classif.ce runtime_learners timestamp batch_nr resample_result task_id learner_id resampling_id
1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE 0.4090585 1.327 2022-02-18 10:04:52 1 <environment: 0x555bb69d32b0> titanic classif.ranger.fselector cv
1 FALSE TRUE FALSE FALSE FALSE FALSE FALSE 0.3635949 1.017 2022-02-18 10:04:52 1 <environment: 0x555bbf4e9b70> titanic classif.ranger.fselector cv
1 FALSE FALSE TRUE FALSE FALSE FALSE FALSE 0.3316906 1.669 2022-02-18 10:04:52 1 <environment: 0x555bc1935b80> titanic classif.ranger.fselector cv
1 FALSE FALSE FALSE TRUE FALSE FALSE FALSE 0.3754024 1.020 2022-02-18 10:04:52 1 <environment: 0x555c6d27b550> titanic classif.ranger.fselector cv
1 FALSE FALSE FALSE FALSE TRUE FALSE FALSE 0.3114798 1.001 2022-02-18 10:04:52 1 <environment: 0x555c6cd06e00> titanic classif.ranger.fselector cv
1 FALSE FALSE FALSE FALSE FALSE TRUE FALSE 0.2254949 0.933 2022-02-18 10:04:52 1 <environment: 0x555c5d813500> titanic classif.ranger.fselector cv
1 FALSE FALSE FALSE FALSE FALSE FALSE TRUE 0.3720553 1.016 2022-02-18 10:04:52 1 <environment: 0x555c1ffa9e50> titanic classif.ranger.fselector cv
1 TRUE FALSE FALSE FALSE FALSE TRUE FALSE 0.2170916 0.849 2022-02-18 10:04:59 2 <environment: 0x555c5f395fc0> titanic classif.ranger.fselector cv
1 FALSE TRUE FALSE FALSE FALSE TRUE FALSE 0.2254949 0.896 2022-02-18 10:04:59 2 <environment: 0x555c6f8a2d10> titanic classif.ranger.fselector cv
1 FALSE FALSE TRUE FALSE FALSE TRUE FALSE 0.2254949 0.892 2022-02-18 10:04:59 2 <environment: 0x555c3f60ec90> titanic classif.ranger.fselector cv
1 FALSE FALSE FALSE TRUE FALSE TRUE FALSE 0.2238143 0.797 2022-02-18 10:04:59 2 <environment: 0x555c2006bf10> titanic classif.ranger.fselector cv
1 FALSE FALSE FALSE FALSE TRUE TRUE FALSE 0.2238428 0.834 2022-02-18 10:04:59 2 <environment: 0x555c59ab23c0> titanic classif.ranger.fselector cv
1 FALSE FALSE FALSE FALSE FALSE TRUE TRUE 0.2221478 0.843 2022-02-18 10:04:59 2 <environment: 0x555c88e75ef0> titanic classif.ranger.fselector cv
1 TRUE TRUE FALSE FALSE FALSE TRUE FALSE 0.2221336 0.901 2022-02-18 10:05:05 3 <environment: 0x555c5812db40> titanic classif.ranger.fselector cv
1 TRUE FALSE TRUE FALSE FALSE TRUE FALSE 0.2221336 0.960 2022-02-18 10:05:05 3 <environment: 0x555c4dbe20c0> titanic classif.ranger.fselector cv
1 TRUE FALSE FALSE TRUE FALSE TRUE FALSE 0.2137445 0.902 2022-02-18 10:05:05 3 <environment: 0x555c830be670> titanic classif.ranger.fselector cv
1 TRUE FALSE FALSE FALSE TRUE TRUE FALSE 0.2053269 0.900 2022-02-18 10:05:05 3 <environment: 0x555c903893d0> titanic classif.ranger.fselector cv
1 TRUE FALSE FALSE FALSE FALSE TRUE TRUE 0.2036320 0.909 2022-02-18 10:05:05 3 <environment: 0x555c4dc2d730> titanic classif.ranger.fselector cv
1 TRUE TRUE FALSE FALSE FALSE TRUE TRUE 0.2087025 1.105 2022-02-18 10:05:11 4 <environment: 0x555c4371afe0> titanic classif.ranger.fselector cv
1 TRUE FALSE TRUE FALSE FALSE TRUE TRUE 0.1986469 1.304 2022-02-18 10:05:11 4 <environment: 0x555c389b8650> titanic classif.ranger.fselector cv
1 TRUE FALSE FALSE TRUE FALSE TRUE TRUE 0.2019798 1.246 2022-02-18 10:05:11 4 <environment: 0x555c90804180> titanic classif.ranger.fselector cv
1 TRUE FALSE FALSE FALSE TRUE TRUE TRUE 0.2036604 1.074 2022-02-18 10:05:11 4 <environment: 0x555c436ed4c0> titanic classif.ranger.fselector cv
1 TRUE TRUE TRUE FALSE FALSE TRUE TRUE 0.1952713 1.408 2022-02-18 10:05:17 5 <environment: 0x555c84476810> titanic classif.ranger.fselector cv
1 TRUE FALSE TRUE TRUE FALSE TRUE TRUE 0.1919100 1.275 2022-02-18 10:05:17 5 <environment: 0x555c7489eea0> titanic classif.ranger.fselector cv
1 TRUE FALSE TRUE FALSE TRUE TRUE TRUE 0.1817690 1.273 2022-02-18 10:05:17 5 <environment: 0x555c766be7d0> titanic classif.ranger.fselector cv
1 TRUE TRUE TRUE FALSE TRUE TRUE TRUE 0.1599202 6.306 2022-02-18 10:05:25 6 <environment: 0x555c3ef03560> titanic classif.ranger.fselector cv
1 TRUE FALSE TRUE TRUE TRUE TRUE TRUE 0.1716565 1.220 2022-02-18 10:05:25 6 <environment: 0x555c3f0ccd00> titanic classif.ranger.fselector cv
1 TRUE TRUE TRUE TRUE TRUE TRUE TRUE 0.1666287 1.165 2022-02-18 10:05:26 7 <environment: 0x555c3f128290> titanic classif.ranger.fselector cv
2 TRUE FALSE FALSE FALSE FALSE FALSE FALSE 0.3973508 1.390 2022-02-18 10:04:52 1 <environment: 0x555c026f5c30> titanic classif.ranger.fselector cv
2 FALSE TRUE FALSE FALSE FALSE FALSE FALSE 0.3366614 1.047 2022-02-18 10:04:52 1 <environment: 0x555c02886c30> titanic classif.ranger.fselector cv
2 FALSE FALSE TRUE FALSE FALSE FALSE FALSE 0.3147557 1.532 2022-02-18 10:04:52 1 <environment: 0x555c11ee7470> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE TRUE FALSE FALSE FALSE 0.3517875 1.114 2022-02-18 10:04:52 1 <environment: 0x555c120edb90> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE FALSE TRUE FALSE FALSE 0.2963111 1.084 2022-02-18 10:04:52 1 <environment: 0x555c12204d50> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE FALSE FALSE TRUE FALSE 0.1834354 0.957 2022-02-18 10:04:52 1 <environment: 0x555c12381370> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE FALSE FALSE FALSE TRUE 0.3383848 1.011 2022-02-18 10:04:52 1 <environment: 0x555c1240af70> titanic classif.ranger.fselector cv
2 TRUE FALSE FALSE FALSE FALSE TRUE FALSE 0.1867968 0.878 2022-02-18 10:04:59 2 <environment: 0x555c1249af00> titanic classif.ranger.fselector cv
2 FALSE TRUE FALSE FALSE FALSE TRUE FALSE 0.1834354 0.793 2022-02-18 10:04:59 2 <environment: 0x555c12551320> titanic classif.ranger.fselector cv
2 FALSE FALSE TRUE FALSE FALSE TRUE FALSE 0.1851161 1.018 2022-02-18 10:04:59 2 <environment: 0x555c125d6e40> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE TRUE FALSE TRUE FALSE 0.1834354 0.810 2022-02-18 10:04:59 2 <environment: 0x555c126d5060> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE FALSE TRUE TRUE FALSE 0.1834354 0.801 2022-02-18 10:04:59 2 <environment: 0x555c127773a0> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE FALSE FALSE TRUE TRUE 0.1817690 0.800 2022-02-18 10:04:59 2 <environment: 0x555c458520e0> titanic classif.ranger.fselector cv
2 TRUE FALSE FALSE FALSE FALSE TRUE TRUE 0.1716565 0.901 2022-02-18 10:05:05 3 <environment: 0x555c458b57f0> titanic classif.ranger.fselector cv
2 FALSE TRUE FALSE FALSE FALSE TRUE TRUE 0.1851303 0.842 2022-02-18 10:05:05 3 <environment: 0x555c45a904d0> titanic classif.ranger.fselector cv
2 FALSE FALSE TRUE FALSE FALSE TRUE TRUE 0.1851303 0.922 2022-02-18 10:05:05 3 <environment: 0x555c45aed9b0> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE TRUE FALSE TRUE TRUE 0.1868110 0.879 2022-02-18 10:05:05 3 <environment: 0x555c45cdfbd0> titanic classif.ranger.fselector cv
2 FALSE FALSE FALSE FALSE TRUE TRUE TRUE 0.1817690 0.822 2022-02-18 10:05:05 3 <environment: 0x555c45d34f30> titanic classif.ranger.fselector cv
2 TRUE TRUE FALSE FALSE FALSE TRUE TRUE 0.1733371 1.032 2022-02-18 10:05:11 4 <environment: 0x555c45ec30f0> titanic classif.ranger.fselector cv
2 TRUE FALSE TRUE FALSE FALSE TRUE TRUE 0.1700328 1.240 2022-02-18 10:05:11 4 <environment: 0x555c45f6b1e0> titanic classif.ranger.fselector cv
2 TRUE FALSE FALSE TRUE FALSE TRUE TRUE 0.1733514 1.101 2022-02-18 10:05:11 4 <environment: 0x555c4614d140> titanic classif.ranger.fselector cv
2 TRUE FALSE FALSE FALSE TRUE TRUE TRUE 0.1548497 1.020 2022-02-18 10:05:11 4 <environment: 0x555c461e2ec0> titanic classif.ranger.fselector cv
2 TRUE TRUE FALSE FALSE TRUE TRUE TRUE 0.1497650 1.055 2022-02-18 10:05:15 5 <environment: 0x555c1b7ff2c0> titanic classif.ranger.fselector cv
2 TRUE FALSE TRUE FALSE TRUE TRUE TRUE 0.1616152 1.210 2022-02-18 10:05:15 5 <environment: 0x555c1b86ea40> titanic classif.ranger.fselector cv
2 TRUE FALSE FALSE TRUE TRUE TRUE TRUE 0.1649765 1.094 2022-02-18 10:05:15 5 <environment: 0x555c1bb31620> titanic classif.ranger.fselector cv
2 TRUE TRUE TRUE FALSE TRUE TRUE TRUE 0.1531833 1.235 2022-02-18 10:05:23 6 <environment: 0x555c1bb8cbb0> titanic classif.ranger.fselector cv
2 TRUE TRUE FALSE TRUE TRUE TRUE TRUE 0.1582111 1.123 2022-02-18 10:05:23 6 <environment: 0x555c1bcf7c70> titanic classif.ranger.fselector cv
2 TRUE TRUE TRUE TRUE TRUE TRUE TRUE 0.1497792 1.128 2022-02-18 10:05:25 7 <environment: 0x555c1be22240> titanic classif.ranger.fselector cv
3 TRUE FALSE FALSE FALSE FALSE FALSE FALSE 0.4158809 1.390 2022-02-18 10:04:51 1 <environment: 0x555c3443a270> titanic classif.ranger.fselector cv
3 FALSE TRUE FALSE FALSE FALSE FALSE FALSE 0.3905569 1.017 2022-02-18 10:04:51 1 <environment: 0x555c3457ebe0> titanic classif.ranger.fselector cv
3 FALSE FALSE TRUE FALSE FALSE FALSE FALSE 0.3349665 1.683 2022-02-18 10:04:51 1 <environment: 0x555c346491a0> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE TRUE FALSE FALSE FALSE 0.3855291 0.991 2022-02-18 10:04:51 1 <environment: 0x555c34943490> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE FALSE TRUE FALSE FALSE 0.3517875 1.006 2022-02-18 10:04:51 1 <environment: 0x555c34a8c030> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE FALSE FALSE TRUE FALSE 0.2306082 0.960 2022-02-18 10:04:51 1 <environment: 0x555c34bb03e0> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE FALSE FALSE FALSE TRUE 0.4241988 1.004 2022-02-18 10:04:51 1 <environment: 0x555c34c037f0> titanic classif.ranger.fselector cv
3 TRUE FALSE FALSE FALSE FALSE TRUE FALSE 0.2221906 0.995 2022-02-18 10:04:59 2 <environment: 0x555c34cfe670> titanic classif.ranger.fselector cv
3 FALSE TRUE FALSE FALSE FALSE TRUE FALSE 0.2306082 0.810 2022-02-18 10:04:59 2 <environment: 0x555c34f942d0> titanic classif.ranger.fselector cv
3 FALSE FALSE TRUE FALSE FALSE TRUE FALSE 0.2306082 0.894 2022-02-18 10:04:59 2 <environment: 0x555c39e7e970> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE TRUE FALSE TRUE FALSE 0.2322888 0.800 2022-02-18 10:04:59 2 <environment: 0x555c3a1a1270> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE FALSE TRUE TRUE FALSE 0.2356929 0.808 2022-02-18 10:04:59 2 <environment: 0x555c3a3abb50> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE FALSE FALSE TRUE TRUE 0.2188292 0.839 2022-02-18 10:04:59 2 <environment: 0x555c3a5c4000> titanic classif.ranger.fselector cv
3 TRUE FALSE FALSE FALSE FALSE TRUE TRUE 0.2087452 0.869 2022-02-18 10:05:05 3 <environment: 0x555c3a666020> titanic classif.ranger.fselector cv
3 FALSE TRUE FALSE FALSE FALSE TRUE TRUE 0.2154679 0.870 2022-02-18 10:05:05 3 <environment: 0x555c3a76d440> titanic classif.ranger.fselector cv
3 FALSE FALSE TRUE FALSE FALSE TRUE TRUE 0.2171628 0.886 2022-02-18 10:05:05 3 <environment: 0x555c3a89e8e0> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE TRUE FALSE TRUE TRUE 0.2171486 0.847 2022-02-18 10:05:05 3 <environment: 0x555c3a8f7d00> titanic classif.ranger.fselector cv
3 FALSE FALSE FALSE FALSE TRUE TRUE TRUE 0.2154821 0.906 2022-02-18 10:05:05 3 <environment: 0x555c2d85c850> titanic classif.ranger.fselector cv
3 TRUE TRUE FALSE FALSE FALSE TRUE TRUE 0.2020225 1.099 2022-02-18 10:05:11 4 <environment: 0x555c2d90ad40> titanic classif.ranger.fselector cv
3 TRUE FALSE TRUE FALSE FALSE TRUE TRUE 0.2221906 1.360 2022-02-18 10:05:11 4 <environment: 0x555c2daf87b0> titanic classif.ranger.fselector cv
3 TRUE FALSE FALSE TRUE FALSE TRUE TRUE 0.1919100 1.034 2022-02-18 10:05:11 4 <environment: 0x555c7abc39f0> titanic classif.ranger.fselector cv
3 TRUE FALSE FALSE FALSE TRUE TRUE TRUE 0.1986469 1.087 2022-02-18 10:05:11 4 <environment: 0x555c83f07540> titanic classif.ranger.fselector cv
3 TRUE TRUE FALSE TRUE FALSE TRUE TRUE 0.1902293 1.162 2022-02-18 10:05:15 5 <environment: 0x555bc214b840> titanic classif.ranger.fselector cv
3 TRUE FALSE TRUE TRUE FALSE TRUE TRUE 0.2188720 1.289 2022-02-18 10:05:15 5 <environment: 0x555bcaa244c0> titanic classif.ranger.fselector cv
3 TRUE FALSE FALSE TRUE TRUE TRUE TRUE 0.1936334 1.087 2022-02-18 10:05:15 5 <environment: 0x555bc5808890> titanic classif.ranger.fselector cv
3 TRUE TRUE TRUE TRUE FALSE TRUE TRUE 0.2239140 5.974 2022-02-18 10:05:23 6 <environment: 0x555bd056ad70> titanic classif.ranger.fselector cv
3 TRUE TRUE FALSE TRUE TRUE TRUE TRUE 0.1986754 1.187 2022-02-18 10:05:23 6 <environment: 0x555c57f9a1b0> titanic classif.ranger.fselector cv
3 TRUE TRUE TRUE TRUE TRUE TRUE TRUE 0.2053981 1.190 2022-02-18 10:05:25 7 <environment: 0x555c76ef0d70> titanic classif.ranger.fselector cv

Shortcuts

The fselect() shortcut runs a feature selection in a single function call.

instance = fselect(
  method = "random_search",
  task = task,  # binary Titanic task: classif.log_reg does not support multiclass tasks such as iris
  learner = lrn("classif.log_reg"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 10
)

The fselect_nested() shortcut runs nested resampling in a single function call.

rr = fselect_nested(
  method = "random_search",
  task = task,  # binary Titanic task: classif.log_reg does not support multiclass tasks such as iris
  learner = lrn("classif.log_reg"),
  inner_resampling = rsmp("cv", folds = 3),
  outer_resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 10
)

Citation

For attribution, please cite this work as

Becker (2021, Jan. 8). mlr-org: Feature Selection on the Titanic Data Set. Retrieved from https://mlr-org.github.io/mlr-org-website/gallery/2020-09-14-mlr3fselect-basic/

BibTeX citation

@misc{becker2021feature,
  author = {Becker, Marc},
  title = {mlr-org: Feature Selection on the Titanic Data Set},
  url = {https://mlr-org.github.io/mlr-org-website/gallery/2020-09-14-mlr3fselect-basic/},
  year = {2021}
}