Feature Selection Filter

Feature Selection Filters.

Feature filters quantify the importance of each feature of a Task by assigning it a numerical score. In a second step, features are selected either by keeping a fixed absolute number or a relative fraction of the best-scoring features, or by thresholding on the score value.
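The three selection strategies can be sketched in base R on a named score vector. This is a minimal illustration, not the package's own implementation: the scores are the Kruskal-Wallis values reported for the Pima task in the example usage, while the cut points (3 features, 25 %, a threshold of 10) are arbitrary assumptions chosen for demonstration.

```r
# Filter scores as reported for the Pima task (higher = more important)
scores = c(
  glucose = 39.885381, age = 16.942901, mass = 16.740864,
  insulin = 13.127828, triceps = 9.158113, pregnant = 7.426955,
  pedigree = 5.922431, pressure = 5.788607
)

# Make sure the best feature comes first
scores = sort(scores, decreasing = TRUE)

# 1) fixed absolute number: keep the 3 best features (3 is an arbitrary choice)
top_k = names(head(scores, 3))

# 2) relative fraction: keep the best 25 % of all features (arbitrary choice)
n_keep = ceiling(0.25 * length(scores))
top_frac = names(head(scores, n_keep))

# 3) threshold: keep every feature whose score exceeds 10 (arbitrary choice)
above_cutoff = names(scores[scores > 10])

print(top_k)
print(top_frac)
print(above_cutoff)
```

Which strategy fits best depends on the filter: a threshold is only meaningful when the score has an interpretable scale, whereas a fixed number or fraction works with any ranking.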

The Filter PipeOp allows filters to be used as a preprocessing step.
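A rough sketch of what this can look like with mlr3pipelines, assuming mlr3verse is installed. The choice of the Kruskal-Wallis filter and of `filter.nfeat = 3` is illustrative only; `filter.frac` or `filter.cutoff` could be set instead to select by fraction or by score threshold.

```r
library("mlr3verse")

task = tsk("pima")

# Wrap a filter in a PipeOp and keep the 3 best-scoring features
# (3 is an arbitrary value chosen for this sketch)
po_filter = po("filter", filter = flt("kruskal_test"), filter.nfeat = 3)

# Training the PipeOp computes the scores and returns a reduced task
filtered_task = po_filter$train(list(task))[[1]]
print(filtered_task$feature_names)
```

Used this way, the filter becomes part of the pipeline, so when the pipeline is resampled the scores are recomputed on each training set rather than on the full data.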

Example Usage

Use the $$-\log_{10}()$$-transformed $$p$$-values of a Kruskal-Wallis rank sum test (implemented in kruskal.test()) to filter the features of the Pima Indian Diabetes task.

library("mlr3verse")

# retrieve a task
task = tsk("pima")

# retrieve a filter
filter = flt("kruskal_test")

# calculate scores
filter$calculate(task)

# access scores
filter$scores

  glucose       age      mass   insulin   triceps  pregnant  pedigree  pressure
39.885381 16.942901 16.740864 13.127828  9.158113  7.426955  5.922431  5.788607

# plot scores
autoplot(filter)

# subset task to 3 most important features
task$select(head(names(filter$scores), 3))
task$feature_names

[1] "age"     "glucose" "mass"
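
To make the score itself more concrete, the transformation described above can be sketched by hand in base R. This is an illustration of the idea, not the package's internal code, and it uses the built-in `iris` data set (an assumption made here so the snippet runs without extra packages) rather than the Pima task.

```r
# Numeric features and the grouping (target) column of the built-in iris data
features = setdiff(names(iris), "Species")

# For each feature: Kruskal-Wallis test across the target classes,
# then -log10 of the p-value, so smaller p-values give larger scores
scores = vapply(features, function(f) {
  p = kruskal.test(x = iris[[f]], g = iris$Species)$p.value
  -log10(p)
}, numeric(1))

# Rank features from most to least important
print(sort(scores, decreasing = TRUE))
```

The negative logarithm turns very small p-values into large positive scores, which keeps the convention that a higher score means a more important feature.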