Feature Selection Filters.
Feature filters quantify the importance of each feature of a Task by assigning it a numerical score. In a second step, features can be selected either by keeping a fixed absolute number or a relative fraction of the best-scoring features, or by thresholding on the score value.
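The three selection strategies can be sketched in base R on a named score vector. The scores are copied from the Pima example in this section; the variable names, the fraction of 0.5, and the cutoff of 10 are assumptions chosen purely for illustration.

```r
# scores sorted in decreasing order (values taken from the Pima example)
scores = c(glucose = 39.885381, age = 16.942901, mass = 16.740864,
  insulin = 13.127828, triceps = 9.158113, pregnant = 7.426955,
  pedigree = 5.922431, pressure = 5.788607)

# 1) fixed absolute number: keep the 3 best features
top_n = head(names(scores), 3)

# 2) relative fraction: keep the best 50% of the features
top_frac = head(names(scores), ceiling(0.5 * length(scores)))

# 3) threshold: keep every feature with a score above 10
above_cutoff = names(scores)[scores > 10]

print(top_n)
print(top_frac)
print(above_cutoff)
```

With these scores, the fraction and the cutoff happen to select the same four features; in general the three strategies give different subsets.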
The Filter PipeOp allows filters to be used as a preprocessing step.
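A minimal sketch of this use, assuming the mlr3verse package is installed and that the hyperparameter `filter.nfeat` (number of features to keep) is the desired selection rule. The object names `po_filter` and `task_filtered` are hypothetical.

```r
library("mlr3verse")  # loads the mlr3 ecosystem packages used below

task = tsk("pima")

# wrap the Kruskal-Wallis filter in a PipeOp and keep the 3 best features;
# filter.frac or filter.cutoff could be set instead of filter.nfeat
po_filter = po("filter", filter = flt("kruskal_test"), filter.nfeat = 3)

# training the PipeOp calculates the scores and returns the reduced task
task_filtered = po_filter$train(list(task))[[1]]
print(task_filtered$feature_names)
```

Because the filter is a PipeOp, it could be chained in front of a learner so that the scores are recalculated on each training set rather than once on the full data.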
The following example uses the \(-\log_{10}\)-transformed \(p\)-values of a Kruskal-Wallis rank sum test (implemented in kruskal.test()) to filter the features of the Pima Indian Diabetes task.
library("mlr3verse")
# retrieve a task
task = tsk("pima")
# retrieve a filter
filter = flt("kruskal_test")
# calculate scores
filter$calculate(task)
# access scores
filter$scores
  glucose       age      mass   insulin   triceps  pregnant  pedigree
39.885381 16.942901 16.740864 13.127828  9.158113  7.426955  5.922431
 pressure
 5.788607
# plot scores
autoplot(filter)
# subset task to 3 most important features
task$select(head(names(filter$scores), 3))
task$feature_names
[1] "age" "glucose" "mass"
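To make the score itself concrete, the sketch below recomputes a \(-\log_{10}\)-transformed Kruskal-Wallis \(p\)-value per feature using base R only. It uses the built-in iris data instead of the Pima task so that it runs without additional packages. That dataset choice and the helper name `kw_score` are assumptions for illustration, and the filter's own implementation may differ in detail, for example in how missing values are handled.

```r
# hypothetical helper: -log10 of the Kruskal-Wallis p-value of one feature
kw_score = function(x, target) {
  -log10(kruskal.test(x, target)$p.value)
}

# score every numeric feature of iris against the class label
features = iris[, setdiff(names(iris), "Species")]
scores = sapply(features, kw_score, target = iris$Species)

# sort in decreasing order so that the best features come first
scores = sort(scores, decreasing = TRUE)
print(scores)
```

A larger score corresponds to a smaller \(p\)-value, that is, to stronger evidence that the feature's distribution differs between the classes.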