Feature Selection Filter

Feature filters quantify the importance of each feature of a Task by assigning it a numerical score. In a second step, features are selected either by keeping a fixed absolute number or a relative fraction of the best-scoring features, or by thresholding on the score value.
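
The three selection strategies can be sketched in base R on a plain named vector of scores. This is only an illustration: the scores are copied (rounded) from the Kruskal-Wallis example on this page, and the variable names, the 50% fraction and the cutoff of 10 are arbitrary choices made for this sketch.

```r
# Scores copied (rounded) from the Kruskal-Wallis example on this page.
scores = c(glucose = 39.89, age = 16.94, mass = 16.74, insulin = 13.13,
  triceps = 9.16, pregnant = 7.43, pedigree = 5.92, pressure = 5.79)

# Sort so that the best-scoring feature comes first.
scores = sort(scores, decreasing = TRUE)

# 1. Fixed absolute number: keep the 3 best features.
top_n = names(head(scores, 3))

# 2. Relative fraction: keep the best 50% of all features.
top_frac = names(head(scores, ceiling(0.5 * length(scores))))

# 3. Threshold: keep every feature whose score exceeds 10 (arbitrary cutoff).
above_cutoff = names(scores[scores > 10])
```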

The Filter PipeOp allows filters to be used as a preprocessing step.
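
A minimal sketch of this usage, assuming mlr3pipelines is available through mlr3verse. The choice of the Kruskal-Wallis filter and of `filter.nfeat = 3` is illustrative; `filter.frac` or `filter.cutoff` could be set instead to select by fraction or by score threshold.

```r
library("mlr3verse")

task = tsk("pima")

# Wrap a filter in a PipeOp and keep the 3 best-scoring features.
po_filter = po("filter", filter = flt("kruskal_test"), filter.nfeat = 3)

# Training the PipeOp computes the scores and returns the reduced task.
filtered_task = po_filter$train(list(task))[[1]]
filtered_task$feature_names
```

Because the filter is a pipeline step, it can be chained in front of a learner so that the scores are recomputed on each training set during resampling.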

Label                                                     Task Types      Packages
ANOVA F-Test                                              classif         stats
Area Under the ROC Curve Score                            classif
Boruta                                                    regr, classif
Correlation-Adjusted coRrelation Score                    regr
Correlation-Adjusted coRrelation Survival Score           surv
Minimal Conditional Mutual Information Maximization       classif, regr
Correlation                                               regr            stats
Double Input Symmetrical Relevance                        classif, regr
Correlation-based Score                                   NA              stats
Importance Score                                          classif
Information Gain                                          classif, regr
Joint Mutual Information                                  classif, regr
Minimal Joint Mutual Information Maximization             classif, regr
Kruskal-Wallis Test                                       classif         stats
Mutual Information Maximization                           classif, regr
Minimum Redundancy Maximal Relevancy                      classif, regr
Minimal Normalised Joint Mutual Information Maximization  classif, regr
Predictive Performance                                    classif
Permutation Score                                         classif
RELIEF                                                    classif, regr
Embedded Feature Selection                                classif
Univariate Cox Survival Score                             surv
Variance                                                  NA              stats
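
The same overview can be queried from within R. The following sketch assumes that the filters are registered in the `mlr_filters` dictionary and that its table view has `key`, `label`, `task_types` and `packages` columns; the exact columns may differ between package versions.

```r
library("mlr3verse")

# Keys that can be passed to flt(), e.g. "kruskal_test".
keys = mlr_filters$keys()

# Tabular overview of all registered filters.
filters = as.data.table(mlr_filters)
filters[, c("key", "label", "task_types", "packages")]
```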

Example Usage

Use the -log10()-transformed p-values of a Kruskal-Wallis rank sum test (implemented in kruskal.test()) for filtering features of the Pima Indian Diabetes task.

library("mlr3verse")
#> Loading required package: mlr3

# retrieve a task
task = tsk("pima")

# retrieve a filter
filter = flt("kruskal_test")

# calculate scores
filter$calculate(task)

# access scores
filter$scores
#>   glucose       age      mass   insulin   triceps  pregnant  pedigree  pressure 
#> 39.885381 16.942901 16.740864 13.127828  9.158113  7.426955  5.922431  5.788607 

# plot scores
autoplot(filter)

# subset task to 3 most important features
task$select(head(names(filter$scores), 3))
task$feature_names
#> [1] "age"     "glucose" "mass"
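
To see where a single score comes from, the transformation can be reproduced by hand in base R. This is only a sketch of the idea on the built-in iris data, used here as a stand-in for the Pima task; it is not the package's internal code, and the filter may handle details such as missing values differently.

```r
# Kruskal-Wallis rank sum test of one numeric feature against the class label.
test = kruskal.test(Sepal.Length ~ Species, data = iris)

# Small p-values turn into large positive scores after the -log10() transform.
score = -log10(test$p.value)
score
```

Since p-values lie between 0 and 1, the transformed scores are non-negative, and a feature that separates the classes more clearly receives a higher score.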