mlr 2.10

Summary of the changes in the new mlr 2.10 version


Janek Thomas


February 13, 2017

mlr 2.10 is now on CRAN. Please update your package if you haven’t done so in a while.

Here is an overview of the changes:

functions - general

  • fixed bug in resample when using predict = “train” (issue #1284)
  • update to irace 2.0 – there are algorithmic changes in irace that may affect performance
  • generateFilterValuesData: fixed a bug with respect to feature ordering
  • imputeLearner: fixed a bug when data actually contained no NAs
  • print.Learner: fixed a bug where a learner hyperpar set to the value “NA” was not displayed by the printer
  • makeLearner, setHyperPars: if you mistype a learner or hyperpar name, mlr uses fuzzy matching to suggest the 3 closest names in the message
  • tuneParams: tuning with irace is now also parallelized, i.e., different learner configs are evaluated in parallel.
  • benchmark: minor fix, arg ‘learners’ now also accepts class strings
  • object printers: some mlr printers show head previews of data.frames. These now also print the total number of rows and columns and are less confusing
  • aggregations: have better properties now, they know whether they require training or test set evals
  • the filter methods have better R docs
  • filter new arg “method”
  • filter mrmr: fixed some smaller bugs and updated properties
  • generateLearningCurveData: also accepts single learner, does not require a list
  • plotThreshVsPerf: added “measures” arg
  • plotPartialDependence: can create tile plots with joint partial dependence on two features for multiclass classification by facetting across the classes
  • generatePartialDependenceData and generateFunctionalANOVAData: expanded “fun” argument to allow for calculation of weights
  • new “?mlrFamilies” manual page which lists all families and the functions belonging to each
  • we are converging on data.table as a standard internally; this should not change any API behavior on the outside, though
  • generateHyperParsEffectData and plotHyperParsEffect now support more than 2 hyperparameters
  • linear.correlation, rank.correlation, anova.test: use Rfast instead of FSelector/custom implementation now, performance should be much better
  • use of our own colAUC function instead of the ROCR package for AUC calculation to improve performance
  • we output resample performance messages for every iteration now
  • performance improvements for the auc measure
  • createDummyFeatures supports vectors now
  • removed the pretty.names argument from plotHyperParsEffect: labels can be set through normal ggplot2 functions on the returned object
  • Fixed a bad bug in resample affecting the slot “runtime” of a ResampleResult when the runtime was not measured in seconds. R may then report the duration in, e.g., minutes, but mlr claimed the value was in seconds.
  • New “dummy” learners (that disregard features completely) can be fitted now for baseline comparisons, see “featureless” learners below.
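
Since pretty.names is gone from plotHyperParsEffect, axis labels are now set on the returned ggplot2 object. The following is a minimal sketch of that workflow, not an official recipe: the choice of classif.rpart, the tuned parameter cp with its bounds, the small random search budget, and the label texts are all assumptions made for illustration.

```r
library(mlr)
library(ggplot2)

set.seed(1)

# Illustrative setup (assumption): tune rpart's complexity parameter on iris.
lrn = makeLearner("classif.rpart")
ps = makeParamSet(makeNumericParam("cp", lower = 0.001, upper = 0.1))
ctrl = makeTuneControlRandom(maxit = 5L)
rdesc = makeResampleDesc("CV", iters = 2L)

res = tuneParams(lrn, task = iris.task, resampling = rdesc, measures = mmce,
  par.set = ps, control = ctrl, show.info = FALSE)

# Collect the tuning trace and plot performance against the hyperparameter.
data = generateHyperParsEffectData(res)
plt = plotHyperParsEffect(data, x = "cp", y = "mmce.test.mean",
  plot.type = "line")

# Labels are added with ordinary ggplot2 functions on the returned object.
plt = plt + labs(x = "Complexity parameter (cp)",
  y = "Mean misclassification error")
print(plt)
```

Any other ggplot2 layer or theme can be added to the returned object in the same way.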

functions - new

  • filter: randomForest.importance
  • generateFeatureImportanceData: permutation-based feature importance and local importance
  • getFeatureImportanceLearner: new Learner API function
  • getFeatureImportance: top level function to extract feature importance information
  • calculateROCMeasures
  • calculateConfusionMatrix: new confusion-matrix-like function that calculates and tabulates many receiver operator measures
  • makeLearners: create multiple learners at once
  • getLearnerId, getLearnerType, getLearnerPredictType, getLearnerPackages
  • getLearnerParamSet, getLearnerParVals
  • getRRPredictionList
  • addRRMeasure
  • plotResiduals
  • getLearnerShortName
  • mergeBenchmarkResults
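
Several of the new functions listed above can be tried together. The sketch below is only an illustration: the two learners (rpart and lda, which need the rpart and MASS packages) and the use of the bundled sonar.task are assumptions chosen because they ship with a standard R and mlr installation.

```r
library(mlr)

# Create several learners at once; "type" supplies the common class prefix.
lrns = makeLearners(c("rpart", "lda"), type = "classif")

# The new getter functions expose basic learner information.
for (lrn in lrns) {
  cat(getLearnerId(lrn), getLearnerType(lrn),
    getLearnerPredictType(lrn), getLearnerShortName(lrn), "\n")
}

# calculateROCMeasures expects predictions from a binary classification task.
mod = train(lrns[[1]], sonar.task)
pred = predict(mod, task = sonar.task)
roc = calculateROCMeasures(pred)
print(roc)
```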

functions - renamed

  • Renamed rf.importance filter (now deprecated) to randomForestSRC.var.rfsrc
  • Renamed rf.min.depth filter (now deprecated) to
  • Renamed getConfMatrix (now deprecated) to calculateConfusionMatrix
  • Renamed setId (now deprecated) to setLearnerId

functions - removed

  • mergeBenchmarkResultLearner, mergeBenchmarkResultTask

learners - general

  • classif.ada: fixed some param problem with rpart.control params
  • classif.cforest, regr.cforest, surv.cforest: removed parameters “minprob”, “pvalue”, “randomsplits” as these are set internally and cannot be changed by the user
  • regr.GPfit: some more params for correlation kernel
  • classif.xgboost, regr.xgboost: can now properly handle NAs (property was missing and other problems), added “colsample_bylevel” parameter
  • adapted {classif,regr,surv}.ranger parameters for new ranger version

learners - new

  • multilabel.cforest
  • surv.gbm
  • regr.cvglmnet
  • {classif,regr,surv}.gamboost
  • {classif,regr}.evtree

learners - removed

  • classif.randomForestSRCSyn, regr.randomForestSRCSyn: due to continued stability issues

measures - new

  • ssr, qsr, lsr
  • rrse, rae, mape
  • kappa, wkappa
  • msle, rmsle
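
To give a feel for some of the new regression measures, here is a base-R sketch of the usual textbook definitions of msle, rmsle, mape, rae and rrse. It assumes the mlr measures follow these definitions and is not a copy of mlr's implementation; the function names with the _sketch suffix and the toy vectors are hypothetical.

```r
# Mean squared logarithmic error and its root (inputs must be > -1).
msle_sketch = function(truth, response) {
  mean((log1p(response) - log1p(truth))^2)
}
rmsle_sketch = function(truth, response) {
  sqrt(msle_sketch(truth, response))
}

# Mean absolute percentage error (undefined if any truth value is 0).
mape_sketch = function(truth, response) {
  mean(abs((truth - response) / truth))
}

# Relative absolute error: absolute error relative to predicting the mean.
rae_sketch = function(truth, response) {
  sum(abs(truth - response)) / sum(abs(truth - mean(truth)))
}

# Root relative squared error: squared error relative to predicting the mean.
rrse_sketch = function(truth, response) {
  sqrt(sum((truth - response)^2) / sum((truth - mean(truth))^2))
}

# Toy data, made up for illustration.
truth = c(1, 2, 4)
response = c(1, 3, 3)

res = c(
  msle = msle_sketch(truth, response),
  rmsle = rmsle_sketch(truth, response),
  mape = mape_sketch(truth, response),
  rae = rae_sketch(truth, response),
  rrse = rrse_sketch(truth, response)
)
print(round(res, 4))
```

In mlr itself the measures are passed as objects, for example to the measures argument of resample or benchmark, rather than computed by hand.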