mlr + drake: Reproducible machine-learning workflow management

You may have heard about the drake package. It got a lot of attention recently in the R community because it simplifies reproducible workflow management. This comes in especially handy for large projects with hundreds of intermediate steps. Built-in high-performance computing (HPC) support and graph visualization are just two goodies that come on top of the basic functionality.

drake is able to track changes in your intermediate targets: once you change something early in your pipeline, drake automatically rebuilds all downstream objects affected by that change. The following tweet captures the struggle of keeping track of dependencies in a research project in a single picture:
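To make the dependency tracking concrete, here is a minimal sketch of a drake plan. The file name and the helper functions (`transform_data()`, `fit_model()`) are hypothetical placeholders, and the exact API may vary between drake versions:

```r
library(drake)

# A toy three-step pipeline; each target depends on the previous one.
plan <- drake_plan(
  raw  = read.csv(file_in("data.csv")),  # hypothetical input file
  tidy = transform_data(raw),            # hypothetical helper function
  fit  = fit_model(tidy)                 # hypothetical helper function
)

make(plan)  # builds raw -> tidy -> fit

# If you later edit transform_data(), drake detects that `tidy` and
# `fit` are out of date and rebuilds only those targets:
outdated(plan)  # report stale targets (argument form depends on drake version)
make(plan)      # rebuild only what changed
```

The key point is that you never decide manually which objects to re-run; the declared dependencies do that for you.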

The maintainer of drake, Will Landau (@wlandau), is extremely responsive and has also written one of the most extensive and detailed manuals in the R package jungle.

If you have installed drake, you can start right away with one of the built-in examples.

drake::drake_example("mlr-slurm")

At the time of writing, there are 17(!) examples to choose from. One of the newest shows how to use mlr with drake on an HPC.
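If you want to browse the examples before downloading one, drake ships a helper that lists them all:

```r
library(drake)

drake_examples()            # character vector of all available example names
drake_example("mlr-slurm")  # downloads this example into the working directory
```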

Machine-learning projects pair especially well with the drake idea, since you can easily create large comparison matrices of algorithms and hyperparameter settings. At the same time, drake can send these settings to an HPC in parallel for you, simplifying your modeling tasks a lot.
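Such a comparison matrix can be sketched with drake's static branching. The following is a hedged illustration, not the code from the bundled example: the learner names and the `map()` transformation follow mlr's and drake's documented interfaces, but you would adapt them to your own task:

```r
library(drake)
library(mlr)

plan <- drake_plan(
  # The task is built once and shared by all models.
  task = makeClassifTask(data = iris, target = "Species"),

  # One learner target per algorithm name (assumed installed).
  learner = target(
    makeLearner(method),
    transform = map(method = c("classif.rpart", "classif.lda"))
  ),

  # One fitted model per learner; drake wires up the dependencies.
  model = target(
    train(learner, task),
    transform = map(learner)
  )
)

make(plan)
# On a cluster you would instead pick a parallel backend, e.g.:
# make(plan, parallelism = "clustermq", jobs = 8)
```

Because each learner/model combination is its own target, changing one learner invalidates only the models that depend on it, and independent targets can run in parallel on the HPC.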