mlr loves OpenML

Connecting mlr with OpenML

Heidi Seibold https://github.com/HeidiSeibold
2016-09-09

OpenML stands for Open Machine Learning. It is an online platform that aims to support collaborative machine learning, and an Open Science project that allows its users to share data, code and machine learning experiments.

At the time of writing this blog post I am in Eindhoven at an OpenML workshop, where developers and scientists meet to work on improving the project. Some of these people are R users, and they (we) are developing an R package that communicates with the OpenML platform.

OpenML in R

The OpenML R package can list and download data sets and machine learning tasks (prediction challenges). In R, one can run algorithms on these data sets and tasks and then upload the results to OpenML. After a successful upload, the website shows how well the algorithm performs. To run an algorithm on a given task, the OpenML R package builds on the mlr package: mlr understands what a task is and can run learners on that task. So all the OpenML package needs to do is convert the OpenML objects to objects mlr understands, and then mlr deals with the learning.
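To make the division of labour concrete, here is a minimal sketch of the part that mlr handles once it has a task and a learner. It runs entirely offline and does not touch OpenML. The built-in iris data and the rpart learner are stand-ins chosen for illustration, not part of the study below.

```r
library("mlr")

set.seed(1)  # make the cross-validation splits reproducible

# A task bundles a data set with the name of its target column.
task = makeClassifTask(data = iris, target = "Species")

# A learner is an algorithm plus its settings (rpart is only a stand-in).
lrn = makeLearner("classif.rpart")

# Describe the resampling scheme: 10-fold cross-validation.
rdesc = makeResampleDesc("CV", iters = 10)

# mlr trains and predicts on every fold and aggregates the accuracy.
res = resample(lrn, task, rdesc, measures = acc, show.info = FALSE)
print(res$aggr)
```

With an OpenML task, the data, target and resampling splits come from the server instead of being defined by hand, but the learning step is the same kind of mlr call.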

A small case study

We want to create a little study on the OpenML website, in which we compare different types of Support Vector Machines. The study gets an ID assigned to it, which in our case is 27. We use the function ksvm (with different settings of the function argument type) from package kernlab, which is integrated in mlr (“classif.ksvm”).

For details on installing and setting up the OpenML R package please see the guide on GitHub.

Let’s start conducting the study:

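library("OpenML")
library("mlr")

## data set sizes (number of instances) used to restrict the task listing
dsize = c(100, 500)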
taskinfo_all = listOMLTasks(number.of.instances = dsize)
taskinfo_10cv = subset(taskinfo_all, task.type == "Supervised Classification" & 
                    estimation.procedure == "10-fold Crossvalidation" &
                    evaluation.measures == "predictive_accuracy" &
                    number.of.missing.values == 0 &
                    number.of.classes %in% c(2, 4))
taskinfo = taskinfo_10cv[1:3, ]
lrn.list = list(
  makeLearner("classif.ksvm", type = "C-svc"),
  makeLearner("classif.ksvm", type = "kbb-svc"),
  makeLearner("classif.ksvm", type = "spoc-svc")
)
grid = expand.grid(task.id = taskinfo$task.id, 
                   lrn.ind = seq_along(lrn.list))

## seq_len(nrow(grid)) is base R, so no extra helper package is needed
runs = lapply(seq_len(nrow(grid)), function(i) {
  message(i)
  task = getOMLTask(grid$task.id[i])
  ind = grid$lrn.ind[i]
  runTaskMlr(task, lrn.list[[ind]])
})
## Please do not spam the OpenML server by uploading these
## runs again. I already did that.
run.id = lapply(runs, uploadOMLRun, tags = "study_27")
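The selection and pairing logic in the code above can be tried out without contacting the server. The following sketch uses a small hand-made data frame in place of the output of listOMLTasks; all task IDs and values in it are invented, and lrn.names is a hypothetical stand-in for the list of learners.

```r
# Toy stand-in for the task listing; every ID and value here is invented.
taskinfo_all = data.frame(
  task.id = c(11L, 12L, 13L, 14L, 15L),
  estimation.procedure = c("10-fold Crossvalidation", "10-fold Crossvalidation",
                           "Holdout", "10-fold Crossvalidation",
                           "10-fold Crossvalidation"),
  number.of.missing.values = c(0L, 0L, 0L, 25L, 0L),
  number.of.classes = c(2L, 4L, 2L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Same style of filter as in the study:
# 10-fold CV, no missing values, 2 or 4 classes.
taskinfo = subset(taskinfo_all,
                  estimation.procedure == "10-fold Crossvalidation" &
                    number.of.missing.values == 0 &
                    number.of.classes %in% c(2, 4))

# Hypothetical names standing in for the three learners.
lrn.names = c("C-svc", "kbb-svc", "spoc-svc")

# One row per (task, learner) pair; each row corresponds to one run.
grid = expand.grid(task.id = taskinfo$task.id,
                   lrn.ind = seq_along(lrn.names))
print(grid)
```

Building the grid first keeps the loop body simple: each iteration only looks up one task ID and one learner index.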

To show the results of our study, list the run evaluations and make a nice plot.

evals = listOMLRunEvaluations(tag = "study_27")

evals$task.id = as.factor(evals$task.id)
evals$setup.id = as.factor(evals$setup.id)

library("ggplot2")
ggplot(evals, aes(x = setup.id, y = predictive.accuracy, 
                  color = data.name, group = task.id)) + 
  geom_point() + geom_line()
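Alongside the plot, a small table can be handy. The sketch below assumes an evaluation data frame with the same three columns used in the plot; the numbers are invented placeholders, not results from study 27.

```r
# Invented placeholder values, NOT results from the study.
evals = data.frame(
  task.id = rep(c(11L, 12L), each = 3),
  setup.id = rep(c(101L, 102L, 103L), times = 2),
  predictive.accuracy = c(0.90, 0.80, 0.70, 0.60, 0.50, 0.40)
)

# Rows are tasks, columns are setups; each cell holds one accuracy.
acc.table = tapply(evals$predictive.accuracy,
                   list(task = evals$task.id, setup = evals$setup.id),
                   mean)
print(acc.table)

# Setup with the highest accuracy on each task.
best = colnames(acc.table)[apply(acc.table, 1, which.max)]
names(best) = rownames(acc.table)
print(best)
```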

Now you can go ahead and create a bigger study using the techniques you have learned.

Further information

If you are interested in more, check out the OpenML blog, the paper and the GitHub repos.

Citation

For attribution, please cite this work as

Seibold (2016, Sept. 9). mlr-org: mlr loves OpenML. Retrieved from https://mlr-org.github.io/mlr-org-website/posts/2016-09-09-mlr-loves-openml/

BibTeX citation

@misc{seibold2016mlr,
  author = {Seibold, Heidi},
  title = {mlr-org: mlr loves OpenML},
  url = {https://mlr-org.github.io/mlr-org-website/posts/2016-09-09-mlr-loves-openml/},
  year = {2016}
}