Exercise Collection for Practice and Learning
-
Greedy Ensemble Selection and Stacking
Implement greedy ensemble selection and stacking on the German credit data set.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Filter
Use filters in an mlr3 pipeline.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Nested Resampling
Estimate the generalization error of a k-NN model on the German credit data set via nested resampling.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Parallelization
Set up a large-scale benchmark experiment with parallelization.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Tuning
Optimize hyperparameters for k-NN and SVM classifiers on the German credit data set.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Impact of Encoding
Construct pipelines for benchmark experiments on the kc_housing data set.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Advanced Resampling with Custom Measure Solution
Use stratified resampling to evaluate the German credit data set and blocking for the BreastCancer data set. Define custom measures in mlr3 and use them to evaluate a model on the mtcars task.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Filters
Use pipelines for efficient pre-processing and model training on the kc_housing task.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Feature Selection
Select features from the German credit data set and evaluate model performance.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Benchmarking Solution
Hyperparameter tuning and benchmarking on the German credit task.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Encoding and Scaling
Create a pipeline for feature preprocessing (one-hot encoding, Yeo-Johnson transformation) on the German credit task.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Resampling Solution
Use 5-fold cross-validation to evaluate logistic regression and k-NN learners on the German credit data set.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Train Predict Evaluate Basics Solution
Introduction to the German credit data set and classification. Train, predict, and evaluate a logistic regression learner with a hold-out split.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Calibration with probably
Learn the basics of `tidymodels` for supervised learning, assess if a model is well-calibrated, and calibrate it with `probably`.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Model Averaging
Do ensembling and model averaging on the German credit data set.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Deep dive into Bayesian Optimization
Apply Bayesian optimization (BO) with `bbotk` and `mlr3mbo` to general black-box optimization problems and, more specifically, to hyperparameter optimization (HPO).
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Tree Methods Solution
Use, plot, and benchmark classification trees and random forests on the German credit data set.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Benchmarking Hypothesis
Benchmark models in multiple scenarios, using hypothesis tests as an additional diagnostic tool to make the benchmark more rigorous.
- Giuseppe Casalicchio, Essential Data Science Training GmbH
-
Xgboost
Optimize the hyperparameters of xgboost for the German credit task.
- Giuseppe Casalicchio, Essential Data Science Training GmbH