Scope
Working with spatial data in R requires a lot of data wrangling e.g. reading from different file formats, converting between spatial formats, creating tables from point layers, and predicting spatial raster images. The goal of mlr3spatial is to simplify these workflows within the mlr3 ecosystem. As a practical example, we will perform a land cover classification for the city of Leipzig, Germany. Figure 1 illustrates the typical workflow for this type of task: Load the training data, create a spatial task, train a learner with it, and predict the final raster image.
We assume that you are familiar with the mlr3 ecosystem and know the basic concepts of remote sensing. If not, we recommend reading the mlr3book first. If you are interested in spatial resampling, check out the book chapter on spatial analysis.
Land Cover Classification
Land cover is the physical material or vegetation that covers the surface of the earth, including both natural and human-made features. Understanding land cover patterns and changes over time is critical for addressing global environmental challenges, such as climate change, land degradation, and loss of biodiversity. Land cover classification is the process of assigning land cover classes to pixels in a raster image. With mlr3spatial, we can easily perform a land cover classification within the mlr3 ecosystem.
Before we can start the land cover classification, we need to load the necessary packages. The mlr3spatial package relies on terra for processing raster data and sf for vector data. These widely used packages read all common raster and vector formats. Additionally, the stars and raster packages are supported.
library(mlr3verse)
library(mlr3spatial)
library(terra, exclude = "resample")
library(sf)
We will work with a Sentinel-2 scene of the city of Leipzig which consists of 7 bands with a 10 or 20m spatial resolution and an NDVI band. The data is included in the mlr3spatial package. We use the terra::rast()
to load the TIFF raster file.
= rast(system.file("extdata", "leipzig_raster.tif", package = "mlr3spatial"))
leipzig_raster leipzig_raster
class : SpatRaster
dimensions : 206, 154, 8 (nrow, ncol, nlyr)
resolution : 10, 10 (x, y)
extent : 731810, 733350, 5692030, 5694090 (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 32N (EPSG:32632)
source : leipzig_raster.tif
names : b02, b03, b04, b06, b07, b08, ...
min values : 846, 645, 366, 375, 401, 374, ...
max values : 4705, 4880, 5451, 4330, 5162, 5749, ...
The training data is a GeoPackage point layer with land cover labels and spectral features. We load the file and create a simple feature point layer
.
= read_sf(system.file("extdata", "leipzig_points.gpkg", package = "mlr3spatial"), stringsAsFactors = TRUE)
leipzig_vector leipzig_vector
Simple feature collection with 97 features and 9 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 731930.5 ymin: 5692136 xmax: 733220.3 ymax: 5693968
Projected CRS: WGS 84 / UTM zone 32N
# A tibble: 97 × 10
b02 b03 b04 b06 b07 b08 b11 ndvi land_cover geom
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <POINT [m]>
1 903 772 426 2998 4240 4029 1816 0.809 forest (732480.1 5693957)
2 1270 1256 1081 1998 2493 2957 2073 0.465 urban (732217.4 5692769)
3 1033 996 777 2117 2748 2799 1595 0.565 urban (732737.2 5692469)
4 962 773 500 465 505 396 153 -0.116 water (733169.3 5692777)
5 1576 1527 1626 1715 1745 1768 1980 0.0418 urban (732202.2 5692644)
6 1125 1185 920 3058 3818 3758 2682 0.607 pasture (732153 5693059)
7 880 746 424 2502 3500 3397 1469 0.778 forest (731937.9 5693722)
8 1332 1251 1385 1663 1799 1640 1910 0.0843 urban (732416.2 5692324)
9 940 741 475 452 515 400 139 -0.0857 water (732933.7 5693344)
10 902 802 454 2764 3821 3666 1567 0.780 forest (732411.3 5693352)
# ℹ 87 more rows
We plot both layers to get an overview of the data. The training points are located in the districts of Lindenau and Zentrum West.
Code
library(ggplot2)
library(tidyterra, exclude = "filter")
ggplot() +
geom_spatraster_rgb(data = leipzig_raster, r = 3, g = 2, b = 1, max_col_value = 5451) +
geom_spatvector(data = leipzig_vector, aes(color = land_cover)) +
scale_color_viridis_d(name = "Land cover", labels = c("Forest", "Pastures", "Urban", "Water")) +
theme_minimal()
The as_task_classif_st()
function directly creates a spatial task from the point layer. This makes it unnecessary to transform the point layer to a data.frame
with coordinates. Spatial tasks additionally store the coordinates of the training points. The coordinates are useful when estimating the performance of the model with spatial resampling.
= as_task_classif_st(leipzig_vector, target = "land_cover")
task task
<TaskClassifST:leipzig_vector> (97 x 9)
* Target: land_cover
* Properties: multiclass
* Features (8):
- dbl (8): b02, b03, b04, b06, b07, b08, b11, ndvi
* Coordinates:
X Y
1: 732480.1 5693957
2: 732217.4 5692769
3: 732737.2 5692469
4: 733169.3 5692777
5: 732202.2 5692644
---
93: 733018.7 5692342
94: 732551.4 5692887
95: 732520.4 5692589
96: 732542.2 5692204
97: 732437.8 5692300
Now we can train a model with the task. We use a simple decision tree learner from the rpart package. The "classif_st"
task is a specialization of the "classif"
task and therefore works with all "classif"
learners.
= lrn("classif.rpart")
learner $train(task) learner
To get a complete land cover classification of Leipzig, we have to predict on each pixel and return a raster image with these predictions. The $predict()
method of the learner only works for tabular data. To predict a raster image, we use the predict_spatial()
function.
# predict land cover map
= predict_spatial(leipzig_raster, learner) land_cover
Code
ggplot() +
geom_spatraster(data = land_cover) +
scale_fill_viridis_d(name = "Land cover", labels = c("Forest", "Pastures", "Urban", "Water")) +
theme_minimal()
Conclusion
Working with spatial data in R is very easy with the mlr3spatial package. You can quickly train a model with a point layer and predict a raster image. The mlr3spatial package is still in development and we are looking forward to your feedback and contributions.