Spatial Data in the mlr3 Ecosystem

Run a land cover classification of the city of Leipzig.

Author
Published

February 27, 2023

Scope

Working with spatial data in R requires a lot of data wrangling e.g. reading from different file formats, converting between spatial formats, creating tables from point layers, and predicting spatial raster images. The goal of mlr3spatial is to simplify these workflows within the mlr3 ecosystem. As a practical example, we will perform a land cover classification for the city of Leipzig, Germany. Figure 1 illustrates the typical workflow for this type of task: Load the training data, create a spatial task, train a learner with it, and predict the final raster image.

%%{ init: { 'flowchart': { 'curve': 'bump' } } }%%

flowchart LR
    subgraph files[Files]
    vector[Vector]
    raster[Raster]
    end
    subgraph load[Load Data]
    sf
    terra
    end
    vector --> sf
    raster --> terra
    subgraph train_model[Train Model]
    task[Task]
    learner[Learner]
    end
    terra --> prediction_raster
    task --> learner
    sf --> task
    subgraph predict[Spatial Prediction]
    prediction_raster[Raster Image]
    end
    learner --> prediction_raster
Figure 1: Spatial prediction workflow in mlr3spatial.

We assume that you are familiar with the mlr3 ecosystem and know the basic concepts of remote sensing. If not, we recommend reading the mlr3book first. If you are interested in spatial resampling, check out the book chapter on spatial analysis.

Land Cover Classification

Land cover is the physical material or vegetation that covers the surface of the earth, including both natural and human-made features. Understanding land cover patterns and changes over time is critical for addressing global environmental challenges, such as climate change, land degradation, and loss of biodiversity. Land cover classification is the process of assigning land cover classes to pixels in a raster image. With mlr3spatial, we can easily perform a land cover classification within the mlr3 ecosystem.

Before we can start the land cover classification, we need to load the necessary packages. The mlr3spatial package relies on terra for processing raster data and sf for vector data. These widely used packages read all common raster and vector formats. Additionally, the stars and raster packages are supported.

library(mlr3verse)
library(mlr3spatial)
library(terra, exclude = "resample")
library(sf)

We will work with a Sentinel-2 scene of the city of Leipzig which consists of 7 bands with a 10 or 20m spatial resolution and an NDVI band. The data is included in the mlr3spatial package. We use the terra::rast() to load the TIFF raster file.

leipzig_raster = rast(system.file("extdata", "leipzig_raster.tif", package = "mlr3spatial"))
leipzig_raster
class       : SpatRaster 
dimensions  : 206, 154, 8  (nrow, ncol, nlyr)
resolution  : 10, 10  (x, y)
extent      : 731810, 733350, 5692030, 5694090  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 32N (EPSG:32632) 
source      : leipzig_raster.tif 
names       :  b02,  b03,  b04,  b06,  b07,  b08, ... 
min values  :  846,  645,  366,  375,  401,  374, ... 
max values  : 4705, 4880, 5451, 4330, 5162, 5749, ... 

The training data is a GeoPackage point layer with land cover labels and spectral features. We load the file and create a simple feature point layer.

leipzig_vector = read_sf(system.file("extdata", "leipzig_points.gpkg", package = "mlr3spatial"), stringsAsFactors = TRUE)
leipzig_vector
Simple feature collection with 97 features and 9 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 731930.5 ymin: 5692136 xmax: 733220.3 ymax: 5693968
Projected CRS: WGS 84 / UTM zone 32N
# A tibble: 97 × 10
     b02   b03   b04   b06   b07   b08   b11    ndvi land_cover               geom
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <fct>             <POINT [m]>
 1   903   772   426  2998  4240  4029  1816  0.809  forest     (732480.1 5693957)
 2  1270  1256  1081  1998  2493  2957  2073  0.465  urban      (732217.4 5692769)
 3  1033   996   777  2117  2748  2799  1595  0.565  urban      (732737.2 5692469)
 4   962   773   500   465   505   396   153 -0.116  water      (733169.3 5692777)
 5  1576  1527  1626  1715  1745  1768  1980  0.0418 urban      (732202.2 5692644)
 6  1125  1185   920  3058  3818  3758  2682  0.607  pasture      (732153 5693059)
 7   880   746   424  2502  3500  3397  1469  0.778  forest     (731937.9 5693722)
 8  1332  1251  1385  1663  1799  1640  1910  0.0843 urban      (732416.2 5692324)
 9   940   741   475   452   515   400   139 -0.0857 water      (732933.7 5693344)
10   902   802   454  2764  3821  3666  1567  0.780  forest     (732411.3 5693352)
# ℹ 87 more rows

We plot both layers to get an overview of the data. The training points are located in the districts of Lindenau and Zentrum West.

Code
library(ggplot2)
library(tidyterra, exclude = "filter")

ggplot() +
  geom_spatraster_rgb(data = leipzig_raster, r = 3, g = 2, b = 1, max_col_value = 5451) +
  geom_spatvector(data = leipzig_vector, aes(color = land_cover)) +
  scale_color_viridis_d(name = "Land cover", labels = c("Forest", "Pastures", "Urban", "Water")) +
  theme_minimal()

The as_task_classif_st() function directly creates a spatial task from the point layer. This makes it unnecessary to transform the point layer to a data.frame with coordinates. Spatial tasks additionally store the coordinates of the training points. The coordinates are useful when estimating the performance of the model with spatial resampling.

task = as_task_classif_st(leipzig_vector, target = "land_cover")
task
<TaskClassifST:leipzig_vector> (97 x 9)
* Target: land_cover
* Properties: multiclass
* Features (8):
  - dbl (8): b02, b03, b04, b06, b07, b08, b11, ndvi
* Coordinates:
           X       Y
 1: 732480.1 5693957
 2: 732217.4 5692769
 3: 732737.2 5692469
 4: 733169.3 5692777
 5: 732202.2 5692644
---                 
93: 733018.7 5692342
94: 732551.4 5692887
95: 732520.4 5692589
96: 732542.2 5692204
97: 732437.8 5692300

Now we can train a model with the task. We use a simple decision tree learner from the rpart package. The "classif_st" task is a specialization of the "classif" task and therefore works with all "classif" learners.

learner = lrn("classif.rpart")
learner$train(task)

To get a complete land cover classification of Leipzig, we have to predict on each pixel and return a raster image with these predictions. The $predict() method of the learner only works for tabular data. To predict a raster image, we use the predict_spatial() function.

# predict land cover map
land_cover = predict_spatial(leipzig_raster, learner)
Code
ggplot() +
  geom_spatraster(data = land_cover) +
  scale_fill_viridis_d(name = "Land cover", labels = c("Forest", "Pastures", "Urban", "Water")) +
  theme_minimal()

Conclusion

Working with spatial data in R is very easy with the mlr3spatial package. You can quickly train a model with a point layer and predict a raster image. The mlr3spatial package is still in development and we are looking forward to your feedback and contributions.