```
library(mlr3verse)
library(mlr3extralearners)
# install.packages("mlr3extralearners", repos = "https://mlr-org.r-universe.dev")
<- readRDS("bike.rds")
bike <- as_task_regr(bike, target = "bikers")
biketask <- partition(biketask)
splits
<- auto_tuner(
tuned_rpf learner = lrn("regr.rpf", ntrees = 200, max_interaction = 4, nthreads = 2),
tuner = tnr("mbo"),
resampling = rsmp("cv", folds = 3),
terminator = trm("evals", n_evals = 100, k = 10),
search_space = ps(
max_interaction = p_int(2, 10),
splits = p_int(10, 100),
split_try = p_int(1, 20),
t_try = p_dbl(0.1, 1)
),store_tuning_instance = TRUE,
store_benchmark_result = TRUE
)
$train(biketask, row_ids = splits$train) tuned_rpf
```

When we talk about machine learning interpretability methods, we tend to circle back to similar data examples. Partially because there’s a benefit to the familiarity and partially because, well, there’s just a limited number of real-world datasets floating around out there which are both publicly accessible and exhibit some kind of interesting structure that justifies the investigation of, say, third-order interaction effects with some sort of intuitive interpretation.

For IML, the `Bikeshare`

data is one of those popular datasets. We’re using it for a showcase article of the `glex`

R package, and this post is decidedly not about that — but feel free to read the paper (Hiabu, Meyer, and Wright 2023).

What this post is actually about is Random Planted Forests (Hiabu, Mammen, and Meyer 2023, R package on GitHub). I wanted the usage example on the `Bikeshare`

data to be interesting and useful, and since interpretability methods tend to only be as good as the models they’re trying to explain, I first needed a decent model.

So, that’s what this post is about: Tuning rpf, and then using rpf to explain the tuning results.

## The Data

We’re using the `Bikeshare`

data as included with the `ISLR2`

package (originally from the UCI Machine Learning Repository), which I preprocessed and whittled down a little for simplicity’s sake. You can look at the preprocessing steps in the code below, but I’ll skip the dataset exploration as it’s not the focus of this post.

## Show preprocessing code

```
library(data.table)
if (!("ISLR2" %in% installed.packages())) {
install.packages("ISLR2")
}
data("Bikeshare", package = "ISLR2")
<- data.table(Bikeshare)
bike := as.numeric(as.character(hr))]
bike[, hr := factor(workingday, levels = c(0, 1), labels = c("No Workingday", "Workingday"))]
bike[, workingday := factor(season, levels = 1:4, labels = c("Winter", "Spring", "Summer", "Fall"))]
bike[, season := NULL]
bike[, atemp := NULL]
bike[, day := NULL]
bike[, registered := NULL]
bike[, casual
saveRDS(bike, "bike.rds")
```

## Tuning with `mlr3`

I wrapped the rpf learner into an `mlr3`

learner for `mlr3extralearners`

, which makes it very convenient to tune. If you’re unfamiliar with the `mlr3`

ecosystem, well guess who contributed to the now-published mlr3 book available for free online, which you can also buy from your friendly neighborhood dystopian online retailer in fancy tree corpse form.

No judgement.

Mostly.

It’s fine.

Anyway, here’s the code I used, which is fairly standard “wrap learner in `AutoTuner`

and tune the thing with 3-fold cross-validation and a somewhat arbitrary tuning budget using MBO I guess because why not” (WLATT3FCVSATBUMBOIGBWN, as my grampa used to call it).

Ideally, we would evaluate the tuned rpf on the test dataset like this:

```
<- tuned_rpf$predict(biketask, row_ids = splits$test)
pred $score(msr("regr.rmsle")) pred
```

…but since I saved the learner with `saveRDS()`

on a different machine and restored it here for use with this post, we only get the error message

```
Error:
! external pointer is not valid
```

This is related to rpf using Rcpp modules under the hood, with the takeaway being that at the time of writing I don’t know if there’s a way to serialize and deserialize rpf models for situations like this. This is quite unfortunate, but for the time being we’ll just have to assume that the tuning result is somewhat reasonable. I should have just tuned on the full datasets, but alas, I guess this will have to do now.

Anyways, we extract the tuning archive stored in `$archive`

of the `AutoTuner`

object and convert the MSE we initially tuned with to the RMSE just to get a more manageable range of scores.

```
library(data.table)
<- tuned_rpf$archive$data
archive := sqrt(regr.mse)] archive[, regr.rmse
```

The kaggle challenge for this dataset (or a version of it, anyway) evaluates using the RMSLE, which would probably be more appropriate, come to think of it. Putting that one on the “oh well, next time” pile.

Let’s take a look at our scores in relation to our hyperparameter configurations first — one at a time, ignoring any interdependencies.

```
library(ggplot2)
melt(archive, id.vars = "regr.rmse",
measure.vars = c("splits", "split_try", "t_try", "max_interaction")) |>
ggplot(aes(x = value, y = regr.rmse)) +
facet_wrap(vars(variable), scales = "free_x") +
geom_point() +
labs(
title = "rpf / Bikeshare tuning archive",
subtitle = "Scores on inner resampling folds",
x = "Parameter Value", y = "RMSE"
)
```

The main thing to note here is that “more `splits`

more good”, while the picture for the other parameters isn’t as clear. Other parameters might interact, and overall it’s not obvious if a parameter is more important than another.

Would be nice if we could somehow functionally decompose the effects of these parameters up to arbitrary ord— oh wait that’s `glex`

, yes, let’s do the `glex`

thing.

## Explaining the Tuning Results

Both `glex`

and `randomPlantedForest`

can be installed via r-universe if you can’t be bothered to type `remotes::install_github("PlantedML/glex")`

and/or `remotes::install_github("PlantedML/randomPlantedForest")`

, which I usually can’t:

`install.packages(c("glex", "randomPlantedForest"), repos = "https://plantedml.r-universe.dev")`

```
library(randomPlantedForest)
library(glex)
```

We then fit another rpf with heuristically picked parameters on the tuning archive, using the RMSE as target and tuning parameters as features. Why not tune rpf “properly” here, you ask? Because I can’t decide whether I want to make this post a recursion joke or not. Also time is finite and I couldn’t be bothered.

```
<- rpf(
rpfit ~ splits + split_try + t_try + max_interaction,
regr.rmse data = archive,
ntrees = 100,
splits = 100,
split_try = 20,
t_try = 1,
max_interaction = 4,
nthreads = 3
)
<- glex(rpfit, tuned_rpf$archive$data) rpglex
```

### Variable Importance

Let’s take a first look at the variable importance scores, calculated as the mean absolute contribution to RMSE of each main- or interaction effect, respectively. The nice thing about this is that we can quantify the relevance of each tuning parameter while fully taking into account any interaction with other parameters, *and also* quantify the overall relevance of e.g. second-order interactions compared to main effects only.

```
<- glex_vi(rpglex)
rpvi
autoplot(rpvi)
```

`autoplot(rpvi, by_degree = TRUE)`

…which definitely could be interesting in some case that is apparently not this one! So yeah.

Turns out `splits`

has by far the largest effect, then we see `t_try`

and `max_interaction`

far behind, while `split_try`

actually turns out to be less influential than its interaction effects with `t_try`

and `max_interaction`

? Okay? Sure, why not. I guess it’s a good thing to see confirmation that, interactions of the 3rd or 4th degree are negligible, and the second-order interactions are not surprising. Also, it confirms that if you only pay attention to one parameter, it should be `splits`

— which is also not particularly surprising, as this parameter controls how long the algorithm runs, meaning that larger values will inevitably lead to better performance than small ones.

### Main Effects

Next, let’s see the parameters’ main effects, meaning the difference from the average predicted value (intercept) across the observed parameter values.

Note the varying y-axis scales — they’re kind of important here.

```
library(patchwork)
<- autoplot(rpglex, "splits")
p1 <- autoplot(rpglex, "t_try")
p2 <- autoplot(rpglex, "max_interaction")
p3 <- autoplot(rpglex, "split_try")
p4
+ p2) / (p3 + p4) (p1
```

So, in short: `splits`

wants to be large, `t_try`

wants to be close to 1, `max_interaction`

most likely also wants to be large up to some point, and `split_try`

is pulling a ¯\_(ツ)_/¯ on us.

Fair enough.

### Interaction Effects

We can also take a look at the two largest interaction effects, `splits:split_try`

and `splits:t_try`

, but to be quite honest I’m not sure what to make of these plots except for how they illustrate in which direction MBO has taken the tuning process (large values for `splits`

and `t_try`

).

```
autoplot(rpglex, c("splits", "split_try")) +
autoplot(rpglex, c("splits", "t_try"))
```

Finally, here’s the final parameter configuration that “won”, meaning these are the parameters I’ll be using to fit an rpf to the `Bikeshare`

data in a new version of the `glex`

vignette:

`$tuning_result[, c("splits", "split_try", "t_try", "max_interaction")] tuned_rpf`

```
splits split_try t_try max_interaction
<int> <int> <num> <int>
1: 100 6 0.9422986 6
```

`c(rmse = sqrt(tuned_rpf$tuning_result$regr.mse))`

```
rmse
35.79528
```

…And thanks to `glex`

, I guess I have a better intuition for these parameters now? Is that how it works? Let’s say it does.

I’m not entirely sure how much I want to trust these results or want to make generalizations based off of them (I don’t), but the underlying principle seems quite useful to me. rpf is still a fairly young method and gaining intuition for its parameters like this seems neat.

## Conclusion

The key takeaway for the `Bikeshare`

tuning is that:

`splits`

wants to be large.`t_try`

wants to be close to 1.- Setting
`max_interaction`

to 5 or greater is only going to make you wait for the result longer. `split_try`

is also a parameter that exists. Idunno maybe just wing it with that boi and be done with it.

Turns out it wasn’t particularly eye-opening to take a look at parameter interactions, but oh well. Better to have decomposed and not needed it than to never decompose at all. Or something.

## References

*Proceedings of the 26th International Conference on Artificial Intelligence and Statistics*, edited by Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, 206:7040–60. Proceedings of Machine Learning Research. PMLR. https://proceedings.mlr.press/v206/hiabu23a.html.