8 MBL Ensemble Approach

To make more robust predictions with measures of uncertainty, we use an ensemble approach with MBL. We run 20 MBL models with different modeling parameters, so that each sample has a distribution of predictions. We can then calculate the median prediction, as well as upper and lower confidence bounds.
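As a minimal sketch of that summary step, the R snippet below collapses a samples-by-models matrix of predictions into a lower bound, median, and upper bound per sample. The prediction values are synthetic, the function name `summarise_ensemble` is hypothetical, and using the 2.5% and 97.5% quantiles as the bounds is an assumption; the scripts in this guide may define the bounds differently.

```r
# Synthetic stand-in for ensemble output: 6 samples (rows) x 20 models (columns).
# These values are made up purely for illustration.
set.seed(1)
n_samples <- 6
n_models <- 20
preds <- matrix(rnorm(n_samples * n_models, mean = 2, sd = 0.3),
                nrow = n_samples, ncol = n_models)

# Summarise each sample's distribution of predictions across models.
# ASSUMPTION: bounds are the 2.5% and 97.5% quantiles of the 20 predictions.
summarise_ensemble <- function(pred_matrix, probs = c(0.025, 0.975)) {
  data.frame(
    lower  = apply(pred_matrix, 1, quantile, probs = probs[1],
                   na.rm = TRUE, names = FALSE),
    median = apply(pred_matrix, 1, median, na.rm = TRUE),
    upper  = apply(pred_matrix, 1, quantile, probs = probs[2],
                   na.rm = TRUE, names = FALSE)
  )
}

ensemble_summary <- summarise_ensemble(preds)
print(ensemble_summary)
```

Each row of `ensemble_summary` corresponds to one sample, so the width of the interval between `lower` and `upper` gives a rough sense of how much the 20 models disagree for that sample.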

Model Iterations

Model runs vary on 3 variables:

  1. How similarity is determined between samples (5 options)
  • euclid
  • cosine
  • cor
  • pc
  • pls
  2. Whether the similarity matrix is used as a predictor variable or not (2 options)
  • predictors
  • none
  3. What modeling method is used for making local predictions (2 options)
  • pls
  • wapls

5 x 2 x 2 = 20 model combinations

Table 8.1: Model Combinations for Ensemble Approach
Similarity_Metric  Usage_Similarity_Matrix  Local_Model
euclid             none                     wapls1
cosine             none                     wapls1
cor                none                     wapls1
pc                 none                     wapls1
pls                none                     wapls1
euclid             predictors               wapls1
cosine             predictors               wapls1
cor                predictors               wapls1
pc                 predictors               wapls1
pls                predictors               wapls1
euclid             none                     pls
cosine             none                     pls
cor                none                     pls
pc                 none                     pls
pls                none                     pls
euclid             predictors               pls
cosine             predictors               pls
cor                predictors               pls
pc                 predictors               pls
pls                predictors               pls
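The 20 combinations in Table 8.1 can be generated rather than typed out. The sketch below uses base R's `expand.grid()`; the object name `model_grid` is hypothetical, and the column names and values are copied from the table. Because `expand.grid()` varies its first argument fastest, the rows come out in the same order as the table.

```r
# Build every combination of the three modeling choices (5 x 2 x 2 = 20).
# Values are taken from Table 8.1; `model_grid` is an illustrative name.
model_grid <- expand.grid(
  Similarity_Metric       = c("euclid", "cosine", "cor", "pc", "pls"),
  Usage_Similarity_Matrix = c("none", "predictors"),
  Local_Model             = c("wapls1", "pls"),
  stringsAsFactors = FALSE
)

# One row per model run.
print(nrow(model_grid))
print(model_grid)
```

A script that fits the ensemble could then loop over the rows of such a grid and pass each row's settings to the model-fitting call.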

Getting started

The best way to get started with this code is to download the Soil-Predictions-Ensemble-Example folder, found here:
Soil-Predictions-Ensemble-Example Folder

This folder, along with all source code for this guide, can be found in the following GitHub repository:
whrc/Soil-Predictions-MIR

File Walkthrough

setname_prep.R

Performs the calibration transfer on the spectra and saves the result as an RData file in the ‘spc’ folder. Change the input csv file, the columns being selected as spectra (lines 11-12), and the output name/location.

setname_oc.R

Submit as a job through cloudops. Creates all the MBL models with the different parameter combinations and writes them to the output/oc folder. Change the input validation and calibration sets (lines 32-38), the property (oc) throughout the file, and the output location (line 107), and create an output folder for the soil property.

setname-fratio.R

Calculates the fratio for all samples in the calibration and validation sets and outputs a list of outlier indices from the combined dataset. Ex: calibration set indices are 1-15000 and validation set indices are 15001-15240. Change the input calibration and validation spectra (lines 5-6 and throughout), the number of directories in line 8 as needed, and the output location (currently the ‘fratio’ subfolder).
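Because the outlier indices refer to the combined dataset, they need to be mapped back to the individual sets before use. The sketch below shows one way to do that, using the set sizes from the example above (15000 calibration samples, 240 validation samples). The outlier indices themselves are made up, and the variable names are hypothetical rather than taken from setname-fratio.R.

```r
# Set sizes from the example: calibration rows 1-15000, validation rows 15001-15240.
n_cal <- 15000
n_val <- 240

# HYPOTHETICAL outlier indices in the combined (calibration + validation) dataset.
outlier_idx <- c(12, 4870, 15003, 15240)

# Guard against indices that fall outside the combined dataset.
stopifnot(all(outlier_idx >= 1), all(outlier_idx <= n_cal + n_val))

# Indices up to n_cal belong to the calibration set as-is.
cal_outliers <- outlier_idx[outlier_idx <= n_cal]

# Indices above n_cal belong to the validation set; shift them to start at 1.
val_outliers <- outlier_idx[outlier_idx > n_cal] - n_cal

print(cal_outliers)
print(val_outliers)
```

With these made-up indices, two outliers fall in the calibration set and two in the validation set, the latter re-expressed as row positions within the validation set.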

setname-extract.R

Creates comprehensive files containing all predictions from each MBL model, by property (e.g. pred.oc.csv, pred.bd.csv). Also creates a file containing the lower, mean, and upper prediction estimates for each property across all models (all-predictions.csv).

Note: The calibration set spc.oc.RData and the transfer matrix pls.moving.w2k.RData, both called in the code, were too large to be hosted in this repository.