8 MBL Ensemble Approach

To make more robust predictions with measures of uncertainty, we use an ensemble approach with MBL. We run 20 mbl models, each with different modeling parameters, so that each sample has a distribution of predictions. From that distribution we can calculate the median prediction, as well as upper and lower confidence bounds.
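As a minimal sketch of that summary step, assume the predictions are held in a matrix with one row per sample and one column per model. The simulated values and the 5th/95th percentile bounds are illustrative assumptions, not necessarily the interval the original scripts compute.

```r
# Simulated stand-in for ensemble output: 4 samples x 20 models (hypothetical values)
set.seed(1)
preds <- matrix(rnorm(4 * 20, mean = 2, sd = 0.3), nrow = 4, ncol = 20)

# Summarise each sample's distribution of predictions across the models.
# The default bounds (5th and 95th percentiles) are an assumption.
summarise_ensemble <- function(pred_matrix, probs = c(0.05, 0.95)) {
  data.frame(
    lower  = apply(pred_matrix, 1, quantile, probs = probs[1], na.rm = TRUE),
    median = apply(pred_matrix, 1, median, na.rm = TRUE),
    upper  = apply(pred_matrix, 1, quantile, probs = probs[2], na.rm = TRUE)
  )
}

ensemble_summary <- summarise_ensemble(preds)
print(ensemble_summary)
```

Each row of `ensemble_summary` corresponds to one sample; widening or narrowing `probs` changes how conservative the bounds are.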

Model Iterations

Model runs vary along three parameters:

How similarity between samples is determined (5 options):

euclid
cosine
cor
pc
pls

Whether the similarity matrix is used as a predictor variable (2 options):

predictors
none

Which modeling method is used for making local predictions (2 options):

pls
wapls1

5 x 2 x 2 = 20 model combinations

Table 8.1: Model Combinations for Ensemble Approach
Similarity_Metric	Usage_Similarity_Matrix	Local_Model
euclid	none	wapls1
cosine	none	wapls1
cor	none	wapls1
pc	none	wapls1
pls	none	wapls1
euclid	predictors	wapls1
cosine	predictors	wapls1
cor	predictors	wapls1
pc	predictors	wapls1
pls	predictors	wapls1
euclid	none	pls
cosine	none	pls
cor	none	pls
pc	none	pls
pls	none	pls
euclid	predictors	pls
cosine	predictors	pls
cor	predictors	pls
pc	predictors	pls
pls	predictors	pls
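The combinations in Table 8.1 can be generated programmatically. The sketch below uses base R's `expand.grid`; the vector names are illustrative, and the column names simply mirror the table headers.

```r
# Parameter values as listed in Table 8.1
similarity_metrics <- c("euclid", "cosine", "cor", "pc", "pls")
matrix_usage       <- c("none", "predictors")
local_models       <- c("wapls1", "pls")

# expand.grid varies the first argument fastest, which matches the table's row order
combos <- expand.grid(
  Similarity_Metric       = similarity_metrics,
  Usage_Similarity_Matrix = matrix_usage,
  Local_Model             = local_models,
  stringsAsFactors        = FALSE
)

nrow(combos)  # 20
print(combos)
```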

Getting started

The best way to get started with this code is to download the Soil-Predictions-Ensemble-Example folder, found here:
Soil-Predictions-Ensemble-Example Folder

This folder, along with all source code for this guide, can be found in the following GitHub repository:
whrc/Soil-Predictions-MIR

File Walkthrough

setname_prep.R

Performs the calibration transfer on the spectra and saves the result as an RData file in the ‘spc’ folder. To adapt it, change the input csv file, the columns selected as spectra (lines 11-12), and the output name and location.

setname_oc.R

Submitted as a job through cloudops. Creates all the mbl models with the different parameter combinations and writes them to the output/oc folder. To adapt it, change the input validation and calibration sets (lines 32-38), the property (oc) throughout the file, and the output location (line 107), and create an output folder for the soil property.
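The overall shape of such a script might look like the sketch below. `fit_local_model()` is a hypothetical placeholder for the actual mbl call, which is not reproduced here because its arguments depend on the installed version of the modeling package. The output directory (redirected to a temporary folder) and the file naming scheme are also assumptions.

```r
# Hypothetical stand-in for the real mbl fit; returns dummy predictions
fit_local_model <- function(similarity, usage, local_model, n_samples = 3) {
  rep(NA_real_, n_samples)
}

# The 20 parameter combinations (5 x 2 x 2)
combos <- expand.grid(
  similarity       = c("euclid", "cosine", "cor", "pc", "pls"),
  usage            = c("none", "predictors"),
  local_model      = c("wapls1", "pls"),
  stringsAsFactors = FALSE
)

# Assumed output location; tempdir() keeps the sketch from touching real folders
out_dir <- file.path(tempdir(), "output", "oc")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

for (i in seq_len(nrow(combos))) {
  p <- combos[i, ]
  preds <- fit_local_model(p$similarity, p$usage, p$local_model)
  # One file per parameter combination, named after its settings (assumed scheme)
  out_file <- file.path(
    out_dir,
    sprintf("oc.%s.%s.%s.RData", p$similarity, p$usage, p$local_model)
  )
  save(preds, file = out_file)
}

list.files(out_dir)
```

Saving one file per combination means a failed or interrupted job only requires rerunning the missing combinations.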

setname-fratio.R

Calculates the fratio for all samples in the calibration and validation sets and outputs a list of outlier indices from the combined dataset. For example, if the calibration set indices are 1-15000, the validation set indices are 15001-15240. To adapt it, change the input calibration and validation spectra (lines 5-6 and throughout), the number of directories in line 8 as needed, and the output location (currently the ‘fratio’ subfolder).
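Because the outlier indices refer to the combined dataset, they need to be mapped back to each set before use. A minimal sketch, using the set sizes from the example above and made-up outlier indices (all variable names are illustrative):

```r
# Set sizes from the example: 15000 calibration and 240 validation samples
n_calib <- 15000
n_valid <- 240

# Hypothetical outlier indices into the combined (calibration + validation) dataset
outlier_idx <- c(12, 4031, 15001, 15007, 15240)

# Indices up to n_calib belong to the calibration set
calib_outliers <- outlier_idx[outlier_idx <= n_calib]

# Larger indices belong to the validation set; subtract n_calib
# to get row numbers within the validation set itself
valid_outliers <- outlier_idx[outlier_idx > n_calib] - n_calib

calib_outliers  # 12 4031
valid_outliers  # 1 7 240
```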

setname-extract.R

Creates comprehensive files containing all predictions from each mbl model, by property (e.g., pred.oc.csv, pred.bd.csv). Also creates a file containing the lower, mean, and upper prediction estimates for each property across all models (all-predictions.csv).
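A rough sketch of this extraction step for a single property is shown below. The predictions are simulated, and the model labels, column names, and 5th/95th percentile bounds are assumptions; only the output file names come from the description above. Files are written to a temporary directory.

```r
# Simulated per-model predictions for one property (oc):
# a named list with one numeric vector per model
set.seed(42)
model_ids <- paste0("model", 1:20)   # hypothetical model labels
n_samples <- 5
model_preds <- lapply(model_ids, function(id) rnorm(n_samples, mean = 1.5, sd = 0.2))
names(model_preds) <- model_ids

# Wide table: one row per sample, one column per model
pred_oc <- data.frame(sample_id = seq_len(n_samples), do.call(cbind, model_preds))

# Summary across models; the percentile bounds are an assumption
model_cols <- as.matrix(pred_oc[, model_ids])
all_predictions <- data.frame(
  sample_id = pred_oc$sample_id,
  oc.lower  = apply(model_cols, 1, quantile, probs = 0.05),
  oc.mean   = rowMeans(model_cols),
  oc.upper  = apply(model_cols, 1, quantile, probs = 0.95)
)

# Write both files to a temporary directory
out_dir <- tempdir()
write.csv(pred_oc, file.path(out_dir, "pred.oc.csv"), row.names = FALSE)
write.csv(all_predictions, file.path(out_dir, "all-predictions.csv"), row.names = FALSE)
```

For several properties, the same summary columns could be computed per property and joined on the sample identifier before writing all-predictions.csv.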

Note: The calibration set spc.oc.RData and the transfer matrix pls.moving.w2k.RData, both called in the code, were too large to be hosted in this repository.