7 Model Performance

The following section describes the function getModResults(), which coordinates the following steps after your pls or mbl model has been created:

  1. Getting Predictions

  2. Calculating Uncertainty

  3. Generating Summary Statistics

  4. Displaying Plots

This allows us to assess the performance of the model: how close are the predictions to the observed lab data? If you do not have lab data to compare your predictions against, simply use the functions getPredPLS() and getPredMBL() directly.
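Conceptually, the four steps fit together roughly as in the sketch below. This is only an outline of the workflow, not the actual implementation of getModResults(); the helper calls and their arguments are assumptions based on the descriptions in this section.

    # Conceptual outline of the four steps coordinated by getModResults().
    # This is a sketch, not the actual implementation: the helper calls and
    # their arguments are assumptions based on the descriptions above.
    getModResultsSketch <- function(PROP, MODTYPE, model, predset) {
      # 1. Getting predictions from the fitted model
      preds <- if (MODTYPE == "PLS") getPredPLS(model, predset) else getPredMBL(model, predset)

      # 2. Calculating uncertainty (the u-deviation) for each prediction
      udev <- calcUDev(model, predset)

      # 3. Generating summary statistics against the observed lab data
      predobs <- data.frame(pred = preds, obs = predset[[PROP]])
      stats <- getModStats(predobs)

      # 4. Displaying a plot of predictions against observations
      plot(predobs$obs, predobs$pred,
           xlab = paste("Observed", PROP), ylab = paste("Predicted", PROP))
      abline(0, 1, lty = 2)

      list(predictions = preds, uncertainty = udev, stats = stats)
    }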

Get Results

The following function returns the results for either pls or mbl models. It calls upon the getPredPLS() or getPredMBL() function, then generates summary statistics and a plot of the predictions against the observations.

getModResults()

  • getModResults()
    • PROP: string- The column name of the soil property of interest. Ex: “OC”
    • MODTYPE: string- “MBL” or “PLS”
    • MODNAME: string- The name of the model variable, if it is already loaded into the R environment. Use MODNAME or MODPATH
    • MODPATH: string- The path to the RData file containing your model, if the model is not already loaded. Use MODNAME or MODPATH
    • PREDNAME: string- The name of the prediction set variable, if it is already loaded into the R environment, that predictions will be generated for. Use PREDNAME or PREDPATH
    • PREDPATH: string- The path to the RData file containing your prediction set, if the prediction set is not already loaded. Use PREDNAME or PREDPATH
    • SAVEPRED: boolean- Whether or not to save the predictions. If TRUE, predictions will be saved to the folder ‘Predictions’ using the function savePredictions(). Default is set to TRUE
    • MODPERF: boolean- Whether or not to generate and show the prediction performance statistics. If TRUE, these statistics will be generated by the getModStats() function, and saved in the folder ‘Predictions’ in the performance log.
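A call might look like the example below. The file paths and property name are placeholders for your own files; only the parameters documented above are used.

    # Example call with placeholder paths and property name.
    getModResults(
      PROP     = "OC",
      MODTYPE  = "PLS",
      MODPATH  = "Models/OC_pls_model.RData",       # or MODNAME if the model is already loaded
      PREDPATH = "PredictionSets/pred_set.RData",   # or PREDNAME if the prediction set is already loaded
      SAVEPRED = TRUE,                              # save predictions to the 'Predictions' folder
      MODPERF  = TRUE                               # generate and log performance statistics
    )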

Predictions

Predictions are extracted using either getPredPLS() or getPredMBL(). These are called within getModResults() before the predictions are saved. The following functions save predictions to a file unique to each prediction set: if this file already exists, another column of predictions is simply appended; if it does not, the file is created from the original prediction set.
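The append-or-create behaviour can be pictured with a minimal sketch like the one below. It only illustrates the logic described above; the file path and column naming are assumptions, not the actual saving code.

    # Minimal sketch of the append-or-create logic, assuming a single table
    # is stored per prediction set. Path and column name are illustrative.
    savePredictionsSketch <- function(preds, predset, prop, modtype,
                                      file = "Predictions/predictions.RData") {
      colname <- paste(prop, modtype, sep = "_")
      if (file.exists(file)) {
        # File already exists: load it and add another column of predictions
        predTable <- get(load(file)[1])
      } else {
        # No file yet: create the table from the original prediction set
        dir.create(dirname(file), showWarnings = FALSE, recursive = TRUE)
        predTable <- predset
      }
      predTable[[colname]] <- preds
      save(predTable, file = file)
      invisible(predTable)
    }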

Statistics

After making predictions using either of the modeling methods, various summary statistics can help test the accuracy of those predictions. The u-deviation, as a measure of uncertainty, can help assess how much each prediction can be trusted.

getModStats()

The getModStats function returns the following statistics in a dataframe:

  • R2
  • R2 adjusted
  • Slope
  • Y-Intercept
  • RMSE
  • Bias
  • Standard deviation (of predictions)
  • Residual prediction deviation

The minimum input is the PREDOBS table, with a column ‘pred’ containing the predictions and a column ‘obs’ containing the corresponding lab data. The remaining parameters are characteristics of the model and prediction run that are important to include if you are saving the statistics.
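Assuming PREDOBS is a data frame with numeric ‘pred’ and ‘obs’ columns, these statistics can be computed with the standard formulas sketched below; this is an illustration, not necessarily the exact getModStats() code.

    # Standard formulas for the statistics listed above, given a data frame
    # with 'pred' and 'obs' columns. Illustrative sketch only.
    getModStatsSketch <- function(predobs) {
      fit  <- lm(pred ~ obs, data = predobs)    # predictions regressed on observations
      res  <- predobs$pred - predobs$obs
      rmse <- sqrt(mean(res^2))
      data.frame(
        R2        = summary(fit)$r.squared,
        R2adj     = summary(fit)$adj.r.squared,
        Slope     = unname(coef(fit)["obs"]),
        Intercept = unname(coef(fit)["(Intercept)"]),
        RMSE      = rmse,
        Bias      = mean(res),                  # mean of (predicted - observed)
        SDpred    = sd(predobs$pred),           # standard deviation of the predictions
        RPD       = sd(predobs$obs) / rmse      # residual prediction deviation
      )
    }

For example, getModStatsSketch(data.frame(pred = c(2.1, 3.4, 1.8), obs = c(2.0, 3.6, 1.7))) returns a one-row data frame of these statistics.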

calcUDev()

We can calculate uncertainty for our predictions with the u-deviation, which takes into account both differences between the spectra of the prediction and reference sets and the prediction performance measured against observed values. The equation for the u-deviation is explained within The Unscrambler Method References; its components are described below, followed by a sketch of how they combine.

  • ResXValSamp: The residual variance for the prediction set spectra. See getResXValSamp(). When this is higher, the u-deviation is higher.

  • ResXValTot: The average residual variance for the reference set spectra. See getResXValTot(). When this is higher, the u-deviation is lower.

  • ResYValVar: The variance of the predictions from their observed values under cross-validation. When this is higher, the u-deviation is higher.

  • Hi: The leverage, which measures how far samples in the prediction set are from those in the reference set. See getLeverage(). When this is higher, the u-deviation is higher.

  • Ical: The number of samples in the calibration/reference set. When this is higher, the u-deviation is lower.
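A sketch of how these components combine is given below. The exact expression, including any degrees-of-freedom correction, is defined in The Unscrambler Method References; this sketch only reflects the direction in which each term moves the u-deviation, so treat it as an assumption rather than the definitive formula.

    # Illustrative combination of the components described above. The exact
    # expression is defined in The Unscrambler Method References; this sketch
    # only reflects the direction in which each term moves the u-deviation.
    calcUDevSketch <- function(ResXValSamp, ResXValTot, ResYValVar, Hi, Ical) {
      sqrt(ResYValVar * (ResXValSamp / ResXValTot + Hi + 1 / Ical))
    }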

The following functions orchestrate these calculations: