3 Getting Started

The best way to get started with this code is to download the Soil-Predictions-Example folder found here:
Soil-Predictions-Example Folder

This folder, along with all source code for this guide, can be found in the following GitHub repository:
whrc/Soil-Predictions-MIR

File Walkthrough

  1. Within the Soil-Predictions-Example folder, you will find the following folders and files:

  2. Double-click Soil-Predictions-Example.Rproj to open the R project. Within a project, the working directory is set to the project’s folder.

  3. Open RUNFILE.R in the project environment. This example script shows how to make soil predictions from spectral data. It uses both PLSR models and MBL models, which are explained in this guide.

  4. Navigate to the Functions folder. It contains R files with functions useful for MIR soil predictions. These files are sourced by one another and by RUNFILE.R.

  5. Navigate to the Data_Raw folder. This should contain:
    • LAB_DATA.csv: A ‘.csv’ file of the lab data. At a minimum, it should have a sample_id column and the lab data for a given property
    • SPECTRA: A folder of OPUS files containing the spectral data for each sample
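Before running anything, it can help to confirm that Data_Raw matches the layout described above. The following is a minimal sketch in base R, not part of the repository: the helper name check_raw_data is hypothetical, and the only assumptions it makes are the file and column names listed in this section.

```r
# Hypothetical helper (not from the repository): checks the Data_Raw layout
# described in this section and returns a short summary.
check_raw_data <- function(lab_path, spectra_dir) {
  # Both the lab data file and the spectra folder must exist
  stopifnot(file.exists(lab_path), dir.exists(spectra_dir))

  lab <- read.csv(lab_path, stringsAsFactors = FALSE)

  # The guide requires at least a sample_id column
  if (!"sample_id" %in% names(lab)) {
    stop("LAB_DATA.csv needs a sample_id column")
  }

  list(
    n_samples = nrow(lab),                      # rows of lab data
    n_spectra = length(list.files(spectra_dir)), # files in the spectra folder
    columns   = names(lab)                      # available property columns
  )
}

# Paths are relative to the project folder (the working directory of the R project)
lab_path    <- file.path("Data_Raw", "LAB_DATA.csv")
spectra_dir <- file.path("Data_Raw", "SPECTRA")

if (file.exists(lab_path) && dir.exists(spectra_dir)) {
  str(check_raw_data(lab_path, spectra_dir))
}
```

If the number of spectra files and the number of lab samples differ widely, it is worth checking the sample IDs before building models.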

Required Packages

Open RUNFILE.R and install the packages listed at the top:

simplerspec

simplerspec documentation
simplerspec is a package that streamlines the process of making models from FT-IR data. However, we were unable to install it, so we manually saved the functions read-opus-universal and gather-spc to the folder Functions/simplerspec.

stringr

stringr documentation
stringr is used in the spectra-processing portion of the code to subset strings with the str_sub function.

foreach

foreach documentation
foreach is used in the spectra-processing portion of the code, within the simplerspec function read-opus-universal.

prospectr

prospectr documentation
prospectr is used in the processing portion of the code to split the data into calibration and validation sets with the function kenStone().

clhs

clhs documentation
clhs, or Conditional Latin Hypercube Sampling, is used in the processing portion of the code to subset large datasets that exceed 15,000 samples.

matrixStats

matrixStats documentation
matrixStats is used in the base_offset function in the preprocessing portion of the code.

plot3D

plot3D documentation
plot3D is used within the fratio_outliers function to show spectral outliers in 3D principal component space.

pls

pls documentation
pls is used to create partial least squares regression (PLSR) models.

resemble

resemble documentation
resemble is used to create memory-based learner (MBL) models.
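One way to see which of the packages above still need installing is sketched below. It is not taken from RUNFILE.R: the helper name missing_packages is hypothetical, and the sketch assumes the listed packages are installed from CRAN. simplerspec is left out because this guide sources its functions from Functions/simplerspec instead.

```r
# Packages named in this section, except simplerspec (sourced from Functions/simplerspec)
cran_packages <- c("stringr", "foreach", "prospectr", "clhs",
                   "matrixStats", "plot3D", "pls", "resemble")

# Hypothetical helper: returns the subset of pkgs that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(cran_packages)
print(to_install)

# Uncomment to install whatever is missing (needs an internet connection):
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first avoids reinstalling packages that are already available, which matters mostly on slow connections or shared machines.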

Demo Script

Run the RUNFILE.R script. This will create:

  1. Data_Processed: A folder containing the processed data used to build the models and make predictions
  2. Models: A folder containing the PLSR and MBL models that were built
  3. Predictions: A folder containing the predictions output by the script and a prediction performance log

To adapt the script to your own spectral library:

  1. Replace the spectral files in Data_Raw/SPECTRA
  2. Replace the lab data in Data_Raw/LAB_DATA.csv
  3. Update the name of the property in RUNFILE.R to match the column name of the property you would like to predict. Currently, the script predicts “OC”.
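The third step can fail quietly if the property name does not match a column in the lab data. The sketch below illustrates a check for that. The variable name property, the helper select_property, and the small data frame are all made up for illustration and do not come from RUNFILE.R; only the sample_id and “OC” column names are taken from this guide.

```r
# Hypothetical variable: the column you want to predict
property <- "OC"

# Made-up stand-in for LAB_DATA.csv, used only to keep the example self-contained
lab <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  OC        = c(1.2, NA, 3.4)
)

# Hypothetical helper: keeps sample_id plus the chosen property,
# dropping samples with no lab value for that property
select_property <- function(lab, property) {
  if (!property %in% names(lab)) {
    stop("Column '", property, "' not found in the lab data")
  }
  lab[!is.na(lab[[property]]), c("sample_id", property), drop = FALSE]
}

print(select_property(lab, property))
```

With your own data, the stand-in data frame would be replaced by reading Data_Raw/LAB_DATA.csv with read.csv.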

Below is the full RUNFILE.R script, organized into three main sections with their corresponding functions, which are linked below and described in sections 4-7 of this guide.

  1. Data Preprocessing
  2. PLSR Models
  3. MBL Models