# RFPM

RFPM provides an implementation of the Floating Percentile Model (FPM) in the R statistical environment. The FPM was originally developed by others on behalf of the Washington State Department of Ecology (Avocet 2003) and has since been recommended for use in Washington, Oregon, and Idaho (Ecology 2011). The FPM currently exists as a Microsoft Excel macro (written with Visual Basic for Applications). RFPM represents an independent effort to port the FPM to R with the purpose of:
- correcting a data handling error
- streamlining the chemical selection and sediment quality benchmark calculation procedures
- providing new tools to rapidly evaluate and refine benchmarks
- improving several aspects of the FPM algorithm
- improving the availability, accessibility, and transparency of the FPM tool

The purpose of this vignette is to lead the reader through an example analysis of real-world data using the core functions of RFPM rather than go into great detail on any particular function. For more function details, see the help files by using ? or help() (or by navigating in RStudio to the Packages tab, selecting RFPM from the list of available packages, and then selecting the function of interest).

## Functionality

The main user interface within RFPM is via the FPM function. To use this function, the user must first prepare a sediment chemistry and toxicity data.frame that contains one or more chemical columns and a single toxicity column called Hit. Several example datasets are provided with RFPM with this structure, for example h.northport, which we will evaluate repeatedly in this vignette. Other datasets include the empirical c.northport and h.tristate as well as simulated FPM data perfect, lowNoise, and highNoise.

Other key functions that the user is directed to include optimFPM, cvFPM, and chemVI which can help the user to explore and refine FPM inputs and optimize FPM outputs.

## Case study - Amphipod toxicity in Northport, WA sediments

In this vignette, we focus on evaluating a case study dataset called h.northport, which includes a data.frame of bulk sediment chemistry metals concentrations (mg/kg) and Hit data. Hits are logical results indicating whether a particular sample was deemed toxic (TRUE) or not (FALSE) based on the Hyalella azteca biomass endpoint in 28-day laboratory exposures. The definition of a toxic Hit is up to the practitioner and not a part of the model. Data must be input to FPM in a cross-tab (wide) format, meaning each sediment chemical has a unique column and each sample has just one row.

head(h.northport)
#>        Al    As     Cu    Cd    Cr     Fe    Pb    Hg    Ni    Zn Organism
#> 442  8640  8.38  396.0 3.740  34.6  56900 226.0 0.159 13.20  4300       HA
#> 443  5890  7.69  167.0 4.860  18.7  28500 219.0 0.247 14.00  1780       HA
#> 444 10200 11.10  913.0 0.727  50.8  83300 151.0 0.051  9.91  6490       HA
#> 445 15800 14.60 1630.0 0.688 100.0 161000 211.0 0.016 13.10 13600       HA
#> 446  5950  2.55   30.4 0.721  11.5  14600  63.8 0.020 11.20   484       HA
#> 447  8370  8.74  644.0 0.774  39.3  62400  99.2 0.027  9.68  4740       HA
#>     Meas_Day Endpoint   Hit
#> 442       28  Biomass FALSE
#> 443       28  Biomass FALSE
#> 444       28  Biomass  TRUE
#> 445       28  Biomass  TRUE
#> 446       28  Biomass FALSE
#> 447       28  Biomass FALSE

To run FPM, you will need to specify, at a minimum, the data.frame object and the column names associated with sediment chemistry data. As noted above, FPM also requires that the data.frame include a column called Hit. Columns other than Hit and those specified using the paramList argument are ignored.

p.northport <- names(h.northport)[1:10] ## all chemical column names
FPM(data = h.northport, paramList = p.northport) ## minimum input - dataset and chemical column names
#> $FPM #> Cu Cr Fe Zn FN_crit TP FN TN FP pFN pFP HR NR PHR PNR #> 1 2740 26.7 27400 738 0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.49 0.745 #> FHR FNR OR #> 1 0.51 0.255 0.571 #> #>$chemDensity
#>         Cu    Cr Fe    Zn
#> 1 1.36e-05 0.988  1 0.991

The output of FPM displays three types of data:
- The sediment quality benchmark for each of the chemicals selected by FPM as significant (meaning that concentrations when Hit == TRUE are significantly greater than when Hit == FALSE).
- Selected FN_crit value, the user-defined limit on benchmark conservatism (default = 0.2)
- Classification metrics: e.g., false negatives rate (pFN), false positive rate (pFP), and overall reliability (OR)

This is technically all that was required to run the FPM on this example dataset. There is of course a lot going on behind the scenes. Functions that were used within FPM include:
- chemSig: chemical selection algorithm, which evaluates distribution assumptions of normality and equal variance, then applies appropriate hypothesis test methods to identify significant differences between concentrations when Hit == TRUE and Hit == FALSE. The output of chemSig is a logical vector indicating which chemicals are significant (therefore selected by FPM for generating benchmarks).
- chemSigSelect: a function that uses chemSig but instead returns the chemistry data for significant chemicals and has options for plotting chemistry data distributions to compare Hit == TRUE and Hit == FALSE subsets. Setting plot = TRUE in FPM turns on plot outputs (by passing that argument to chemSigSelect).

As an example, see the following figures generated by plot.chemSigSelect (identical to those generated by FPM when plot = TRUE) for a significant chemical (Cr), shown with blue points, and a non-significant chemical (Al), shown with green points:

plot(chemSigSelect(h.northport[, c("Al", "Cr", "Hit")], paramList = c("Al", "Cr")), type = "boxplot")

There are also many arguments that can be adjusted in FPM that affect how chemSig and chemSigSelect function or how the FPM algorithm itself functions. Users can adjust the test methods/assumptions for selecting chemicals, force the FPM to accept the assumptions of the original Excel-based function (i.e., ExcelMode == TRUE), etc.

See ?FPM and ?chemSig for more information.

## Optimizing FPM Inputs

In RFPM, there are currently two optimization functions available:
- optimFPM is intended to find an optimal input (to FPM) of the FN_crit and/or alpha parameter. These optimal levels are based on the full dataset (input as data argument) and, therefore, may be over-specific to that data. The FN_crit and alpha arguments should be input either as ranges or single values depending on how you want the optimization to run. Inputting a single value for FN_crit and a range for alpha would optimize the alpha value only and vice-versa. Inputting ranges for both values results in slightly more complex output owing to the 2-dimensional optimization.
- cvFPM is currently parameterized to optimize FN_crit inputs using a cross-validation (CV) approach. The result is a “best” value that tries to account for uncertainty in future data. The user can adjust the k parameter to change the number of folds in the CV algorithm. This affects the smoothness of the mean/median optimization curves by generating more hypothetical datapoints; it also is likely to expand the min/max range of optimization values by providing more chances for extreme outcomes. By using larger training datasets (i.e., with larger k inputs), the CV-based optimum will converge toward the empirical optimization curve generated by optimFPM (and shown in the output of cvFPM). The value of k is constrained (e.g., the leave-one-out CV method is impossible here) by the requirement to have at least 3 Hit and 3 No-hit data in the test and training datasets, else FPM and chemSig cannot run within the cvFPM algorithm. Thus, we recommend not increasing k by a large amount unless data is also very large.

This is an example of the empirical optimization approach with optimFPM:

## one-way optimization of the FN Limit - vertical lines show best values based on two metrics
optimFPM(h.northport, p.northport, FN_crit = seq(0.1, 0.9, 0.05), alpha = 0.05)

#> balance_FN.FP        max_OR
#>           0.4           0.6

## two-way optimization of both FN Limit and alpha - black squares show best values based on two metrics
optimFPM(h.northport, p.northport,
FN_crit = seq(0.1, 0.9, 0.05),
alpha = seq(0.01, 0.2, 0.01))

#>         balance_FN.FP max_OR
#> FN_crit          0.40   0.25
#> alpha            0.01   0.11

The resulting plots show a mix of potential outcomes. When only one parameter is being optimized (i.e., length(alpha) == 1 | length(FN_crit) == 1), then the first plot is produced, showing the rates of FNs, FPs, and OR. Vertical lines identify optimal values: either the highest OR value (lowest alpha or FN_crit value when there are ties) or the point at which the FN and FP rates are most similar (i.e., balanced under- and overprediction of toxicity).

In the case that two parameters are being optimized together (i.e., length(alpha) > 1 & length(FN_crit) > 1), then a matrix of results is provided representing with heat colors to indicate the optimization outcome (yellow = suboptimal, red = optimal). Squares indicate the selected values, which are also output into the console. A black square indicates which value applies to the stated optimization statistic; the gray square is the value selected according to the alternative statistic (i.e., the second figure).

This is an example of the CV-based optimization approach with cvFPM and 10-folds:

cvFPM(h.northport, p.northport, k = 10, FN_crit = seq(0.1, 0.9, 0.05), alpha = 0.05)

#> $optim_FN #> balance_FN.FP max_OR #> 0.50 0.65 Two figures are generated showing the optimization statistics from multiple CV runs (as well as the empirical result from optimFPM - the “actual” value) and how stable the classification metrics may be at various model inputs. The first figure shows the optimization of the overall reliability; the second reduces the difference between FP and FN rates, similar to optimFPM. Optimized outputs attempt to account for the mean, median, maximum, and minimum results (among CV runs) altogether. Based on the output from either optimFPM or cvFPM, the practitioner may decide to adjust and rerun the FPM using optimized inputs. The decision of whether this is appropriate will depend on the specific management scenario. For example, the optimal FN Limit could end up being 80%, but that lack of conservatism may be unacceptable to a regulator. Similarly, the optimal alpha could be 0.5, meaning that the regulator would need to accept a 50% probability of error when selecting significant chemicals; this is a very high uncertainty, 10-fold higher than typical. Regardless, optimization provides useful context to the RFPM user. Using both optimFPM and cvFPM is recommended, because they can provide somewhat different or complementary results that facilitate decision making about subsequent FPM runs. ## Variable Importance The chemVI function outputs several potentially useful metrics for evaluating individual metals in the sediment quality benchmark set. These metrics are: - chemDensity: how little the benchmark “floated” within the FPM algorithm. High density values correspond to benchmarks that remained at a relatively low concentrations (i.e., did not float up), suggesting relative importance - MADP: average change in other chemical’s benchmarks resulting from removal of a chemical from data - dOR: change in the overall reliability resulting from removal of the chemical from data The MADP and dOR metrics are the more useful for identifying important chemicals; for example:  chemVI(h.northport, p.northport) #> chemDensity MADP dOR #> Cu 0.001 0.00 0.0 #> Cr 98.800 996.00 2.0 #> Fe 100.000 8.52 0.0 #> Zn 99.100 32.50 0.6 Here we see that copper (Cu) has an MADP and dOR of 0, meaning that there was no change in the benchmarks or their ability to predict toxicity after removing copper. Clearly copper is, therefore, not important in this case and could be excluded from further consideration. However, that isn’t to say that copper isn’t related to toxicity; it’s very important to keep in mind that the FPM is a classification tool that attempts to accurately predict Hit values. The significant chemicals in this list are mostly well correlated, so toxicity could be related to one, all, or perhaps none of the chemicals (i.e., an unmeasured but related chemical or non-chemical stressor). The inclusion of new chemicals in (or exclusion of important chemicals from) data can result in markedly different benchmarks and chemVI metrics (particularly for the chemDensity metric). Therefore, we recommend pre-screening data to remove chemicals that are unlikely to cause toxicity (e.g., due to poor bioavailability, low toxicity, or uniformly low concentrations). These chemicals, while potentially “significant” with respect to having higher concentrations when Hit == TRUE, may lead to problems down the road when predicting toxicity more generally. In other words, try to the extent possible to establish a weight of evidence pointing toward causation rather than letting pure correlation lead your analysis. In the current example, we perhaps should have pre-screened chemicals like Fe and Al, which we expected to have low toxicity at natural levels. Al was ultimately screened out by FPM as non-significant, but Fe was significant and retained. chemVI can be helpful with respect to developing a parsimonious benchark set by identifying benchmarks that do not affect prediction at the site. In our h.northport example, Cu and Fe will be removed because they don’t affect toxicity predictions (either positively or negatively). Whenever any chemicals would be removed from consideration through an iterative FPM benchmark development process, FPM must be rerun to generate new benchmarks. Compare the two outputs: FPM(data = h.northport, paramList = p.northport) #>$FPM
#>     Cu   Cr    Fe  Zn FN_crit TP FN TN FP   pFN   pFP    HR    NR  PHR   PNR
#> 1 2740 26.7 27400 738     0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.49 0.745
#>    FHR   FNR    OR
#> 1 0.51 0.255 0.571
#>
#> $chemDensity #> Cu Cr Fe Zn #> 1 1.36e-05 0.988 1 0.991 FPM(data = h.northport, paramList = c("Cr", "Zn")) #>$FPM
#>     Cr  Zn FN_crit TP FN TN FP   pFN   pFP    HR    NR  PHR   PNR  FHR   FNR
#> 1 26.1 910     0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.49 0.745 0.51 0.255
#>      OR
#> 1 0.571
#>
#> $chemDensity #> Cr Zn #> 1 1 0.983 For the FPM benchmarks generated for h.northport, the MADP for Fe was >0, and the benchmark for Zn shifted after removing Fe from consideration. Although the benchmark changed, the classification errors did not, as reflected by the unchanged OR. The final FPM benchmark set including only Cr and Zn is, therefore, the most parsimonious, by which we mean that it classifies Hit values just as accurately as the larger set of benchmarks but does so with fewer benchmarks. You might have noticed that we didn’t use the optimized inputs in the example above for chemVI. We omitted that to provide an example where relatively unimportant chemicals could be screened out using the metrics generated by chemVI. If we instead change the inputs of chemVI to the optimized FN_crit and alpha (using the max_OR optimization approach), the chemVI results look like this instead: FPM(h.northport, p.northport, FN_crit = 0.25, alpha = 0.11)$FPM
#>      Al  Cu  Cr     Fe  Pb   Zn FN_crit TP FN TN FP   pFN   pFP    HR    NR
#> 1 24400 131 131 247250 904 1320    0.25 46 15 53 33 0.246 0.384 0.754 0.616
#>     PHR   PNR   FHR   FNR    OR
#> 1 0.582 0.779 0.418 0.221 0.673
chemVI(h.northport, p.northport, FN_crit = 0.25, alpha = 0.11)
#> Al      60.200 435.0 12.2
#> Cu      99.500  13.4  3.4
#> Cr       0.512   0.0  0.0
#> Fe       1.060   0.0  0.0
#> Pb       0.090   0.0  0.0
#> Zn     100.000  22.1 13.6

In this case, adjusting alpha has allowed for 2 new chemicals to be added, Al and Pb, which were not selected when alpha = 0.05. After including these chemicals (and slightly adjusting the FN_crit), the final OR increased to 67%, a 10% improvement in benchmark accuracy over the default values (not shown). Based on the results of chemVI shown above, there is a marked shift in the relative importance of metals for toxicity predictions. Removing any of the chemicals would result in a roughly 10% reduction in accuracy (based on the dOR metric), and removing Cu now would have an effect on the other benchmarks values (as shown by the MADP metric >0). The chemDensity value for Cr has dropped to <1%, meaning that the benchmark is now close to the maximum possible value, whereas it had one of the highest chemDensity values before.

As you can see, variable importance is highly dependent on the set of benchmarks, not just individual benchmarks in isolation. The addition of new chemicals had a big impact not only on the relative importance of each chemical but the overall reliability of the set of chemicals and on the FPM benchmark values themselves.

## Notes on Benchmark Applications and Interpretation

We want to emphasize several findings from our analyses of the characteristics, behavior, and sensitivity of the FPM (the subject of two anticipated manuscripts).

1. FPM benchmarks should be treated as a set of values, not values to be used singularly or mixed and matched between runs of the FPM using different species, toxicity test endpoints, sites, etc. There may be a desire or an inclination to select the most sensitive species and test endpoints among chemicals. Doing so, however, would invalidate the various assumptions made when generating FPM benchmarks, including the projected FN Limit (i.e., pFN) and the accuracy of benchmarks (i.e., OR). If there are multiple species that need to be accounted for by the model, consider different approaches to defining Hit values that capture multiple species and/or endpoints rather prior to running FPM. The final set of FPM benchmarks should then be used as a group to predict Hits. If highly protective benchmarks are desired, then we recommend running the FPM for all species and endpoints of interest, selecting the endpoint with the greatest degree of effect, and then either running the FPM using that endpoint to define Hits, or, alternatively, running FPM on all endpoints and selecting theh single lowest set of benchmarks among endpoints. Lastly, the FN_crit input (i.e., FN Limit) can be decreased to increase benchmark conservatism (at the possible expense of accuracy).

2. FPM benchmarks should be considered as predictive of toxicity classifications (i.e., Hit values) rather than continuous toxicity test results (e.g., 10% mortality, 27% biomass, etc.). For this reason, we would generally argue against the application of FPM benchmarks within a Natural Resource Damage Assessment (NRDA) context. We recognize that future approaches might be proposed for applying FPM benchmarks in a NRDA context; while theoretically possible, such attempts should be considered carefully, particularly with regard to how Hits are defined (and whether the definition reasonably corresponds with definitions of injury).

3. FPM benchmarks are most powerful when they are site-specific. By using site toxicity and sediment chemistry data, and by implicitly considering factors like bioavailability, FPM benchmarks should improve upon existing default benchmarks (Ecology 2011). Therefore, we recommend using RFPM and site-specific data to the extent practicable to derive new benchmarks.

4. While it is most common to generate benchmarks based on bulk sediment concentrations, it is not necessary that this be the case. For example, benchmarks could reflect organic carbon-normalized sediment concentrations, pore water concentrations, etc. These measurements might improve the relevance of toxicity predictions by taking bioavailability more explicitly into account. Attempts to generate benchmarks based on these types of data simply complicate the application of the benchmarks by requiring all future samples be analyzed and concentrations reported in the same way.

5. While there may be a concern about mixing chemicals of different types and mechanisms/modes of action (e.g., metals and polycyclic aromatic hydrocarbons) when generating FPM benchmarks, it does not seem to us that such a distinction should matter. From a conceptual standpoint, the FPM could be used to predict any variable that is distilled to a TRUE/FALSE classification from a set of one or more independent variables (chemical or otherwise). The key limitations that arise when mixing chemicals (and/or non-chemical stressors) are:

• All analytes added to the FPM via data need to have a common directionality with respect to Hits. In other words, toxicity is expected to increase as, for example, copper increases. Therefore, you could not also include pore water acidity, which increases toxicity at both high and low pH. This limitation is related to the chemical selection process, which (by default) assumes that concentrations will be higher among toxic samples. It also relates to the iteratively increasing nature of the FPM algorithm; concentrations start at a low level and then shift one direction (up).
• The inclusion of many correlated variables may increase the number of irrelevant benchmarks. For example, in the analysis of h.northport dataset presented above (using default FPM inputs), Fe was correlated with other metals (not shown) and, as a result, was selected for inclusion among the benchmarks despite being relatively non-toxic and contributing not at all to the predictive accuracy of the benchmark set. As described above, chemVI can help to identify potential simplifications to the benchmark set when correlated variables lead to unnecessary complexity. The user is also free to pre-screen their data to remove redundant variables.
1. FPM has been found to work poorly or not at all with missing data. Consider removing analytes with incomplete datasets, omit samples with partial analysis, and/or impute values to fill data gaps.

2. FPM does not currently address non-detection in a meaningful way. As the model is largely based on percentiles and ranges rather than averages, this shouldn’t generally lead to major problems as long as there isn’t a high degree of non-detection (e.g., >20%). We leave it to the user to address how non-detects will be handled and to use FPM benchmarks based on non-detected data with caution. High degrees of non-detection of a chemical is a good reason for excluding it from data (or paramList), as there is not a reasonable way to use such a chemical to predict Hits. The use of more advanced censored data imputation methods (e.g., Kaplan-Meier or Regression on Order Statistics [ROS]) may provide useful, but should also be used with care; using these methods (in addition to treating non-detects as half of detection limits or as zero) have the potential to result in benchmarks that fall below detection limits. We would argue that it is unreasonable to apply benchmarks that fall at or below detection limits.

3. As with other toxicity-based benchmarks, it may be important to consider background chemical concentrations and reference area toxicity levels when developing and applying FPM benchmarks. Reference area toxicity can be incorporated explicitly into Hits definitions by normalizing toxicity data to the reference condition prior to classifying Hits, and background can be used to qualify chemical exceedances of FPM benchmarks that fall below background levels.

## References cited:

Avocet. 2003. Development of freshwater sediment quality values for use in Washington State. Phase II report: Development and recommendation of SQVs for freshwater sediments in Washington State. Publication No. 03-09-088. Prepared for Washington Department of Ecology. Avocet Consulting, Kenmore, WA.

Ecology. 2011. Development of benthic SQVs for freshwater sediments in Washington, Oregon, and Idaho. Publication no. 11-09-054. Toxics Cleanup Program, Washington State Department of Ecology, Olympia, WA.