powdR: Full Pattern Summation of X-ray Powder Diffraction Data

Benjamin Butler

2020-10-15

1 Overview

powdR is an R implementation of the full pattern summation approach to quantitative mineralogy from X-ray powder diffraction (XRPD) data. Whilst this is available in Excel spreadsheets such as FULLPAT (Chipera and Bish 2002) and RockJock (Eberl 2003), implementation in R allows for faster computation than is currently available, and provides a user-friendly Shiny application to help with the often iterative process of mineral quantification. Furthermore, the afps() function in powdR is designed to automate the full pattern summation procedure, which is particularly advantageous in high-throughput XRPD datasets.

2 Full pattern summation

A powerful property of XRPD data is that it can provide quantitative estimates of phase concentrations in mineral mixtures. Though several methods are available for such quantitative analysis, full pattern summation (also referred to as “full pattern fitting of prior measured standards”) is particularly suitable for mixtures containing crystalline mineral components in combination with disordered and/or X-ray amorphous phases. Soil is a prime example of such mixtures, where crystalline minerals such as quartz and feldspars can be present in combination with clay minerals (i.e. disordered phases), and organic matter (i.e. amorphous phases).

The full pattern summation implemented in powdR is based on the principle that an observed pattern is comprised of the sum of signals from the individual components within it. A key component of this approach is a library containing measured or calculated patterns of the pure phases that may be encountered within the samples. These “reference” patterns would ideally be measured on the same instrument as the sample. To quantify a given sample, suitable phases from the library are selected that together account for all peaks within the data, and their relative contributions to the observed signal optimised until an appropriate fit is achieved (Chipera and Bish 2002). This fit is usually refined using least squares optimisation of an objective parameter. The scaled intensities of the optimised patterns are then converted to weight % using reference intensity ratios, which are a measure of diffracting power relative to a standard mineral (usually corundum).


3 Using powdR

3.1 The powdRlib object

A key component of the functions within powdR is the library of reference patterns. These are stored within a powdRlib object created using the powdRlib() constructor function. powdRlib() builds a powdRlib object from two components. The first component, specified via the xrd_table argument of powdRlib(), is a data frame of the count intensities of the reference patterns, with their 2\(\theta\) axis as the first column. The column for a given reference pattern must be named using a unique identifier (a phase ID). An example of such a format is provided in the minerals_xrd data:

library(powdR)

data(minerals_xrd)

head(minerals_xrd[1:9])
#>       tth QUA.1 QUA.2 FEL ORT SAN ALB OLI DOL.1
#> 1 4.00973    69    91 546 599 638 308 343   268
#> 2 4.04865    69    92 524 570 609 294 332   256
#> 3 4.08757    64    86 505 555 582 286 328   250
#> 4 4.12649    64    83 512 543 558 277 310   247
#> 5 4.16541    62    83 478 518 536 275 304   241
#> 6 4.20433    60    81 459 514 517 261 298   228

The second component required to build a powdRlib object (specified via the phases_table argument of powdRlib()) is a data frame containing 3 columns. The first is a string of unique ID’s representing each reference pattern in the data provided to the xrd_table argument. The second column is the name of the phase group that this reference pattern belongs to (e.g. quartz or dolomite). The third column is the reference intensity ratio (RIR) of that reference pattern (relative to a known standard, usually corundum). An example of the format required for the phases_table argument of powRlib() is provided in the minerals_phases data.

data(minerals_phases)

minerals_phases[1:8,]
#>    phase_id  phase_name  rir
#> 12    QUA.1      Quartz 4.62
#> 13    QUA.2      Quartz 4.34
#> 4       FEL  K-feldspar 0.75
#> 11      ORT  K-feldspar 1.03
#> 14      SAN  K-feldspar 0.93
#> 1       ALB Plagioclase 1.31
#> 9       OLI Plagioclase 1.06
#> 2     DOL.1    Dolomite 2.35

Crucially when building the powdRlib object, all phase ID’s in the first column of the phases_table must match the column names of the xrd_table (except the name of the first column which is the 2\(\theta\) scale), for example.

identical(names(minerals_xrd[-1]),
          minerals_phases$phase_id)
#> [1] TRUE

Once created, powdRlib objects can easily be visualised using plot(), which when used for powdRlib objects accepts the arguments wavelength and refs that are used to specify the X-ray wavelength and the reference patterns to plot, respectively. In all cases where plot() is used hereafter, the addition of interactive = TRUE to the function call will produce an interactive html graph that can be viewed in RStudio or web browsers..

my_lib <- powdRlib(minerals_xrd, minerals_phases)

plot(my_lib, wavelength = "Cu",
     refs = c("ALB", "DOL.1",
              "QUA.1", "GOE.2"))

3.1.1 Subsetting a powdRlib object

Occasionally it may be useful to subset a reference library to a smaller selection. This can be achieved using subset(), which for powdRlib objects accepts three arguments; x, refs and mode. The x argument specifies the powdRlib object to be subset, refs specifies the ID’s and/or names of phases to select, and mode specifies whether these phases are kept (mode = "keep") or removed (mode = "remove").

data(rockjock)

#Have a look at the phase ID's in rockjock
rockjock$phases$phase_id[1:10]
#>  [1] "CORUNDUM"                "BACK_POS"               
#>  [3] "BACK_NEG"                "QUARTZ"                 
#>  [5] "ORDERED_MICROCLINE"      "INTERMEDIATE_MICROCLINE"
#>  [7] "SANIDINE"                "ORTHOCLASE"             
#>  [9] "ANORTHOCLASE"            "ALBITE_CLEAVELANDITE"

#Remove reference patterns from rockjock
rockjock_1 <- subset(rockjock,
                     refs = c("ALUNITE", #phase ID
                              "AMPHIBOLE", #phase ID
                              "ANALCIME", #phase ID
                              "Plagioclase"), #phase name
                     mode = "remove")

#Check number of reference patterns remaining in library
nrow(rockjock_1$phases)
#> [1] 157

#Keep certain reference patterns of rockjock
rockjock_2 <- subset(rockjock,
                     refs = c("ALUNITE", #phase ID
                              "AMPHIBOLE", #phase ID
                              "ANALCIME", #phase ID
                              "Plagioclase"), #phase name
                     mode = "keep")

#Check number of reference patterns remaining
nrow(rockjock_2$phases)
#> [1] 11

3.2 RockJock

There are two powdRlib objects provided as part of the powdR package. The first is minerals (accessed via data(minerals)), which is a simple and low resolution library designed to facilitate fast computation of basic examples. The second is rockjock (accessed via data(rockjock)), which is a comprehensive library of 169 reference patterns covering most phases that might be encountered in geological and soil samples. The rockjock library in powdR uses data from the original RockJock program (Eberl 2003) thanks to the permission of Dennis Eberl. In rockjock, each reference pattern from the original RockJock program has been scaled to a maximum intensity of 10000 counts, and the RIRs normalised relative to Corundum. All rockjock data were analysed using Cu K\(\alpha\) radiation.

3.3 RockJock synthetic mixtures

To accompany the rockjock reference library, a list of eight synthetic mixtures from the original RockJock program (Eberl 2003) are also included in powdR in the rockjock_mixtures data (accessed via data(rockjock_mixtures). Their known weights (see ?rockjock_mixtures) can be compared to full pattern summation outputs (i.e. from fps() and afps(), detailed below) to assess accuracy.

3.4 Full pattern summation:

Full pattern summation in powdR is provided via fps(), and an automated version provided in afps(). Here the rockjock and rockjock_mixtures data will be used to demonstrate the use of these functions.

3.4.1 Full pattern summation with internal standard

In some cases samples are prepared for XRPD with an internal standard of known concentration. If this is the case, then the std_conc argument of fps() and afps() can be used to define the concentration of the internal standard (in weight %), which is used in combination with the reference intensity ratios to compute phase concentrations. For example, all samples in the rockjock_mixtures data were prepared with 20 % corundum as the internal standard, thus this can be specified using using std = "CORUNDUM" and std_conc = 20 in the call to fps() or afps().

data("rockjock_mixtures")

fit_1 <- fps(lib = rockjock,
             smpl = rockjock_mixtures$Mix1,
             refs = c("ORDERED_MICROCLINE",
                      "LABRADORITE",
                      "KAOLINITE_DRY_BRANCH",
                      "MONTMORILLONITE_WYO",
                      "ILLITE_1M_RM30",
                      "CORUNDUM"),
             std = "CORUNDUM",
             std_conc = 20,
             align = 0.3)
#> 
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Optimising...
#> -Computing phase concentrations
#> -Using internal standard concentration of 20 % to compute phase concentrations
#> ***Full pattern summation complete***

The computed concentrations can be accessed in the phases dataframe of the powdRlib output:

fit_1$phases
#>               phase_id    phase_name       rir phase_percent
#> 1             CORUNDUM      Corundum 1.0000000       20.0000
#> 2   ORDERED_MICROCLINE    K-feldspar 0.9654312        3.6104
#> 3          LABRADORITE   Plagioclase 0.8113040       19.3277
#> 4 KAOLINITE_DRY_BRANCH     Kaolinite 0.5812875       11.6870
#> 5       ILLITE_1M_RM30        Illite 0.2768664        6.4234
#> 6  MONTMORILLONITE_WYO Smectite (Di) 0.3202779       40.9453

Further, notice that if the concentration of the internal standard is specified then the phase concentrations do not necessarily sum to 100 %:

sum(fit_1$phases$phase_percent)
#> [1] 101.9938

Unlike other software where only certain phases can be used as an internal standard, any phase can be defined in powdR. For example, the rockjock_mixtures$Mix5 sample contains 20 % quartz (see ?rockjock_mixtures), thus adding "QUARTZ" as the std argument results in this reference pattern becoming the internal standard instead.

fit_2 <- fps(lib = rockjock,
             smpl = rockjock_mixtures$Mix5,
             refs = c("ORDERED_MICROCLINE",
                      "LABRADORITE",
                      "KAOLINITE_DRY_BRANCH",
                      "MONTMORILLONITE_WYO",
                      "CORUNDUM",
                      "QUARTZ"),
             std = "QUARTZ",
             std_conc = 20,
             align = 0.3)
#> 
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Optimising...
#> -Computing phase concentrations
#> -Using internal standard concentration of 20 % to compute phase concentrations
#> ***Full pattern summation complete***

fit_2$phases
#>               phase_id    phase_name       rir phase_percent
#> 1             CORUNDUM      Corundum 1.0000000       19.9145
#> 2               QUARTZ        Quartz 3.5404393       20.0000
#> 3   ORDERED_MICROCLINE    K-feldspar 0.9654312       33.0677
#> 4          LABRADORITE   Plagioclase 0.8113040        7.8474
#> 5 KAOLINITE_DRY_BRANCH     Kaolinite 0.5812875        4.3197
#> 6  MONTMORILLONITE_WYO Smectite (Di) 0.3202779       11.2179

sum(fit_2$phases$phase_percent)
#> [1] 96.3672

3.4.2 Full pattern summation without internal standard

In cases where an internal standard is not added to a sample, phase quantification can be achieved by assuming that all detectable phases can be identified and that they sum to 100 weight %. By setting the std_conc argument of fps() or afps() to NA, or leaving it out of the function call, it will be assumed that the sample has been prepared without an internal standard and the phase concentrations computed accordingly.

fit_3 <- fps(lib = rockjock,
             smpl = rockjock_mixtures$Mix1,
             refs = c("ORDERED_MICROCLINE",
                      "LABRADORITE",
                      "KAOLINITE_DRY_BRANCH",
                      "MONTMORILLONITE_WYO",
                      "ILLITE_1M_RM30",
                      "CORUNDUM"),
             std_conc = NA,
             std = "CORUNDUM",
             align = 0.3)
#> 
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Optimising...
#> -Computing phase concentrations
#> -Internal standard concentration unknown. Assuming phases sum to 100 %
#> ***Full pattern summation complete***

In this case the phase specified in the std argument is only used for sample alignment, and is included in the computed phase concentrations.

fit_3$phases
#>               phase_id    phase_name       rir phase_percent
#> 1             CORUNDUM      Corundum 1.0000000       19.6090
#> 2   ORDERED_MICROCLINE    K-feldspar 0.9654312        3.5399
#> 3          LABRADORITE   Plagioclase 0.8113040       18.9498
#> 4 KAOLINITE_DRY_BRANCH     Kaolinite 0.5812875       11.4586
#> 5       ILLITE_1M_RM30        Illite 0.2768664        6.2978
#> 6  MONTMORILLONITE_WYO Smectite (Di) 0.3202779       40.1449

Furthermore, the phase concentrations sum to 100 %.

sum(fit_3$phases$phase_percent)
#> [1] 100

3.4.3 Full pattern summation with data harmonisation

It is usually recommended that the reference library used for full pattern summation is measured on the same instrument as the sample using an identical 2\(\theta\) range and resolution. In some cases this is not feasible, and the reference library patterns may be from a different instrument to the sample. To allow for seamless use of samples and libraries from different instruments (measured on the same wavelength), fps() and afps() contain a logical harmonise argument (default = TRUE). When the sample and library contain non-identical 2\(\theta\) axes, harmonise = TRUE will convert the data onto the same scale by determining the overlapping 2\(\theta\) range and interpolating to the coarsest resolution available.

#Create a sample with a shorter 2theta axis than the library
Mix1_short <- subset(rockjock_mixtures$Mix1, tth > 10 & tth < 55)

#Reduce the resolution by selecting only odd rows of the data
Mix1_short <- Mix1_short[seq(1, nrow(Mix1_short), 2),]

fit_4 <- fps(lib = rockjock,
             smpl = Mix1_short,
             refs = c("ORDERED_MICROCLINE",
                      "LABRADORITE",
                      "KAOLINITE_DRY_BRANCH",
                      "MONTMORILLONITE_WYO",
                      "ILLITE_1M_RM30",
                      "CORUNDUM"),
             std = "CORUNDUM",
             align = 0.3)
#> 
#> -Harmonising library to the same 2theta resolution as the sample
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Optimising...
#> -Computing phase concentrations
#> -Internal standard concentration unknown. Assuming phases sum to 100 %
#> ***Full pattern summation complete***

fit_4$phases
#>               phase_id    phase_name       rir phase_percent
#> 1             CORUNDUM      Corundum 1.0000000       19.5079
#> 2   ORDERED_MICROCLINE    K-feldspar 0.9654312        3.8663
#> 3          LABRADORITE   Plagioclase 0.8113040       19.8705
#> 4 KAOLINITE_DRY_BRANCH     Kaolinite 0.5812875       12.8507
#> 5       ILLITE_1M_RM30        Illite 0.2768664        8.4559
#> 6  MONTMORILLONITE_WYO Smectite (Di) 0.3202779       35.4486

3.5 Automated full pattern summation

The selection of suitable reference patterns for full pattern summation can often be challenging and time consuming. An attempt to automate this process is provided in afps(), which can select appropriate reference patterns from a reference library and subsequently exclude reference patterns based on limit of detection estimates. Such an approach is considered particularly advantageous when quantifying high-throughput XRPD datasets that display considerable mineralogical variation.

Here the rockjock library, containing 169 reference patterns, will be used to quantify one of the samples in the rockjock_mixtures data.

fit_5 <- afps(lib = rockjock,
              smpl = rockjock_mixtures$Mix1,
              std = "CORUNDUM",
              align = 0.3,
              lod = 1)
#> 
#> -Using all reference patterns in the library
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Applying non-negative least squares
#> -Optimising...
#> -Removing negative coefficients and reoptimising...
#> -Calculating detection limits
#> -Removing phases below detection limit
#> -Reoptimising after removing crystalline phases below the limit of detection
#> -Computing phase concentrations
#> -Internal standard concentration unknown. Assuming phases sum to 100 %
#> ***Automated full pattern summation complete***

fit_5$phases_grouped
#>      phase_name phase_percent
#> 1      Corundum       19.2449
#> 2    Background        0.0001
#> 3    K-feldspar        2.6398
#> 4   Plagioclase       19.2431
#> 5     Kaolinite       10.8828
#> 6 Smectite (Di)       40.0788
#> 7        Illite        7.9105

3.6 Plotting

Plotting results from fps() or afps() (powdRfps and powdRafps objects, respectively) is achieved using plot(). Static ggplot() plots can be created using:

plot(fit_5, wavelength = "Cu")

Alternatively, interactive ggplotly() plots can be created by adding interactive = TRUE to the function call, e.g. plot(fit_5, wavelength = "Cu", interactive = TRUE).

In addition to above, plotting for powdRfps and powdRafps objects can be further adjusted by the mode and xlim arguments. The mode argument can be one of “fit” (the default), “residuals” or “both”, for example:

plot(fit_5, wavelength = "Cu", mode = "residuals")

or alternatively:

plot(fit_5, wavelength = "Cu", mode = "both", xlim = c(20,30))

3.7 Quantifying multiple samples

The easiest way to quantify multiple samples is with lapply()

multi_fit <- lapply(rockjock_mixtures[1:3], fps,
                    lib = rockjock,
                    std = "CORUNDUM",
                    refs = c("ORDERED_MICROCLINE",
                             "LABRADORITE",
                             "KAOLINITE_DRY_BRANCH",
                             "MONTMORILLONITE_WYO",
                             "ILLITE_1M_RM30",
                             "CORUNDUM",
                             "QUARTZ"),
                    align = 0.3)
#> 
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Optimising...
#> -Computing phase concentrations
#> -Internal standard concentration unknown. Assuming phases sum to 100 %
#> ***Full pattern summation complete***
#> 
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Optimising...
#> -Computing phase concentrations
#> -Internal standard concentration unknown. Assuming phases sum to 100 %
#> ***Full pattern summation complete***
#> 
#> -Aligning sample to the internal standard
#> -Interpolating library to same 2theta scale as aligned sample
#> -Optimising...
#> -Removing negative coefficients and reoptimising...
#> -Computing phase concentrations
#> -Internal standard concentration unknown. Assuming phases sum to 100 %
#> ***Full pattern summation complete***

names(multi_fit)
#> [1] "Mix1" "Mix2" "Mix3"

3.8 Summarising mineralogy

When multiple samples are quantified, it is often useful to report the phase concentrations of all of the samples in a single table. For a given list of powdRfps and/or powdRafps objects, the summarise_mineralogy() function yields such summary tables, for example:

summarise_mineralogy(multi_fit, type = "grouped", order = TRUE)
#>   sample_id Kaolinite Corundum Plagioclase Smectite (Di)  Illite K-feldspar
#> 1      Mix1   11.4255  19.6390     18.9222       40.2223  6.0943     3.4771
#> 2      Mix2   19.9893  20.1257     34.2955        3.1401  9.8914     7.9668
#> 3      Mix3   38.0548  20.1476          NA        3.5897 19.4220    11.0541
#>   Quartz
#> 1 0.2197
#> 2 4.5911
#> 3 7.7318

where type = "grouped" denotes that phases with the same phase_name will be summed together, and order = TRUE specifies that the columns will be ordered from most common to least common (assessed by the sum of each phase across the samples).

3.9 The powdR Shiny app

To run powdR via the Shiny app, use run_powdR(). This loads the application in your default web browser. The application has six tabs:

  1. Reference Library Builder: Allows you to create and export a powdRlib reference library from two .csv files: one for the XRPD measurements, and the other for the ID, name and reference intensity ratio of each pattern.
  2. Reference Library Viewer: Facilitates quick inspection of the phases within a powdRlib reference library.
  3. Reference Library Editor: Allows the user to easily subset a powdRlib reference library .
  4. Full Pattern Summation: A user friendly interface for iterative full pattern summation of single samples using fps() or afps().
  5. Results Viewer/Editor: Allows for results from previously saved powdRfps and powdRafps objects to be viewed and edited via addition or removal of reference patterns.
  6. Help Provides a series of video tutorials (via YouTube) detailing the use of the powdR Shiny application.

Note: that the Shiny application may not work properly in Microsoft Edge web browsers. Please get in touch if you experience any other issues with the Shiny application.

References

Chipera, Steve J., and David L. Bish. 2002. “FULLPAT: A full-pattern quantitative analysis program for X-ray powder diffraction using measured and calculated patterns.” Journal of Applied Crystallography 35 (6): 744–49. https://doi.org/10.1107/S0021889802017405.

Eberl, D. D. 2003. “User’s guide to ROCKJOCK - A program for determining quantitative mineralogy from powder X-ray diffraction data.” Boulder, CA: USGS.