(Version française ci-dessous)
Many chemists use Fourier transform infrared (FTIR) spectroscopy to analyze chemicals and materials in their laboratories. They may test for purity, composition of mixtures, or changes in functional groups, or evaluate other properties. These spectra are sometimes included in journal articles but, until now, R has not had a convenient tool for plotting them.
In this document we’ll be using the built-in data in the PlotFTIR package to demonstrate loading data, modifying it, and plotting the resulting spectra. While the package doesn’t have raw data files, we will simulate this portion by reading the built-in data from disk.
To read FTIR files into R, the PlotFTIR package must be installed and loaded. The stable version of the package can be installed from CRAN.
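As a minimal sketch, the CRAN installation is a one-time setup step run from an R session with internet access:

```r
# Install the stable PlotFTIR release from CRAN (one-time setup).
install.packages("PlotFTIR")
```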
If you want to install the development version of this package, this can be done using the devtools package:
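A sketch of that route, assuming the development sources live in the NRCan/PlotFTIR repository on GitHub (the repository path is an assumption here; substitute the one you actually use):

```r
# Install devtools first if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumed repository location for the development version.
devtools::install_github("NRCan/PlotFTIR")
```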
Once installed by any method, the package needs to be activated in R before use.
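Loading uses the usual library() call, run once per R session:

```r
# Attach PlotFTIR so its functions can be called without the PlotFTIR:: prefix.
library(PlotFTIR)
```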
This document presumes you have FTIR spectra files in .csv, .txt (comma delimited), or .asp format. Additional file types may be supported in the future. You can read files individually using the read_ftir() function or (as we’re about to do) read any number of files in a directory. Our data is stored in the location held by the file_directory variable, but you can provide your own location (such as “C:/Users/[userid]/FTIR/Data” or wherever your files are stored).
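One way to simulate files on disk is sketched below. It assumes the packaged sample_spectra data set has wavenumber, absorbance and sample_id columns, and that read_ftir() takes path, file and sample_name arguments analogous to those of read_ftir_directory(); the "ftir_demo" folder name is purely illustrative.

```r
library(PlotFTIR)

# Hypothetical setup: write each built-in sample to its own .csv file in a
# temporary directory, so that there are files on disk to read back in.
file_directory <- file.path(tempdir(), "ftir_demo")
dir.create(file_directory, showWarnings = FALSE)

for (id in as.character(unique(sample_spectra$sample_id))) {
  one_sample <- sample_spectra[sample_spectra$sample_id == id, c("wavenumber", "absorbance")]
  write.csv(one_sample, file = file.path(file_directory, paste0(id, ".csv")), row.names = FALSE)
}

# Read a single file back in (argument names are assumed, see above).
toluene <- read_ftir(path = file_directory, file = "toluene.csv", sample_name = "toluene")
head(toluene)
```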
spectra <- read_ftir_directory(
  path = file_directory,
  files = c("toluene.csv", "heptanes.csv", "isopropanol.csv", "paper.csv", "polystyrene.csv"),
  sample_names = c("toluene", "heptanes", "isopropanol", "paper", "polystyrene")
)
head(spectra)
#>   wavenumber absorbance sample_id
#> 1   650.4205   0.011155   toluene
#> 2   652.2841   0.002671   toluene
#> 3   654.1478   0.024340   toluene
#> 4   656.0115   0.066932   toluene
#> 5   657.8751   0.065699   toluene
#> 6   659.7388   0.061379   toluene
Having loaded the data, we can see that it has a structure of three columns, named wavenumber, absorbance and sample_id. Because we read in several files, all of the spectra are included in the single spectra object.
The loaded spectra can be plotted (with reasonable defaults) in just one line of code:
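A minimal sketch of that one-line call. It assumes PlotFTIR is installed and uses the packaged sample_spectra data as a stand-in for the spectra object read above:

```r
library(PlotFTIR)

# Stand-in for the `spectra` object read earlier: the packaged sample data.
spectra <- sample_spectra

# Plot every sample with the default title and legend.
basic_plot <- plot_ftir(spectra)
basic_plot
```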
If plotting multiple samples (as we are here), you can offset the samples in the y axis to be able to more clearly see differences where they may have otherwise overlapped near the baseline:
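A sketch of the offset plot, assuming the package's plot_ftir_stacked() function is the one intended here and leaving the offset size at its default:

```r
library(PlotFTIR)

# Stand-in for the `spectra` object read earlier: the packaged sample data.
spectra <- sample_spectra

# Same data as before, with each sample shifted up the y axis.
stacked_plot <- plot_ftir_stacked(spectra)
stacked_plot
```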
We can modify this plot in a number of ways:
The plot title and subtitle, and the legend title, are all customizable. The location of the legend can be changed too! To add a subtitle, pass a two-element string vector to the plot_title argument of plot_ftir().
plot_ftir(spectra, plot_title = c("Title of my Spectra Plot", "Plotted with PlotFTIR"), legend_title = "Samples:") |>
  move_plot_legend(position = "bottom")
Once a plot has been produced, it’s easy to rename the samples in the legend. This might be useful when moving from exploratory analysis to a final output product, or if the sample names in the raw data are very long and you have short forms that you want to show on the plot. The format for the renaming is "new_name" = "old_name". Not every sample_id needs to be matched; samples without a rename given will retain the name they currently have.
plot_ftir(spectra) |>
  rename_plot_sample_ids(sample_ids = c("Methylbenzene" = "toluene", "C7" = "heptanes"))
The lower energy region of an FTIR plot may not be of much interest. We can compress that region to both save graphical space and enhance the view of the more complex fingerprint region at higher energy.
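A sketch of the compression step. It assumes the package's compress_low_energy() function with cutoff and compression_ratio arguments; the values 2000 and 4 are illustrative choices, not recommendations:

```r
library(PlotFTIR)

# Stand-in for the `spectra` object read earlier: the packaged sample data.
spectra <- sample_spectra

# Compress the region on one side of the 2000 cm-1 cutoff by a factor of 4.
compressed_plot <- plot_ftir(spectra) |>
  compress_low_energy(cutoff = 2000, compression_ratio = 4)
compressed_plot
```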
You’ll notice the use of |> in these examples: it indicates that the output of one command is passed as the first argument of the next command. More information on the pipe can be found in the valuable book R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. Alternatively, you can capture the output of each line of code in a variable and pass that variable on to the next function called.
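The equivalence between piping, nesting and intermediate variables can be sketched in base R with toy numbers (the values are illustrative only; the native pipe needs R 4.1 or later):

```r
# Toy values, for illustration only.
values <- c(4, 9, 16)

# Piped: each result becomes the first argument of the next call.
piped <- values |> sqrt() |> sum()

# Equivalent nested call.
nested <- sum(sqrt(values))

# Equivalent step-by-step version using an intermediate variable.
roots <- sqrt(values)
stepwise <- sum(roots)

# All three approaches give the same result.
identical(piped, nested) && identical(piped, stepwise)
```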
You can limit the wavenumber range of (zoom in on) any plot. If we want to look at the O-H and C-H stretch region, we can limit the spectra to that range.
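A sketch of zooming in on that region. The limits of 3800 and 2800 cm-1 are an assumed, illustrative choice that brackets typical O-H and C-H stretches:

```r
library(PlotFTIR)

# Stand-in for the `spectra` object read earlier: the packaged sample data.
spectra <- sample_spectra

# Keep only the 3800 to 2800 cm-1 window of the plot.
zoomed_plot <- plot_ftir(spectra) |>
  zoom_in_on_range(zoom_range = c(3800, 2800))
zoomed_plot
```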
You might notice the plot itself could be adjusted. A later section discusses modifications using the ggplot2 functions that underpin the PlotFTIR package.
We can add markers at any specific wavenumber to indicate significant peaks or areas of interest. For example, the peak at 1495 cm-1 corresponds to an aromatic carbon-carbon bond vibration. The marker’s label can be coloured or styled in any way, and the line can be modified as well. See the ggplot2 documentation for the options accepted by line_aesthetics and label_aesthetics.
plot_ftir(spectra) |>
  add_wavenumber_marker(
    wavenumber = 1495,
    text = "C-C Aromatic",
    line_aesthetics = list("linetype" = "dashed"),
    label_aesthetics = list("color" = "#7e0021")
  )
Note that any number of markers can be added, before or after other transformations of the spectral plot. The text of multiple markers may overlap.
All of the manipulations can be done in sequence to one image. This allows you to easily produce plots modified to your samples and annotated per your need.
plot_ftir(spectra, plot_title = c("My FTIR Plot", "Closeup of Detailed Region 1600 to 800 wavenumbers"), legend_title = "Samples:") |>
  move_plot_legend(position = "bottom") |>
  zoom_in_on_range(zoom_range = c(1600, 800)) |>
  add_wavenumber_marker(wavenumber = 1495, text = "C-C Aromatic", line_aesthetics = c(color = "#7e0021", linetype = "dotted")) |>
  add_wavenumber_marker(wavenumber = 817, text = "C-C-O\nsymmetric", line_aesthetics = c(color = "#ff420e", linetype = "dotted")) |>
  add_wavenumber_marker(wavenumber = 1380, text = "CH3", line_aesthetics = c(linetype = "dashed")) |>
  rename_plot_sample_ids(c("C7 Alkane" = "heptanes", "2-Propanol" = "isopropanol", "Toluene" = "toluene"))
Some instruments collect data in transmittance units, others in absorbance units. It may also be necessary to plot FTIR spectra in specific units for a journal or report, or to compare with existing literature. Fortunately, the two are related by a simple mathematical function. Two functions exist in this package to easily swap between the two y-axis units. We’ll use absorbance_to_transmittance(), but the inverse, transmittance_to_absorbance(), is available as well.
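The sketch below first checks the standard relation between absorbance and percent transmittance on three toy values, then converts and plots the packaged sample data. It assumes the package reports transmittance as a percentage (0 to 100):

```r
library(PlotFTIR)

# Standard relation between absorbance (A) and percent transmittance (%T):
#   %T = 100 * 10^(-A)   and   A = 2 - log10(%T)
absorbance <- c(0, 1, 2)
percent_transmittance <- 100 * 10^(-absorbance)
round_trip <- 2 - log10(percent_transmittance)

# Stand-in for the `spectra` object read earlier: the packaged sample data.
spectra <- sample_spectra

# Convert the whole data set, then plot it in transmittance units.
transmittance_spectra <- absorbance_to_transmittance(spectra)
transmittance_plot <- plot_ftir(transmittance_spectra)
transmittance_plot
```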
You may want to shift a sample’s result on the plotted spectra by a certain numerical amount. This can be done by adding or subtracting a scalar value, that is, a single value of a set magnitude. We can use the add_scalar_value() or subtract_scalar_value() function to apply that addition or subtraction to the data prior to plotting. These functions can act on all the spectra in the data set, or only on the specific sample_ids provided.
shifted_spectra <- add_scalar_value(ftir = spectra, value = 0.2, sample_ids = c("heptanes", "toluene"))
plot_ftir(shifted_spectra)
Note that just the heptanes and toluene are shifted!
Sometimes you may have analyzed samples in replicate and wish to average their results together. In this situation, it may be easier to see an average of the spectra instead of all of those available. Our sample spectra don’t include any replicate analyses, but averaging can be performed by calling average_spectra(). This returns a data.frame which includes only the resulting average, even if you’ve only averaged a few of the samples. This is important to note: if we averaged "toluene" and "heptanes" from the sample spectra set, the resulting data.frame wouldn’t include the "isopropanol", "paper" or "polystyrene" spectra, so they would be lost by this process.
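A sketch of that behaviour, treating toluene and heptanes as if they were replicates purely for illustration. It assumes average_spectra() accepts a sample_ids argument naming the samples to combine:

```r
library(PlotFTIR)

# Stand-in for the `spectra` object read earlier: the packaged sample data.
spectra <- sample_spectra

# Average two samples; the other three are not carried into the result.
averaged <- average_spectra(spectra, sample_ids = c("toluene", "heptanes"))
unique(averaged$sample_id)

plot_ftir(averaged)
```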
It’s possible to adjust baselines by a few different mechanisms in PlotFTIR. The baseline can be adjusted at a single point (i.e. set that point to 0 absorbance or 100 transmittance and shift the rest of the spectra accordingly), at a minimum (or maximum for transmittance) across a certain range or the whole spectra, or by an average of values across a range.
This can be done on a single sample in the data set, on each sample individually (where the amount of shift could be different for each sample), or the amount of baseline shift can be applied to all of the samples at once (i.e. where the same shift is applied to all of the samples).
All of these shifts are achieved by calling the recalculate_baseline() function.
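To make the arithmetic concrete, here is a base R sketch of the "average over a range" idea on made-up numbers. It illustrates the concept only and is not the package's implementation:

```r
# Toy spectrum (made-up values for illustration, not measured data).
toy <- data.frame(
  wavenumber = c(2000, 1950, 1900, 1500, 1000),
  absorbance = c(0.05, 0.04, 0.06, 0.80, 0.30)
)

# Average absorbance inside the reference range 2000 to 1900 cm-1.
in_range <- toy$wavenumber <= 2000 & toy$wavenumber >= 1900
offset <- mean(toy$absorbance[in_range])

# Shift the whole spectrum so that the reference range averages zero.
toy$absorbance <- toy$absorbance - offset
round(toy$absorbance, 2)
```

The "point" and "minimum" methods follow the same pattern, with the offset taken from a single wavenumber or from the lowest value in the range instead of the mean.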
For this demonstration, we’ll use the biodiesel data contained in the package. It’s a set of spectra of diesel samples containing increasing amounts of biodiesel.
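A sketch of plotting the unadjusted biodiesel data, zoomed to the same 2000 to 1000 cm-1 window used in the baseline examples in this section:

```r
library(PlotFTIR)

# Plot the packaged biodiesel spectra before any baseline adjustment.
biodiesel_plot <- plot_ftir(biodiesel) |>
  zoom_in_on_range(c(2000, 1000))
biodiesel_plot
```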
If you look closely, all of the spectra are floating just above the 0 absorbance line. We can adjust those down and then plot them again. We’ll recalculate based on the average (method = "average") across the wavenumber_range = c(2000, 1900), and move each sample by its own calculated amount (i.e. individually = TRUE).
biodiesel |>
  recalculate_baseline(method = "average", wavenumber_range = c(2000, 1900), individually = TRUE) |>
  plot_ftir() |>
  zoom_in_on_range(c(2000, 1000))
Of course, if you aren’t careful with baselining you can get some weird results. If we didn’t pay attention and tried to adjust by the point at 1250 cm-1 individually, we’d get a plot that’s not useful in most situations (even if it looks cool).
biodiesel |>
  recalculate_baseline(method = "point", wavenumber_range = 1250, individually = TRUE) |>
  plot_ftir() |>
  zoom_in_on_range(c(2000, 1000))
Instead, it might be more useful to adjust all of the spectra by the minimum point in a range (the minimum of each sample’s minimum in the range). This puts that region close to zero, without removing what might be otherwise useful differences in absorbance.
biodiesel |>
  recalculate_baseline(method = "minimum", wavenumber_range = c(1300, 1000), individually = FALSE) |>
  plot_ftir() |>
  zoom_in_on_range(c(2000, 1000))
The PlotFTIR package contains convenient functions for saving spectra plots to disk. This wraps the saving function of the ggplot2 graphics package, so much more specific information can be found in its documentation. Options for file types include “eps”, “ps”, “tex” (pictex), “pdf”, “jpeg”, “tiff”, “png”, “bmp”, “svg” or “wmf” (on Windows only). More options, as described in the documentation for ggplot2::ggsave(), can be passed to the function to tune your output file (options such as resolution, size, etc.).
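A sketch of saving a plot. It assumes the package's save_plot() function takes the plot and a filename, and forwards extra arguments such as width, height, units and dpi to ggplot2::ggsave(); the file name and sizes are illustrative:

```r
library(PlotFTIR)

plot_to_save <- plot_ftir(biodiesel)

# Hypothetical output location; a temporary file keeps the sketch tidy.
out_file <- file.path(tempdir(), "biodiesel_spectra.png")

# Extra arguments are assumed to be passed through to ggplot2::ggsave().
save_plot(plot_to_save, filename = out_file, width = 8, height = 5, units = "in", dpi = 300)
file.exists(out_file)
```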
De nombreux chimistes utilisent la spectroscopie infrarouge à transformée de Fourier (IRTF) pour analyser les produits chimiques et les matériaux dans leurs laboratoires. Ils peuvent tester la pureté, la composition des mélanges, les changements dans les groupes fonctionnels ou évaluer d’autres propriétés. Ces spectres sont parfois inclus dans des articles de journaux mais, jusqu’à présent, R ne dispose pas d’un outil pratique pour tracer les spectres.
Dans ce document, nous utiliserons les données intégrées dans le package PlotFTIR pour démontrer le chargement des données, leur modification et le tracé des spectres résultants. Bien que le package ne dispose pas de fichiers de données brutes, nous simulerons cette partie en lisant les données intégrées à partir du disque.
Pour lire les fichiers IRTF dans R, le package PlotFTIR doit être installé et chargé. L’installation de la version stable du package peut se faire à partir de CRAN.
Si vous souhaitez installer la version de développement de ce package, vous pouvez le faire en utilisant le package devtools :
Une fois installé (par n’importe quelle méthode), le package doit être activé dans R avant utilisation.
Ce document suppose que vous disposez de fichiers de spectres IRTF au format .csv, .txt (délimité par des virgules) ou .asp. D’autres types de fichiers pourront être pris en charge à l’avenir. Nous pouvons lire les fichiers individuellement en utilisant la fonction read_ftir(), ou (comme nous allons le faire) vous pouvez lire n’importe quel nombre de fichiers dans un répertoire. Nos données sont stockées dans une variable file_directory, mais vous pouvez fournir votre propre emplacement (comme “C:/Users/[userid]/FTIR/Data” ou tout autre endroit où vos fichiers sont stockés).
spectres <- read_ftir_directory(
  path = file_directory,
  files = c("toluene.csv", "heptanes.csv", "isopropanol.csv", "paper.csv", "polystyrene.csv"),
  sample_names = c("toluene", "heptanes", "isopropanol", "paper", "polystyrene")
)
head(spectres)
#>   wavenumber absorbance sample_id
#> 1   650.4205   0.011155   toluene
#> 2   652.2841   0.002671   toluene
#> 3   654.1478   0.024340   toluene
#> 4   656.0115   0.066932   toluene
#> 5   657.8751   0.065699   toluene
#> 6   659.7388   0.061379   toluene
Après avoir chargé les données, nous pouvons voir qu’elles ont une structure de trois colonnes, nommées wavenumber, absorbance et sample_id. Nous avons lu quelques fichiers, donc tous les spectres sont inclus dans un seul objet spectres.
Les spectres chargés peuvent être tracés (avec des valeurs par défaut raisonnables) en une seule ligne de code (notez que l’ajout de lang = 'fr' produit un tracé avec des titres français):
Si vous tracez plusieurs échantillons (comme nous le faisons ici), vous pouvez décaler les échantillons sur l’axe des y pour pouvoir voir plus clairement les différences là où ils auraient pu se chevaucher près de la ligne de base:
Nous pouvons modifier ce tracé de plusieurs manières:
Le titre et le sous-titre du tracé, ainsi que le titre de la légende sont tous configurables. L’emplacement de la légende peut également être modifié ! Pour ajouter un sous-titre, passez un vecteur de texte à deux éléments à l’argument plot_title de plot_ftir().
plot_ftir(spectres, lang = "fr", plot_title = c("Titre de mon tracé de spectres", "Tracé avec PlotFTIR"), legend_title = "Echantillons:") |>
  move_plot_legend(position = "bottom")
Lorsque le tracé a été produit, il est facile de renommer les échantillons dans la légende. Cela peut être utile pour passer d’une analyse exploratoire à un produit final, ou si les noms des échantillons dans les données sont très longs et que vous avez des formes abrégées que vous voulez montrer sur le tracé. Le format de renommage est "nouveau_nom" = "ancien_nom". Il n’est pas nécessaire de faire correspondre tous les sample_id ; les échantillons qui n’ont pas été renommés conserveront le nom qu’ils portent actuellement.
plot_ftir(spectres, lang = "fr") |>
  rename_plot_sample_ids(sample_ids = c("methylbenzene" = "toluene", "C7" = "heptanes"))
La région de basse énergie d’un tracé IRTF peut ne pas présenter beaucoup d’intérêt. Nous pouvons les compresser pour économiser de l’espace graphique et améliorer la visualisation de la région plus complexe de l’empreinte digitale à plus haute énergie.