# MINVAL - MINimal VALidation for Stoichiometric Reactions

### Introduction to MINVAL

The MINVAL package was designed as a tool to identify orphan metabolites and evaluate the mass and charge balance of stoichiometric reactions. The package also includes functions to characterize and write models in TSV and SBML formats, extract all reactants, products, metabolite names and compartments from a metabolic reconstruction. It is available through CRAN repositories, to install it just type:

install.packages("minval")

MINVAL package includes twelve functions designed to characterize, check and depurate metabolic reconstructions before its interrogation through Flux Balance Analysis (FBA). FBA methods are available for the R language in the "sybil" (Systems Biology Library) package. To load required packages just type:

library("minval")
#library("sybilSBML")
library("sybil")
library("glpkAPI")

### Metabolic Models

Metabolic models are sets of stoichiometric reactions that represents a process where a set of chemical compounds called reactants are converted into others called products. In a stoichiometric reaction, all the reactants are placed on the left and the products on the right separated by an arrow symbol which indicates the direction of the reaction. Some examples of valid stoichiometric reactions are:

"H2O[c] + Urea-1-Carboxylate[c] <=> 2 CO2[c] + 2 NH3[c]"
"ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]"
"CO2[c] <=>"

MINVAL can extract both, reactants and products for a set of stoichiometric reactions through the reactants and products functions as follows:

• If the reaction is irreversible "=>" then reactants and products are separated and returned afterward.
reactants(reactionList = "ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]")
products(reactionList = "ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]")
## [1] "ATP[c]"      "Pyruvate[c]"
• If the reaction is reversible "<=>" then all reactants at some point can act as products and vice versa, for that reason both functions return all reaction metabolites.
reactants(reactionList = "H2O[c] + Urea-1-Carboxylate[c] <=> 2 CO2[c] + 2 NH3[c]")
## [1] "H2O[c]"                "Urea-1-Carboxylate[c]" "CO2[c]"
## [4] "NH3[c]"
products(reactionList = "H2O[c] + Urea-1-Carboxylate[c] <=> 2 CO2[c] + 2 NH3[c]")
## [1] "H2O[c]"                "Urea-1-Carboxylate[c]" "CO2[c]"
## [4] "NH3[c]"

To show the potential use of the MINVAL package a human-readable model composed by a set of 19 stoichiometric reactions that represent an unbalanced model of the glycolysis process was included. To load it just type:

glycolysis <- read.csv(file = system.file("extdata", "glycolysisModel.csv",
package = "minval"),
stringsAsFactors = FALSE,
sep = '\t'
)
glycolysis$REACTION ## [1] "2-Phospho-D-glycerate[c] <=> Phosphoenolpyruvate[c] + H2O[c]" ## [2] "D-Glyceraldehyde 3-phosphate[c] <=> Glycerone phosphate[c]" ## [3] "D-Glyceraldehyde 3-phosphate[c] + Orthophosphate[c] + NAD+[c] <=> 3 3-Phospho-D-glyceroyl phosphate[c] + NADH[c] + H+[c]" ## [4] "beta-D-Fructose 1,6-bisphosphate[c] <=> Glycerone phosphate[c] + D-Glyceraldehyde 3-phosphate[c]" ## [5] "ATP[c] + 3-Phospho-D-glycerate[c] <=> ADP[c] + 3-Phospho-D-glyceroyl phosphate[c]" ## [6] "2-Phospho-D-glycerate[c] <=> 3-Phospho-D-glycerate[c]" ## [7] "ATP[c] + alpha-D-Glucose[c] => ADP[c] + alpha-D-Glucose 6-phosphate[c]" ## [8] "alpha-D-Glucose 6-phosphate[c] <=> beta-D-Fructose 6-phosphate[c]" ## [9] "ATP[c] + beta-D-Fructose 6-phosphate[c] => ADP[c] + beta-D-Fructose 1,6-bisphosphate[c]" ## [10] "ADP[c] + Phosphoenolpyruvate[c] => ATP[c] + Pyruvate[c]" ## [11] "H2O[c] <=>" ## [12] "NAD+[c] <=>" ## [13] "NADH[c] <=>" ## [14] "Orthophosphate[c] <=>" ## [15] "H+[c] <=>" ## [16] "ATP[c] <=>" ## [17] "alpha-D-Glucose[c] <=>" ## [18] "Pyruvate[c] <=>" ## [19] "ADP[c] <=>" Metabolic models include also another additional information related to the stoichiometric reactions. The generally associated information is: • ID: A list of single character strings containing the reaction abbreviations, Entries in the field abbreviation are used as reaction ids, so they must be unique. • REACTION: A set of stoichiometric reaction with the following characteristics: • Arrows symbols must be given in the form '=>' or '<=>' • Inverse arrow symbols '<=' or other types as: '-->', '<==>', '->' will not be parsed and will lead to errors. • Arrow symbols and plus signs (+) must be surrounded by a space character. • Stoichiometric coefficients must be surrounded by a space character and not by parentheses. • Each metabolite must have only one stoichiometric coefficient, substituents must be joined to metabolite name by a hyphen (-) symbol. • Exchange reactions must have only one metabolite before arrow symbol. • Compartments must be given between square brackets ([compartment]) joined at the end of metabolite name. • GPR: A set of genes joined by boolean operators as AND or OR, rules may be nested by parenthesis. (optional: column can be empty) • LOWER.BOUND: A list of numeric values containing the lower bounds of the reaction rates. If not set, zero is used for an irreversible reaction and -1000 for a reversible reaction. (optional: column can be empty) • UPPER.BOUND: list of numeric values containing the upper bounds of the reaction rates. If not set, 1000 is used by default. (optional: column can be empty) • OBJECTIVE: A list of numeric values containing objective values (0 or 1) for each reaction (optional: column can be empty) colnames(glycolysis) ## [1] "ID" "DESCRIPTION" "REACTION" "GPR" "LOWER.BOUND" ## [6] "UPPER.BOUND" "OBJECTIVE" ### SBML files The standard format to share and store biological processes such as metabolic models is the Systems Biology Markup Language (SBML) format. MINVAL package includes the "writeSBMLmod" function which is able to write models in SBML (level = 2, version = 1) format as follows: writeSBMLmod(modelData = glycolysis, modelID = "Glycolysis", outputFile = "glycolysis.xml" ) Metabolic models in SBML format can be readed through the readSBMLmod function of the sybilSBML R package. # glycoModel <- sybilSBML::readSBMLmod("glycolysis.xml") # glycoModel After load the metabolic model, it can be interrogated through FBA using the optimizeProb function of the sybil R package. In this case, the reaction R00200 was set as the objective function. The R00200 reaction describes the production of pyruvate from phosphoenolpyruvate an alpha-D-Glucose derivate. Glycolysis pathway can be summarized as: 1 alpha-D-Glucose[c] + 2 NAD+[c] + 2 ADP[c] + 2 Orthophosphate[c] => 2 Pyruvate[c] + 2 NADH[c] + 2 H+[c] + 2 ATP[c] + 2 H2O[c] where the metabolism of one molecule of alpha-D-Glucose yield two molecules of pyruvate. As is shown below, interrogated glycolysis model estimates a production of six molecules of pyruvate by each alpha-D-Glucose molecule due a mass unbalance. FBA methods are sensitive to thermodynamic (mass-charge) unbalance, so in order to achieve a valid biological extrapolation is mandatory to avoid this type of unbalancing in all model reactions. # sybil::optimizeProb(glycoModel) ### Syntax validation The first step for a stoichiometric reactions validation is to check their syntax. Valid stoichiometric reactions must have the following mandatory characteristics: • Arrows symbols must be given in the form '=>' or '<=>' (Inverse arrow symbols '<=' or other types as: '-->', '<==>', '->' will not be parsed and will lead to errors.) • Arrow symbols and plus signs (+) must be surrounded by a space character. • Stoichiometric coefficients must be surrounded by a space character and not by parentheses. • Each metabolite must have only one stoichiometric coefficient, substituents must be joined to metabolite name by a hyphen (-) symbol. • Exchange reactions must have only one metabolite before arrow symbol. • Compartments must be given between square brackets ([compartment]) joined at the end of metabolite name. Syntax validity can be checked through MINVAL package using the validateSyntax function which returns a boolean TRUE value if the stoichiometric reaction passes all validations. To validate reactions syntax just type: validateSyntax(reactionList = glycolysis$REACTION)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE

### Mass - Charge Balance Validation

Another step for a stoichiometric reactions validation is to check their mass-charge balance. This process requires the use of a reference with chemical formulas, molecular weights and/or net charges for each metabolite included in the metabolic model.

MINVAL package includes the downloadChEBI function that allows the download of different releases of the Chemical Entities of Biological Interest (ChEBI) database, a freely available dictionary of molecular entities focused on ‘small’ chemical compounds involved in biochemical reactions. To download the latest version (this process takes aprox. 2 min with a 5 Mb internet connection) of the ChEBI database just type:

ChEBI <- downloadChEBI(release = "latest",
woAssociations = TRUE
)

Reference also can be user provided. An example of a reference for the glycolysis model is shown below.

chemicalData <- read.csv2(file = system.file("extdata", "chemData.csv",
package = "minval")
)
##                              NAME       FORMULA     MASS CHARGE
## 1                             H2O           H2O  18.0106      0
## 2                              H+             H   1.0078      1
## 3                             ATP C10H16N5O13P3 506.9957      0
## 4                            NAD+ C21H28N7O14P2 664.1169      1
## 5 3-Phospho-D-glyceroyl phosphate     C3H8O10P2 265.9593      0
## 6     beta-D-Fructose 6-phosphate      C6H13O9P 260.0297      0

#### Mass Balance Validation

In a balanced stoichiometric reaction according to the Lomonosov-Lavoisier law, the mass comprising the reactants should be the same mass present in the products. If the chemical formula is given, the checkBalance function multiplies the atom numbers by their respective stoichiometric coefficient and establishes if the atomic composition of reactants and products are the same. If the molecular weight is given then the sum of masses of reactants and products are compared. In both cases, a boolean TRUE value is returned if the mass is balanced.

Mass balance can be tested using the chemical formula or the molecular weight associated to each metabolite as follows:

checkBalance(reactionList = glycolysis$REACTION, referenceData = chemicalData, ids = "NAME", mFormula = "FORMULA" ) ## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## [12] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE As is shown above, the third stoichiometric reaction is unbalanced. It can be corrected replacing manually the unbalanced reaction as follows: glycolysis$REACTION[3] <- "D-Glyceraldehyde 3-phosphate[c] + Orthophosphate[c] + NAD+[c] <=> 3-Phospho-D-glyceroyl phosphate[c] + NADH[c] + H+[c]"

checkBalance(reactionList = glycolysis$REACTION, referenceData = chemicalData, ids = "NAME", mWeight = "MASS" ) ## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## [15] TRUE TRUE TRUE TRUE TRUE When all stoichiometric reactions are mass-balanced, then the model can be exported and loaded to be interrogated again: writeSBMLmod(modelData = glycolysis, modelID = "GlycolysisBalanced", outputFile = "glycolysisBalanced.xml" ) # sybil::optimizeProb(sybilSBML::readSBMLmod("glycolysisBalanced.xml")) #### Charge Balance Validation Charge balance can be also tested through the checkBalance function using the net charge of metabolites as follows: checkBalance(reactionList = glycolysis$REACTION,
referenceData = chemicalData,
ids = "NAME",
mCharge = "CHARGE"
)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE

### Characterize model

#### Stoichiometric matrix

A metabolic model is often represented in a more compact form called the stoichiometry matrix (S). If a metabolic model has n reactions and m participating metabolites, then the stoichiometry matrix will have correspondingly m rows and n columns. Values in the stoichiometric matrix represent the metabolites coefficients in each reaction. To generate the stoichiometric matrix just type:

stoichiometricMatrix(reactionList = glycolysis$REACTION) ## reactions ## metabolites R01 R02 R03 R04 R05 R06 R07 R08 R09 ## 2-Phospho-D-glycerate[c] -1 0 0 0 0 -1 0 0 0 ## Phosphoenolpyruvate[c] 1 0 0 0 0 0 0 0 0 ## H2O[c] 1 0 0 0 0 0 0 0 0 ## D-Glyceraldehyde 3-phosphate[c] 0 -1 -1 1 0 0 0 0 0 ## Glycerone phosphate[c] 0 1 0 1 0 0 0 0 0 ## Orthophosphate[c] 0 0 -1 0 0 0 0 0 0 ## NAD+[c] 0 0 -1 0 0 0 0 0 0 ## 3-Phospho-D-glyceroyl phosphate[c] 0 0 1 0 1 0 0 0 0 ## NADH[c] 0 0 1 0 0 0 0 0 0 ## H+[c] 0 0 1 0 0 0 0 0 0 ## beta-D-Fructose 1,6-bisphosphate[c] 0 0 0 -1 0 0 0 0 1 ## ATP[c] 0 0 0 0 -1 0 -1 0 -1 ## 3-Phospho-D-glycerate[c] 0 0 0 0 -1 1 0 0 0 ## ADP[c] 0 0 0 0 1 0 1 0 1 ## alpha-D-Glucose[c] 0 0 0 0 0 0 -1 0 0 ## alpha-D-Glucose 6-phosphate[c] 0 0 0 0 0 0 1 -1 0 ## beta-D-Fructose 6-phosphate[c] 0 0 0 0 0 0 0 1 -1 ## Pyruvate[c] 0 0 0 0 0 0 0 0 0 ## reactions ## metabolites R10 R11 R12 R13 R14 R15 R16 R17 R18 ## 2-Phospho-D-glycerate[c] 0 0 0 0 0 0 0 0 0 ## Phosphoenolpyruvate[c] -1 0 0 0 0 0 0 0 0 ## H2O[c] 0 -1 0 0 0 0 0 0 0 ## D-Glyceraldehyde 3-phosphate[c] 0 0 0 0 0 0 0 0 0 ## Glycerone phosphate[c] 0 0 0 0 0 0 0 0 0 ## Orthophosphate[c] 0 0 0 0 -1 0 0 0 0 ## NAD+[c] 0 0 -1 0 0 0 0 0 0 ## 3-Phospho-D-glyceroyl phosphate[c] 0 0 0 0 0 0 0 0 0 ## NADH[c] 0 0 0 -1 0 0 0 0 0 ## H+[c] 0 0 0 0 0 -1 0 0 0 ## beta-D-Fructose 1,6-bisphosphate[c] 0 0 0 0 0 0 0 0 0 ## ATP[c] 1 0 0 0 0 0 -1 0 0 ## 3-Phospho-D-glycerate[c] 0 0 0 0 0 0 0 0 0 ## ADP[c] -1 0 0 0 0 0 0 0 0 ## alpha-D-Glucose[c] 0 0 0 0 0 0 0 -1 0 ## alpha-D-Glucose 6-phosphate[c] 0 0 0 0 0 0 0 0 0 ## beta-D-Fructose 6-phosphate[c] 0 0 0 0 0 0 0 0 0 ## Pyruvate[c] 1 0 0 0 0 0 0 0 -1 ## reactions ## metabolites R19 ## 2-Phospho-D-glycerate[c] 0 ## Phosphoenolpyruvate[c] 0 ## H2O[c] 0 ## D-Glyceraldehyde 3-phosphate[c] 0 ## Glycerone phosphate[c] 0 ## Orthophosphate[c] 0 ## NAD+[c] 0 ## 3-Phospho-D-glyceroyl phosphate[c] 0 ## NADH[c] 0 ## H+[c] 0 ## beta-D-Fructose 1,6-bisphosphate[c] 0 ## ATP[c] 0 ## 3-Phospho-D-glycerate[c] 0 ## ADP[c] -1 ## alpha-D-Glucose[c] 0 ## alpha-D-Glucose 6-phosphate[c] 0 ## beta-D-Fructose 6-phosphate[c] 0 ## Pyruvate[c] 0 #### Metabolites The metabolites function automatically identifies and return all metabolites (with or without compartments) for a specific or a set of stoichiometric reactions. Some FBA implementations require the list of all metabolites included in the metabolic reconstruction as an independent section of the human-readable input model. metabolites(reactionList = glycolysis$REACTION)
##  [1] "2-Phospho-D-glycerate[c]"
##  [2] "Phosphoenolpyruvate[c]"
##  [3] "H2O[c]"
##  [4] "D-Glyceraldehyde 3-phosphate[c]"
##  [5] "Glycerone phosphate[c]"
##  [6] "Orthophosphate[c]"
##  [8] "3-Phospho-D-glyceroyl phosphate[c]"
## [10] "H+[c]"
## [11] "beta-D-Fructose 1,6-bisphosphate[c]"
## [12] "ATP[c]"
## [13] "3-Phospho-D-glycerate[c]"
## [15] "alpha-D-Glucose[c]"
## [16] "alpha-D-Glucose 6-phosphate[c]"
## [17] "beta-D-Fructose 6-phosphate[c]"
## [18] "Pyruvate[c]"
metabolites(reactionList = glycolysis$REACTION, woCompartment = TRUE) ## [1] "2-Phospho-D-glycerate" "Phosphoenolpyruvate" ## [3] "H2O" "D-Glyceraldehyde 3-phosphate" ## [5] "Glycerone phosphate" "Orthophosphate" ## [7] "NAD+" "3-Phospho-D-glyceroyl phosphate" ## [9] "NADH" "H+" ## [11] "beta-D-Fructose 1,6-bisphosphate" "ATP" ## [13] "3-Phospho-D-glycerate" "ADP" ## [15] "alpha-D-Glucose" "alpha-D-Glucose 6-phosphate" ## [17] "beta-D-Fructose 6-phosphate" "Pyruvate" #### Compartments As well as in cells, in which not all reactions occur in all compartments, stoichiometric reactions in a metabolic reconstruction can be labeled to be restricted to a single compartment during FBA through the assignment of a compartment label at the end of each metabolite name. Some FBA implementations require the list of all compartments included in the metabolic reconstruction as an independent section of the human-readable input file. compartments(reactionList = glycolysis$REACTION)
## [1] "c"

#### OrphanMetabolites

orphanMetabolites or compounds that are not produced or consumed in any other reaction are one of the main causes of mass accumulation in metabolic reconstructions. The orphanReactants function identifies compounds that are not produced internally by any other reaction and should be added to the reconstruction, for instance, as an exchange reaction while the orphanProducts function identifies compounds that are not consumed internally by any other reaction and should be added to the reconstruction as a sink reaction.

orphanMetabolites(reactionList = glycolysis$REACTION[1:10]) ## [1] "alpha-D-Glucose[c]" "Pyruvate[c]" "H+[c]" ## [4] "H2O[c]" "NAD+[c]" "NADH[c]" ## [7] "Orthophosphate[c]" orphanReactants(reactionList = glycolysis$REACTION[1:10])
## [1] "alpha-D-Glucose[c]" "H+[c]"              "H2O[c]"
orphanProducts(reactionList = glycolysis\$REACTION[1:10])
## [1] "Pyruvate[c]"       "H+[c]"             "H2O[c]"
## [4] "NAD+[c]"           "NADH[c]"           "Orthophosphate[c]"

### TSV files

The function writeTSVmod writes a metabolic model in three text files, following a character-separated value format. Each line contains one entry; the default value separator is a tab. TSV models are the default input format for the sybil R package.

writeTSVmod(modelData = glycolysis,
modelID = "Glycolysis",
outputFile = "glycolysis"
)

## objective function:     +1 R00200