With ‘cito’ you can build and train fully-connected neural networks using the familiar R formula syntax. It is based on the ‘torch’ deep learning framework; since ‘torch’ is native to R, no Python installation or API is needed.
Before using ‘cito’ make sure that the current version of ‘torch’ is installed and running.
if(!require(torch)) install.packages("torch")
library(torch)
if(!torch_is_installed()) install_torch()
library(cito)
In this vignette, we will work with the iris dataset and build a regression model.
data <- datasets::iris
head(data)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#scale dataset
data <- data.frame(scale(data[,-5]), Species = data[,5])
In ‘cito’, you can fit neural networks with the dnn() function. Models can also be trained on the GPU by setting device = "cuda", which is recommended for large data sets or networks.
library(cito)
#fitting a regression model to predict Sepal.Length
nn.fit <- dnn(Sepal.Length~., data = data, hidden = c(10,10,10,10), epochs = 12, device = "cpu")
#> Loss at epoch 1: 0.906718, lr: 0.01000
#> Loss at epoch 2: 0.863654, lr: 0.01000
#> Loss at epoch 3: 0.843066, lr: 0.01000
#> Loss at epoch 4: 0.825574, lr: 0.01000
#>
#> ....
#>
#> Loss at epoch 11: 0.408130, lr: 0.01000
#> Loss at epoch 12: 0.403822, lr: 0.01000
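Training the same model on the GPU only requires changing the device argument. A minimal sketch, assuming a CUDA-capable GPU and a ‘torch’ installation with CUDA support:

#same model trained on the GPU (requires a CUDA-enabled torch installation)
nn.fit <- dnn(Sepal.Length~., data = data, hidden = c(10,10,10,10), epochs = 12, device = "cuda")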
You can plot the network structure to get visual feedback on the created object. Be aware that this may take some time for large networks.
plot(nn.fit)
Structure plot of the generated network
By default, all hidden layers use ReLU as the activation function. \[ relu(x) = \max(0, x) \] You can also set the activation function of each layer individually to build exactly the network you want. In that case, provide a vector with one entry per hidden layer. The activation function of the output layer is determined by the loss argument and does not have to be provided.
#selu as activation function for all layers:
nn.fit <- dnn(Sepal.Length~., data = data, hidden = c(10,10,10,10), activation = "selu")

#layer-specific activation functions:
nn.fit <- dnn(Sepal.Length~., data = data,
              hidden = c(10,10,10,10), activation = c("relu","selu","tanh","sigmoid"))
To see where your model might start overfitting, adding a validation set can be useful. With dnn() you can set validation to a fraction (e.g. validation = 0.2) of the data that is withheld from training and used only for validation after each epoch. During training, the loss plot shows how the two losses behave.
#20% of data set is used as validation set
nn.fit <- dnn(Sepal.Length~., data = data, epochs = 32,
              loss = "mae", hidden = c(10,10,10,10), validation = 0.2)
#> Loss at epoch 1: training: 5.868, validation: 5.621, lr: 0.01000
#> Loss at epoch 2: training: 5.464, validation: 4.970, lr: 0.01000
#> Loss at epoch 3: training: 4.471, validation: 3.430, lr: 0.01000
#> Loss at epoch 4: training: 2.220, validation: 0.665, lr: 0.01000
#>
#> ...
#>
#>
#> Loss at epoch 31: training: 0.267, validation: 0.277, lr: 0.01000
#> Loss at epoch 32: training: 0.265, validation: 0.275, lr: 0.01000
Training loss
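The losses per epoch are also stored in the fitted object, so you can re-plot them yourself. A minimal base-R sketch, assuming nn.fit$losses holds one row per epoch with the columns train_l and valid_l (valid_l is also used below):

#re-plot training and validation loss per epoch (column names train_l/valid_l assumed)
matplot(cbind(nn.fit$losses$train_l, nn.fit$losses$valid_l), type = "l", lty = 1,
        col = c("black", "red"), xlab = "epoch", ylab = "loss")
legend("topright", legend = c("training", "validation"), lty = 1, col = c("black", "red"))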
We can select the model from the epoch with the lowest validation loss by setting nn.fit$use_model_epoch.
nn.fit$use_model_epoch <- which.min(nn.fit$losses$valid_l)
The standard generic functions can be used to interpret the model:
#utilize model on new data
predict(nn.fit,data[1:3,])
#> [,1]
#> [1,] 5.046695
#> [2,] 4.694821
#> [3,] 4.788142
#returns weights of neural network
coef(nn.fit)
#> [[1]]
#> [[1]]$`0.weight`
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.21469544 0.17144544 0.06233330 0.05737647 -0.56643492 0.30539653
#> [2,] 0.02309913 0.32601142 -0.04106455 -0.05251846 0.06156364 -0.16903549
#> [3,] 0.02779424 -0.39305094 0.22446594 -0.11260942 0.40277928 -0.14661779
#> [4,] -0.17715086 -0.34669805 0.41711944 -0.07970788 0.28087401 -0.32001352
#> [5,] 0.10428729 0.46002910 0.12490098 -0.25849682 -0.49987957 -0.19863304
#> [6,] 0.08653354 0.02208819 -0.18835779 -0.18991815 -0.19675359 -0.37495106
#> [7,] 0.28858119 0.02029459 -0.40138969 -0.39148667 -0.29556298 0.17978610
#> [8,] 0.34569272 -0.04052169 0.76198137 0.31320053 -0.06051779 0.34071702
#> [9,] 0.34511277 -0.42506409 -0.50092584 -0.22993636 0.05683114 0.38136607
#> [10,] -0.13597916 0.25648212 -0.08427665 -0.46611786 0.14236088 0.04671739
#>
#> ...
#>
#> [[1]]$`8.bias`
#> [1] 0.2862495
With summary(), the feature importance based on Fisher, Rudin, and Dominici (2018) is calculated for all input variables.
# Calculate and return feature importance
summary(nn.fit)
#> Deep Neural Network Model summary
#> Feature Importance:
#> variable importance
#> 1 Sepal.Width 3.373757
#> 2 Petal.Length 3.090394
#> 3 Petal.Width 2.992742
#> 4 Species 3.278064
If elastic net is used, ‘cito’ produces a sparse, regularized neural network. The L1/L2 penalty can be controlled with the arguments alpha and lambda.
\[ loss = \lambda \cdot [ (1 - \alpha) \cdot |weights| + \alpha \cdot |weights|^2 ] \]
If a single alpha value is provided, every layer gets the same regularization. You can also regularize each layer individually by providing a vector of alpha values with one entry per hidden layer + 1; the + 1 comes from the input layer. With NA you can turn off regularization for specific layers. Following the formula above, alpha = 0 gives a pure L1 (lasso) penalty and alpha = 1 a pure L2 (ridge) penalty.
#elastic net penalty in all layers:
nn.fit <- dnn(Species~., data = data, hidden = c(10,10,10,10), alpha = 0.5, lambda = 0.01)

#L1 penalty in the first layer, no penalty on the other layers:
nn.fit <- dnn(Species~., data = data, hidden = c(10,10,10,10),
              alpha = c(0,NA,NA,NA,NA), lambda = 0.01)
Dropout also regularizes your network: during training, in every epoch a defined percentage of nodes is randomly left out.
#dropout of 35% on all layers:
nn.fit <- dnn(Species~., data = data, hidden = c(10,10,10,10), dropout = 0.35)

#dropout of 35% only on the last 2 layers:
nn.fit <- dnn(Species~., data = data,
              hidden = c(10,10,10,10), dropout = c(0, 0, 0.35, 0.35))
Learning rate schedulers allow you to start with a high learning rate and decrease it during the training process. You can choose between different types of schedulers, namely lambda, multiplicative, one_cycle and step.
The function config_lr_scheduler() helps you set up such a scheduler. See ?config_lr_scheduler() for more information.
# Step learning rate scheduler that reduces the learning rate every 16 steps by a factor of 0.5
scheduler <- config_lr_scheduler(type = "step",
                                 step_size = 16,
                                 gamma = 0.5)

nn.fit <- dnn(Sepal.Length~., data = data, lr = 0.01, lr_scheduler = scheduler)
Optimizers are responsible for fitting the neural network: the optimizer tries to minimize the loss function. By default, stochastic gradient descent is used. Custom optimizers can be configured with config_optimizer(). See ?config_optimizer() for more information.
# adam optimizer with learning rate 0.002, betas set to 0.95, 0.999 and eps to 1.5e-08
opt <- config_optimizer(
  type = "adam",
  betas = c(0.95, 0.999),
  eps = 1.5e-08)

nn.fit <- dnn(Species~., data = data, optimizer = opt, lr = 0.002)
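The same mechanism works for the other optimizer types. A hedged sketch, assuming "sgd" is an accepted type and that extra arguments such as momentum are passed through to the underlying ‘torch’ optimizer:

# plain stochastic gradient descent with momentum (pass-through of momentum assumed)
opt <- config_optimizer(
  type = "sgd",
  momentum = 0.9)

nn.fit <- dnn(Species~., data = data, optimizer = opt, lr = 0.01)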
Loss functions measure how well the network performs. Standard loss functions are implemented, along with some probability distributions.
# Root mean squared error
nn.fit <- dnn(Sepal.Length~., data = data, loss = "rmse")

# normal distribution
nn.fit <- dnn(Sepal.Length~., data = data, loss = stats::gaussian())
Adding an early stopping criterion helps you save time by stopping the training process early if the validation loss of the current epoch is bigger than the validation loss n epochs earlier. The n is defined by the early_stopping argument. Early stopping requires validation > 0.
# Stops training if the validation loss at the current epoch is bigger than that 15 epochs earlier
nn.fit <- dnn(Sepal.Length~., data = data, epochs = 1000,
              validation = 0.2, early_stopping = 15)
You can continue the training process of an existing model with continue_training().
# simple example: adding another 12 epochs to the training process
nn.fit <- continue_training(nn.fit, epochs = 12)
It also allows you to change any training parameters, for example the learning rate, and to define from which epoch the training should continue. You can analyze the training process with analyze_training() and pick an epoch from which training should be continued.
# picking the epoch with the smallest validation loss,
# with changed parameters, in this case a smaller learning rate and a smaller batch size
nn.fit <- continue_training(nn.fit,
                            continue_from = which.min(nn.fit$losses$valid_l),
                            epochs = 32,
                            changed_params = list(lr = 0.001, batchsize = 16))
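As a minimal sketch of the analysis step mentioned above, assuming analyze_training() takes the fitted model as its only required argument:

# inspect the training history to pick a good epoch to continue from (signature assumed)
analyze_training(nn.fit)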