Working with Tuning Parameters

Tuning Parameters

Some statistical and machine learning models contain tuning parameters (also known as hyperparameters), which are parameters that cannot be directly estimated by the model. An example would be the number of neighbors in a K-nearest neighbors model. To determine reasonable values of these elements, some indirect method is used, such as resampling or profile likelihood. Search methods, such as genetic algorithms or Bayesian search, can also be used to determine good values.

In any case, some information is needed to create a grid or to validate whether a candidate value is appropriate (e.g., the number of neighbors should be a positive integer). dials is designed to:

The main type of object in dials has class param.

param Objects

param objects contain information about possible values, ranges, types, and other aspects. There are two main subclasses related to the type of variable. Double- and integer-valued data have the subclass “quant_param”, while character and logical data have “qual_param”. There are some common elements for each:

Otherwise, the information contained in param objects differs by data type.

Numeric Parameters

An example of a numeric tuning parameter is the cost-complexity parameter of CART trees, otherwise known as \(C_p\). The parameter object in dials is:
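As a minimal sketch, assuming a recent dials release in which parameter objects are created by calling a function (here `cost_complexity()`; older releases may expose this object under a different name), the object can be created and inspected like this:

```r
library(dials)

# Create the cost-complexity parameter object
# (assumes the function-style API of recent dials releases)
cp <- cost_complexity()

# Printing shows the label, the transformation, and the range
cp

# Numeric parameters carry both the subclass and the parent class
class(cp)
```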

Note that this parameter is handled in log units and the default range of values is between 10^-10 and 0.1. The range of possible values can be retrieved and changed with utility functions. We’ll use the pipe operator here:
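The following sketch illustrates the range utilities `range_get()` and `range_set()`. It assumes the function-style `cost_complexity()` of recent dials releases and uses R's native pipe `|>` (R 4.1 or later); the magrittr pipe `%>%` would work the same way if it is attached.

```r
library(dials)

cp <- cost_complexity()

# Current range, returned in the natural units by default
cp |> range_get()

# The same range in the transformed (log10) units
cp |> range_get(original = FALSE)

# New bounds are supplied in the transformed units;
# the bounds chosen here are purely illustrative
cp_narrow <- cp |> range_set(c(-5, -1))
cp_narrow |> range_get()
```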

Values for this parameter can be obtained in a few different ways. To get a sequence of values that span the range:
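A short sketch of `value_seq()`, again assuming the function-style `cost_complexity()`. The sequence is evenly spaced in the transformed scale, so the natural-unit values are spread over orders of magnitude.

```r
library(dials)

cp <- cost_complexity()

# Four values spanning the range, returned in the natural units
cp |> value_seq(4)

# The same sequence left in the transformed (log10) units
cp |> value_seq(4, original = FALSE)
```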

Random values can be sampled too. A random uniform distribution is used (between the range values). Since this parameter has a transformation associated with it, the values are simulated in the transformed scale and then returned in the natural units (although the original argument can be used to return them in the transformed units instead):
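Random sampling can be sketched with `value_sample()`. The seed below is arbitrary and only there to make the draw repeatable; the function-style `cost_complexity()` is assumed as before.

```r
library(dials)

set.seed(5473)  # arbitrary seed for reproducibility

cp <- cost_complexity()

# Two random values, simulated on the log10 scale and
# returned in the natural units
cp |> value_sample(2)

# Two random values left in the transformed units
cp |> value_sample(2, original = FALSE)
```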

For CART trees, there is a discrete set of values that exist for a given data set. It may be a good idea to assign these possible values to the object. We can get them by fitting an initial rpart model and then adding the values to the object. For mtcars, there are only three values:
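One way this could look is sketched below. It assumes the rpart package is installed and that candidate values are read from the `cptable` element of the fitted model; the `cp` control value is an arbitrary small number chosen so that the tree is grown as far as the other controls allow. Only values tied to at least one split are kept. The values are passed to `value_set()` in their natural units, and the call is wrapped in `try()` because it is expected to fail.

```r
library(dials)
library(rpart)

# Grow a deep initial tree so that all candidate values appear
cart_mod <- rpart(mpg ~ ., data = mtcars,
                  control = rpart.control(cp = 0.000001))
cart_mod$cptable

# Keep the values associated with at least one split
cp_vals <- cart_mod$cptable[, "CP"]
cp_vals <- cp_vals[cart_mod$cptable[, "nsplit"] > 0]

# Adding the values in their natural units is rejected
res <- try(value_set(cost_complexity(), cp_vals), silent = TRUE)
inherits(res, "try-error")
```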

The error occurs because the values are not in the transformed scale:
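A sketch of the corrected call, under the same assumptions (rpart installed, function-style `cost_complexity()`): the candidate values are converted to log10 units before being attached to the object.

```r
library(dials)
library(rpart)

# Refit the initial tree and extract values tied to at least one split
cart_mod <- rpart(mpg ~ ., data = mtcars,
                  control = rpart.control(cp = 0.000001))
cp_vals <- cart_mod$cptable[cart_mod$cptable[, "nsplit"] > 0, "CP"]

# Values must be given in the transformed (log10) scale
mtcars_cp <- cost_complexity() |> value_set(log10(cp_vals))
mtcars_cp
```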

Now, if a sequence or random sample is requested, it uses the set values:
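To illustrate, the sketch below attaches two made-up values (given in log10 units, not taken from any fitted model) to the parameter object and then requests a sequence and a random sample. Both draw only from the stored values.

```r
library(dials)

# Hypothetical candidate values, in the transformed (log10) scale
cp_set <- cost_complexity() |> value_set(c(-6, -1.5))

# The sequence is taken from the stored values
cp_set |> value_seq(2)

# Random samples are drawn from the stored values as well
set.seed(2021)  # arbitrary seed
cp_set |> value_sample(5)
```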

Discrete Parameters

In this case there is no notion of a range or scale. The parameter objects are defined by their values. For example, consider a parameter for the types of kernel functions that are used with distance functions:
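A minimal sketch of such an object, assuming the `weight_func()` parameter of recent dials releases is the one meant here:

```r
library(dials)

# Qualitative parameter for distance weighting (kernel) functions
wf <- weight_func()
wf

# Qualitative parameters have their own subclass
class(wf)

# The object is defined by its set of possible values
wf$values
```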

The helper functions are analogous to those for the quantitative parameters:
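The same helpers can be sketched for a qualitative parameter. This assumes `weight_func()` as the example object; the two values kept by `value_set()` are an arbitrary choice.

```r
library(dials)

# Redefine the set of possible values
wf_small <- weight_func() |> value_set(c("rectangular", "triangular"))
wf_small

# Random samples are drawn from the possible values
set.seed(101)  # arbitrary seed
weight_func() |> value_sample(3)

# The sequence is returned in the order of the stored values
weight_func() |> value_seq(3)
```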

Creating Novel Parameters

The package contains two constructors that can be used to create new quantitative and qualitative parameters. This file contains the code to create the parameters contained in the package.
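As a sketch, assuming the two constructors are `new_quant_param()` and `new_qual_param()` as in recent dials releases, new parameters could be defined as follows. The parameter names `num_subgroups` and `linkage`, their ranges, and their values are hypothetical and exist only for illustration.

```r
library(dials)

# Hypothetical integer-valued parameter
num_subgroups <- function(range = c(1L, 20L), trans = NULL) {
  new_quant_param(
    type = "integer",
    range = range,
    inclusive = c(TRUE, TRUE),  # both bounds are valid values
    trans = trans,
    label = c(num_subgroups = "# Subgroups")
  )
}

# Hypothetical character-valued parameter
linkage <- function(values = c("single", "complete", "average")) {
  new_qual_param(
    type = "character",
    values = values,
    label = c(linkage = "Linkage Method")
  )
}

# The usual helpers then work on the new objects
num_subgroups() |> value_seq(5)
linkage() |> value_seq(2)
```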

Unknown Values

There are some cases where the range of parameter values is data dependent. For example, the upper bound on the number of neighbors cannot be known if the number of data points in the training set is not known. For that reason, some parameters have an unknown placeholder:

#> # Randomly Selected Predictors  (quantitative)
#> Range: [1, ?]
#> # Nearest Neighbors  (quantitative)
#> Range: [1, ?]
#> Minimal Node Size  (quantitative)
#> Range: [2, ?]
#> # Observations Sampled  (quantitative)
#> Range: [?, ?]
#> # Model Terms  (quantitative)
#> Range: [1, ?]
#> # Components  (quantitative)
#> Range: [1, ?]
# and so on

These values must be initialized prior to generating parameter values.
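Two possible ways to initialize an unknown bound are sketched below, assuming a recent dials release that provides `has_unknowns()` and `finalize()` and the function-style `mtry()`. The manual bound of 10 and the use of mtcars as training data are illustrative choices.

```r
library(dials)

p <- mtry()

# TRUE while the upper bound is still a placeholder
has_unknowns(p)

# Option 1: set the range by hand
p_manual <- p |> range_set(c(1L, 10L))

# Option 2: derive the upper bound from the predictor columns
predictors <- mtcars[, -1]  # every column except the outcome, mpg
p_final <- finalize(p, predictors)
range_get(p_final)

# Values can now be generated
p_final |> value_seq(3)
```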

Parameter Grids

Sets or combinations of parameters can be created for use in grid search. grid_regular and grid_random take any number of param objects.

For example, for a glmnet model, a regular grid might be:

grid_regular(
  mixture(),
  penalty(),
  levels = 3 # or c(3, 4), etc
)
#> # A tibble: 9 x 2
#>   mixture      penalty
#>     <dbl>        <dbl>
#> 1     0   0.0000000001
#> 2     0.5 0.0000000001
#> 3     1   0.0000000001
#> 4     0   0.00001     
#> 5     0.5 0.00001     
#> 6     1   0.00001     
#> 7     0   1           
#> 8     0.5 1           
#> 9     1   1

and, similarly, a random grid is created using

grid_random(
  mixture(),
  penalty(),
  size = 6
)
#> # A tibble: 6 x 2
#>   mixture     penalty
#>     <dbl>       <dbl>
#> 1   0.200 0.0176     
#> 2   0.750 0.000388   
#> 3   0.191 0.000000159
#> 4   0.929 0.00000176 
#> 5   0.143 0.0442     
#> 6   0.973 0.0110