This document explains how the Normalization & SAX indexing step works. The goal of this step is to prepare the dataset for the rest of the process.

Description

Normalization

This step is essential to ensure that all the data are on the same scale. The normalization uses the z-score method: the mean is subtracted from each value and the result is divided by the standard deviation.
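As an illustration, z-score normalization can be sketched in a few lines of base R. The helper name `zscore` and the toy values are assumptions made for this sketch; it is not the package's internal code, only the standard formula applied to a numeric vector.

```r
# Hypothetical helper (not part of STMotif): z-score of a numeric vector.
zscore <- function(x) {
  # Subtract the mean, then divide by the sample standard deviation.
  (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
}

# Toy values chosen for illustration only.
x <- c(737, 1350, 869, 750, 1138)
z <- zscore(x)

# After normalization the values have mean 0 and standard deviation 1.
mean(z)
stats::sd(z)
```

Because the result has zero mean and unit standard deviation, values from series with very different magnitudes become directly comparable.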

Symbolic Aggregation ApproXimation (SAX)

The observations in normalized subsequences tend to be normally distributed. The discretization therefore splits the area under the Gaussian curve into intervals of equal probability, one interval per letter. To encode the values, we must specify the number of letters in the alphabet.

SAX Encoding with 3 letters
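The equiprobable discretization described above can be sketched as follows. This is a minimal sketch that assumes the classic SAX choice of breakpoints, taken from the quantiles of the standard normal distribution; the helper name `sax_encode` and the input values are hypothetical, and the package may compute its breakpoints differently.

```r
# Hypothetical helper (not part of STMotif): encode z-scored values
# with the first `a` lowercase letters.
sax_encode <- function(z, a) {
  # a - 1 interior breakpoints that split N(0, 1) into `a` intervals
  # of equal probability (assumption: classic SAX breakpoints).
  breaks <- stats::qnorm(seq_len(a - 1) / a)
  # findInterval() returns 0 .. a - 1, so add 1 to index the alphabet.
  letters[findInterval(z, breaks) + 1]
}

# Toy, already-normalized values chosen for illustration only.
z <- c(-1.2, -0.3, 0.1, 0.5, 1.8)
sax_encode(z, a = 3)
#> [1] "a" "b" "b" "c" "c"
```

With 3 letters the two breakpoints are roughly -0.43 and 0.43, so each letter covers one third of the probability mass of the standard normal distribution.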

Example

head(STMotif::example_dataset[,1:10])
#>        1    2    3     4     5     6    7     8    9   10
#> 360  737 1350  869   750  1138   758 1006  1095   99  -83
#> 361  283  565  504   317  1849   944  -80  -895 -936  906
#> 362 -118 -375 -564  -803   870   472 -922 -1009 -698  741
#> 363 -696 -844 -654 -1303  -474  -591 -262  1034 1012  376
#> 364 -251 -622  -14  -587 -1108 -1401  404  1545 1696  247
#> 365  645  -10   -4   411  -858 -1261 -574  -329 -367 -680
head(NormSAX(D = STMotif::example_dataset, a = 7)[,1:10])
#>   1 2 3 4 5 6 7 8 9 10
#> 1 e f e e f e f f d  d
#> 2 d e e d f f d b b  e
#> 3 d c c c e e b b c  e
#> 4 c b c b c c c f f  e
#> 5 c c d c b b e f f  d
#> 6 e d d e b b c c c  c