*IndexNumR* is a package for computing indices of aggregate prices or quantities using information on the prices and quantities on multiple products over multiple time periods. Such numbers are routinely computed by statistical agencies to measure, for example, the general change in the level of prices, production inputs and productivity for an economy. Well known examples are consumer price indices and producer price indices.

In recent years, advances have been made in index number theory to address biases in many well known and widely used index number methods. One potential solution to some of these problems is the adaptation of multilateral methods that have commonly been used in cross-sectional comparisons to the time series context. This involves more computational complexity than earlier methods, but generally requires similar inputs. *IndexNumR* provides functions that make it easy to estimate indices using common index number methods, as well as the GEKS multilateral method.

This first section covers the inputs into the main index number functions and how the data are to be organised to use these functions.

The index number functions `priceIndex`, `quantityIndex` and `GEKSIndex` all take a dataframe as their first argument. This dataframe should contain everything needed to compute the index. In general this includes columns for,

- prices
- quantities
- a time period variable (more on this below)
- a product identifier that uniquely identifies each product.

The dataframe must have column names, since character strings are used in other arguments to the index number functions to specify which columns contain the data listed above. Column names can be set with the `colnames` function of base R. The dataset CES_sigma_2 is an example of the minimum dataframe required to compute an index.

```
## time prices quantities prodID
## 1 1 2.00 0.3846154 1
## 2 2 1.75 0.5846626 1
## 3 3 1.60 0.7135502 1
## 4 4 1.50 0.9149417 1
## 5 5 1.45 1.0280574 1
## 6 6 1.40 1.2058234 1
```

In this case, the dataframe is sorted by the product identifier *prodID*, but it need not be sorted at all.

To be able to compute indices, the data need to be subset in order to extract all observations on products for given periods. The approach used in *IndexNumR* is to require a time period variable as an input into many of its functions that will be used for subsetting. This time period variable must satisfy the following,

- start at 1
- increase in integer increments of 1
- continuous (that is, no gaps).
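These three requirements can be checked before calling any index function. The following is a minimal base R sketch; `isValidTimeIndex` is a hypothetical helper written for illustration and is not part of *IndexNumR*.

```r
# Hypothetical helper (not part of IndexNumR): checks that a time period
# variable starts at 1, increases in steps of 1 and has no gaps.
isValidTimeIndex <- function(x) {
  periods <- sort(unique(x))
  # if all three conditions hold, the distinct periods are exactly 1, 2, ..., K
  isTRUE(all.equal(periods, seq_len(length(periods))))
}

isValidTimeIndex(c(1, 1, 2, 2, 3, 3))  # repeated periods are allowed
isValidTimeIndex(c(1, 2, 4))           # period 3 is missing
isValidTimeIndex(c(0, 1, 2))           # does not start at 1
```

The first call returns `TRUE` and the other two return `FALSE`.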

The variable may, and in fact likely will, have many observations for a given time period, since there are generally multiple items with price and quantity information. For example, the CES_sigma_2 dataset has observations on 4 products for each time period. We can see this by observing the first few rows of the dataset sorted by the time period.

```
## time prices quantities prodID
## 1 1 2.00 0.3846154 1
## 13 1 1.00 1.5384615 2
## 25 1 1.00 1.5384615 3
## 37 1 0.50 12.3076923 4
## 2 2 1.75 0.5846626 1
## 14 2 0.50 7.1621164 2
```

The user can provide their own time variable, or if a date variable is available, *IndexNumR* has four functions that can compute the required time variable: `yearIndex`, `quarterIndex`, `monthIndex` and `weekIndex`. Users should be aware that if there are a very large number of observations then these functions can take some time to compute, but once the time variable has been computed it is easier and faster to work with than dates.
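To make the target format concrete, here is a base R sketch of how a monthly time period variable could be derived from dates. It only illustrates the idea and is not the code used by `monthIndex`; the dates are made up.

```r
# Made-up observation dates
dates <- as.Date(c("2020-11-15", "2020-12-01", "2020-12-24",
                   "2021-01-20", "2021-02-03"))

# Count months on a continuous scale, then shift so the first month is period 1
year <- as.integer(format(dates, "%Y"))
month <- as.integer(format(dates, "%m"))
monthCount <- year * 12L + month
timeIndex <- monthCount - min(monthCount) + 1L

timeIndex
```

Both December observations receive the same period number. Note that a month with no observations at all would leave a gap in the resulting variable.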

A related issue is that of aggregating data collected at a higher frequency to a lower frequency. When computing index numbers, this is often done by computing a *unit value* as follows, \[\begin{equation}
UV_{t} = \frac{\sum_{n=1}^{N}p^{t}_{n}q^{t}_{n}}{\sum_{n=1}^{N}q^{t}_{n}}
\end{equation}\] That is, sum up total expenditure on each item over the required period, and divide by the total quantity. Provided that a time period variable as described above is available, the unit values can be computed using the function `unitValues`. This function returns the unit values, along with the aggregate quantities for each time period and each product. The output also includes the product identifier and time period variable, so the output dataframe from the `unitValues` function contains everything needed to compute an index number.
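The unit value calculation itself is simple to reproduce. The sketch below uses base R's `aggregate` on made-up transaction data; the data frame and column names are assumptions for illustration, and this is not the implementation of `unitValues`.

```r
# Made-up transaction-level data: several sales per product and period
d <- data.frame(
  period = c(1, 1, 1, 2, 2, 2),
  prodID = c(1, 1, 2, 1, 2, 2),
  price  = c(2.0, 2.2, 1.0, 1.8, 0.9, 1.1),
  qty    = c(10, 5, 20, 12, 10, 30)
)

# Total expenditure and total quantity for each product in each period
d$expenditure <- d$price * d$qty
agg <- aggregate(cbind(expenditure, qty) ~ period + prodID, data = d, FUN = sum)

# Unit value = total expenditure / total quantity
agg$unitValue <- agg$expenditure / agg$qty
agg
```

For example, product 2 in period 2 has expenditure 9 + 33 = 42 on 40 units, giving a unit value of 1.05.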

A common issue when computing index numbers is that the sample of products over which the index is computed changes over time. Since price and quantity information is generally needed on the same set of products for each pair of periods being compared, the index calculation functions provided in *IndexNumR* provide the option `sample="matched"` to use only a matched sample of products. This means that the price and quantity information will be extracted for a pair of periods, any non-overlapping products removed, and the index computed over these matched products. This is repeated for each pair of periods over which the index is being computed.

Matched-sample indices may suffer from bias. As a simple assessment of the potential bias, the function `evaluateMatched` calculates the proportion of total expenditure that the matched sample covers in each time period. The function provides output for expenditure as well as counts and can evaluate overlap using either a chained or fixed base index.

The first four columns of the output present the base period information: base_index (the time index of the base period), base (total base period expenditure or count), base_matched (the expenditure or count of the base period for matched products) and base_share (the share of total expenditure in the base period that remains after matching). Columns 5-8 report the same information for the current period. Columns 4 and 8 can be expressed as, \[\begin{equation} \lambda_{t} = \frac{\sum_{n\in I(1)\cap I(0)}p_{n}^{t}q_{n}^{t}}{\sum_{n\in I(t)}p_{n}^{t}q_{n}^{t}} \quad \text{for } t \in \{1,0\}, \end{equation}\] where \(I(t)\) is the set of products available in period \(t\), \(t=1\) refers to the current period and is used to compute column 8, and \(t=0\) refers to the base (comparison) period and is used to compute column 4.
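The matched expenditure shares can be illustrated with a short base R sketch. The two data frames and the helper `matchedShare` are made up for this example; the sketch mirrors the formula for \(\lambda_{t}\) and is not the code of `evaluateMatched`.

```r
# Made-up data: product A disappears and product D appears
base <- data.frame(prodID = c("A", "B", "C"), p = c(1, 2, 3), q = c(10, 5, 2))
curr <- data.frame(prodID = c("B", "C", "D"), p = c(2.5, 3, 4), q = c(4, 3, 1))

# Products available in both periods
matched <- intersect(base$prodID, curr$prodID)

# Hypothetical helper: share of a period's expenditure on the matched products
matchedShare <- function(d, ids) {
  keep <- d$prodID %in% ids
  sum(d$p[keep] * d$q[keep]) / sum(d$p * d$q)
}

lambda0 <- matchedShare(base, matched)  # base period share: 16 / 26
lambda1 <- matchedShare(curr, matched)  # current period share: 19 / 23
```

A share well below 1 in either period suggests that the matched sample leaves out a sizeable part of expenditure.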

Bilateral index numbers are those that use data for two periods to compute each value of the index. All of the bilateral index numbers can be computed as period-on-period, chained or fixed base. A period-on-period index simply measures the change from one period to the next. A chained index gives the cumulative change, calculated as the cumulative product of the period-on-period index. A fixed base index compares each period to the base period. This is also called a direct index because, unlike a chained index, it does not go through other periods to measure the change since the base period. Formulae used to compute the bilateral index numbers from period t-1 to period t are as below.

Carli index (Carli 1804), \[\begin{equation*} P(p^{t-1},p^{t}) = \frac{1}{N}\sum_{n=1}^{N}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right) \end{equation*}\]

Jevons index (Jevons 1865), \[\begin{equation*} P(p^{t-1},p^{t}) = \prod_{n=1}^{N}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{(1/N)} \end{equation*}\]

Dutot index (Dutot 1738), \[\begin{equation*} P(p^{t-1},p^{t}) = \frac{\sum_{n=1}^{N}p^{t}_{n}}{\sum_{n=1}^{N}p^{t-1}_{n}} \end{equation*}\]

Laspeyres index (Laspeyres 1871), \[\begin{equation*} P(p^{t-1},p^{t},q^{t-1}) = \frac{\sum_{n=1}^{N}p^{t}_{n}q^{t-1}_{n}}{\sum_{n=1}^{N}p^{t-1}_{n}q^{t-1}_{n}} \end{equation*}\]

Paasche index (Paasche 1874), \[\begin{equation*} P(p^{t-1},p^{t},q^{t}) = \frac{\sum_{n=1}^{N}p^{t}_{n}q^{t}_{n}}{\sum_{n=1}^{N}p^{t-1}_{n}q^{t}_{n}} \end{equation*}\]

Geometric Laspeyres index \[\begin{equation*} P(p^{t-1},p^{t},q^{t-1}) = \prod_{n=1}^{N}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{s^{t-1}_{n}}, \end{equation*}\] where \(s^{t}_{n} = \frac{p^{t}_{n}q^{t}_{n}}{\sum_{n=1}^{N}p^{t}_{n}q^{t}_{n}}\) is the share of period \(t\) expenditure on good \(n\).

Geometric Paasche index \[\begin{equation*} P(p^{t-1},p^{t},q^{t}) = \prod_{n=1}^{N}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{s^{t}_{n}}, \end{equation*}\] where \(s^{t}_{n}\) is defined as above for the geometric Laspeyres index.

Fisher index (Fisher 1921), \[\begin{equation*} P(p^{t-1},p^{t},q^{t-1},q^{t}) = [P_{P}P_{L}]^{\frac{1}{2}}, \end{equation*}\] where \(P_{P}\) is the Paasche index and \(P_{L}\) is the Laspeyres index. The Fisher index has other representations, but this is the one used by *IndexNumR* in its computations.

Tornqvist index (Törnqvist 1936; Törnqvist and Törnqvist 1937), \[\begin{equation*} P(p^{t-1},p^{t},q^{t-1},q^{t}) = \prod_{n=1}^{N}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{\left(s^{t-1}_{n}+s^{t}_{n}\right)/2}, \end{equation*}\] where \(s^{t}_{n}\) is defined as above for the geometric Laspeyres index.

Walsh index, \[\begin{equation*} P(p^{t-1},p^{t},q^{t-1},q^{t}) = \frac{\sum_{n=1}^{N}\sqrt{q^{t-1}_{n}q^{t}_{n}}\cdot p^{t}_{n}}{\sum_{n=1}^{N}\sqrt{q^{t-1}_{n}q^{t}_{n}}\cdot p^{t-1}_{n}} \end{equation*}\]

Sato-Vartia index (Sato 1976; Vartia 1976), \[\begin{equation*} P(p^{t-1},p^{t},q^{t-1},q^{t}) = \prod_{n=1}^{N}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{w_{n}} \end{equation*}\] where the weights are normalised to sum to one, \[\begin{equation*} w_{n} = \frac{w^{*}_{n}}{\sum_{n=1}^{N}w^{*}_{n}} \end{equation*}\] and \(w^{*}_{n}\) is the logarithmic mean of the shares, \[\begin{equation*} w^{*}_{n} = \frac{s^{t}_{n}-s^{t-1}_{n}}{\log (s^{t}_{n}) - \log (s^{t-1}_{n})} \end{equation*}\]

CES index, also known as the Lloyd-Moulton index (Lloyd 1975; Moulton 1996), \[\begin{equation*} P(p^{t-1},p^{t},q^{t-1}) = \left[\sum_{n=1}^{N}s_{n}^{t-1}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{(1-\sigma)}\right]^{\left(\frac{1}{1-\sigma}\right)}, \end{equation*}\] where \(\sigma\) is the elasticity of substitution.
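To see how a few of these formulae translate into code, the following base R sketch evaluates the Laspeyres, Paasche, Fisher and Tornqvist indices for a single pair of periods. The prices and quantities are made up for illustration, and the sketch is independent of the *IndexNumR* functions.

```r
# Made-up prices and quantities for two products
p0 <- c(1.0, 2.0); q0 <- c(10, 5)  # period t-1
p1 <- c(1.5, 2.0); q1 <- c(6, 6)   # period t

laspeyres <- sum(p1 * q0) / sum(p0 * q0)  # period t-1 quantities as weights
paasche   <- sum(p1 * q1) / sum(p0 * q1)  # period t quantities as weights
fisher    <- sqrt(laspeyres * paasche)    # geometric mean of the two

# Expenditure shares in each period
s0 <- p0 * q0 / sum(p0 * q0)
s1 <- p1 * q1 / sum(p1 * q1)

# Tornqvist: price relatives weighted by the average expenditure share
tornqvist <- prod((p1 / p0)^((s0 + s1) / 2))

c(laspeyres = laspeyres, paasche = paasche, fisher = fisher, tornqvist = tornqvist)
```

In this example the Fisher and Tornqvist values lie between the Paasche and Laspeyres values and are close to one another.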

To estimate a simple chained Laspeyres price index,

```
priceIndex(CES_sigma_2,
           pvar = "prices",
           qvar = "quantities",
           pervar = "time",
           prodID = "prodID",
           indexMethod = "laspeyres",
           output = "chained")
```

```
## [,1]
## [1,] 1.0000000
## [2,] 0.9673077
## [3,] 1.2905504
## [4,] 1.3382002
## [5,] 1.2482444
## [6,] 1.7346552
## [7,] 1.6530619
## [8,] 1.4524186
## [9,] 1.8386215
## [10,] 1.7126802
## [11,] 2.1810170
## [12,] 2.2000474
```

Estimating multiple different index numbers on the same data is straightforward,

```
methods <- c("laspeyres","paasche","fisher","tornqvist")
prices <- lapply(methods,
                 function(x) {priceIndex(CES_sigma_2,
                                         pvar = "prices",
                                         qvar = "quantities",
                                         pervar = "time",
                                         prodID = "prodID",
                                         indexMethod = x,
                                         output = "chained")})
as.data.frame(prices, col.names = methods)
```

```
## laspeyres paasche fisher tornqvist
## 1 1.0000000 1.0000000 1.0000000 1.0000000
## 2 0.9673077 0.8007632 0.8801048 0.8925715
## 3 1.2905504 0.8987146 1.0769571 1.0789612
## 4 1.3382002 0.9247902 1.1124543 1.1146080
## 5 1.2482444 0.6715974 0.9155969 0.9327861
## 6 1.7346552 0.7858912 1.1675831 1.1790710
## 7 1.6530619 0.7472454 1.1114148 1.1223220
## 8 1.4524186 0.5836022 0.9206708 0.9379711
## 9 1.8386215 0.6431381 1.0874224 1.0961295
## 10 1.7126802 0.5145138 0.9387213 0.9527309
## 11 2.1810170 0.5736947 1.1185875 1.1288419
## 12 2.2000474 0.5745408 1.1242851 1.1346166
```

This illustrates the Laspeyres index’s substantial positive bias, the Paasche index’s substantial negative bias, and the similar estimates produced by the Fisher and Tornqvist superlative index numbers.

The CES index number method requires an elasticity of substitution parameter in order to be calculated. *IndexNumR* provides a function `elasticity` to estimate the elasticity of substitution parameter, following the method of Balk (2000). The basic method is to solve for the value of the elasticity of substitution that equates the CES index to a comparison index. One comparison index noted by Balk is the ‘current period’ CES index, \[\begin{equation}
\left[\sum_{n=1}^{N}s_{n}^{t}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{-(1-\sigma)}\right]^{\left(\frac{-1}{1-\sigma}\right)}.
\end{equation}\] Therefore, we numerically calculate the value of \(\sigma\) that solves, \[\begin{equation}
\left[\sum_{n=1}^{N}s_{n}^{t-1}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{(1-\sigma)}\right]^{\left(\frac{1}{1-\sigma}\right)} - \left[\sum_{n=1}^{N}s_{n}^{t}\left(\frac{p^{t}_{n}}{p^{t-1}_{n}}\right)^{-(1-\sigma)}\right]^{\left(\frac{-1}{1-\sigma}\right)} = 0.
\end{equation}\]

This is done using the `uniroot` function of the *stats* package distributed with base R. Note that this equation can be used to solve for sigma for any \(t=2,\cdots,T\), so there are \(T-1\) potential estimates of sigma. The `elasticity` function will return all \(T-1\) estimates as well as the arithmetic mean of the estimates. In addition to the current period CES index, Balk also notes that the Sato-Vartia index can be used, while Ivancic, Diewert, and Fox (2010) note that a Fisher index could be used. Any of these three indexes can be used as the comparison index by specifying the `compIndex` option as either `"fisher"`, `"ces"` or `"satovartia"`. The current period CES index is the default.
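The root-finding step can be sketched in a few lines of base R for a single pair of periods. The prices and shares below are made up so that they are consistent with CES preferences with \(\sigma = 2\) (shares proportional to \(1/p_{n}\)); the function name `cesDiff` and the search interval are assumptions for this illustration, not the internals of `elasticity`.

```r
# Made-up data consistent with a CES cost function with sigma = 2
p0 <- c(1.0, 2.0); s0 <- c(2 / 3, 1 / 3)  # period t-1 prices and shares
p1 <- c(1.5, 1.0); s1 <- c(0.4, 0.6)      # period t prices and shares
rel <- p1 / p0

# Difference between the CES (Lloyd-Moulton) index and the
# 'current period' CES index for a candidate value of sigma
cesDiff <- function(sigma) {
  lloydMoulton <- sum(s0 * rel^(1 - sigma))^(1 / (1 - sigma))
  currentCES   <- sum(s1 * rel^(-(1 - sigma)))^(-1 / (1 - sigma))
  lloydMoulton - currentCES
}

# The interval is chosen to exclude sigma = 1, where the formula is undefined
sol <- uniroot(cesDiff, lower = 1.1, upper = 10, tol = 1e-10)
sol$root
```

The solution is approximately 2, the value used to construct the shares, and `cesDiff(sol$root)` is approximately zero.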

The dataset available with *IndexNumR*, CES_sigma_2, was calculated assuming a CES cost function with an elasticity of substitution equal to 2. Running the `elasticity` function on this dataset,

```
elasticity(CES_sigma_2,
           pvar = "prices",
           qvar = "quantities",
           pervar = "time",
           prodID = "prodID",
           compIndex = "ces")
```

```
## $sigma
## [1] 2
##
## $allsigma
## [,1]
## [1,] 2.000000
## [2,] 2.000001
## [3,] 2.000000
## [4,] 1.999999
## [5,] 2.000000
## [6,] 2.000000
## [7,] 2.000000
## [8,] 2.000000
## [9,] 2.000000
## [10,] 2.000000
## [11,] 2.000000
##
## $diff
## [,1]
## [1,] -5.418676e-09
## [2,] -5.665104e-08
## [3,] 3.426148e-13
## [4,] 1.213978e-07
## [5,] 2.196501e-10
## [6,] -1.141232e-11
## [7,] 3.118616e-13
## [8,] 9.429124e-12
## [9,] -7.997090e-09
## [10,] 4.536105e-11
## [11,] 5.087042e-13
```

which recovers the value of \(\sigma\) used to construct the dataset. There is one additional item of output labelled ‘diff’. This is the value of the difference between the CES index and the comparison index and is returned so that the user can check that the value of this difference is indeed zero. If it is non-zero then it may indicate that `uniroot` was not able to find a solution within the specified upper and lower bounds for \(\sigma\). These bounds can be changed with the options `upper` and `lower` of the `elasticity` function. The defaults are 20 and -20 respectively.

One problem with chain-linked indices is that, depending on the index number method chosen, the index will likely suffer from chain drift. Take an example where prices increase in one period and then return to their original level in the next period. An index suffering from chain drift will increase when prices increase, but won’t return to its original level when prices do. In the above examples, it was noted that there is substantial positive bias in the Laspeyres index and substantial negative bias in the Paasche index. Part of this is due to chain drift.
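Chain drift is easy to reproduce with a small made-up example in base R. Here the price of the first product is temporarily halved and then returns to its original level, while quantities respond to the sale. The data and helper functions are assumptions for illustration only.

```r
# Made-up data: rows are periods 1 to 3, columns are two products.
# Prices in period 3 are identical to prices in period 1.
p <- rbind(c(1.0, 1), c(0.5, 1), c(1.0, 1))
q <- rbind(c(10, 10), c(30, 5), c(10, 10))

# Bilateral indices from period s to period t
laspeyres <- function(s, t) sum(p[t, ] * q[s, ]) / sum(p[s, ] * q[s, ])
paasche   <- function(s, t) sum(p[t, ] * q[t, ]) / sum(p[s, ] * q[t, ])

# Chained indices: cumulative product of the period-on-period indices
chainedL <- cumprod(c(1, laspeyres(1, 2), laspeyres(2, 3)))
chainedP <- cumprod(c(1, paasche(1, 2), paasche(2, 3)))

# Direct (fixed base) comparison of period 3 with period 1
directL <- laspeyres(1, 3)
```

The direct index equals 1, as it should when prices return to their starting level, but the chained Laspeyres index ends at 1.3125 and the chained Paasche index ends at about 0.762.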

One way of reducing the amount of chain drift is to choose linking periods that are ‘similar’ in some sense (a way of eliminating chain drift is to use the GEKS multilateral index discussed later). This method of linking has been mentioned by Diewert and Fox (Diewert and Fox 2017), and Hill (Hill 2001) takes the concept further to choose the link period based on a minimum cost spanning tree.

To choose the linking period we need a measure of the similarity between two periods. For each period we have information on prices and quantities. The Hill (2001) method compares the two periods based on the Paasche-Laspeyres spread,

\[\begin{equation} PL (p^{t},p^{T+1},q^{t},q^{T+1}) = \Bigg|{ln\Bigg(\frac{P_{T+1,t}^{L}}{P_{T+1,t}^{P}}\Bigg)}\Bigg|, \end{equation}\]

where \(P^{L}\) is a Laspeyres price index and \(P^{P}\) is a Paasche price index. Since the Laspeyres and Paasche indices are biased in opposite directions, this choice of similarity measure is designed to choose linking periods that minimise the influence of index number method choice.

Alternative measures exist for comparing the dissimilarity of two vectors. Two such measures, recommended by Diewert (Diewert 2002) are the weighted log-quadratic index of relative price dissimilarity and the weighted asymptotically linear index of relative price dissimilarity, given by the following, \[\begin{align} LQ(p^{t},p^{T+1},q^{t},q^{T+1}) = \sum_{n=1}^{N}\frac{1}{2}&(s_{T+1,n} + s_{t,n})[ln(p_{T+1,n}/P(p^{t},p^{T+1},q^{t},q^{T+1})p_{t,n})]^{2} \label{eq:logQuadratic} \\ AL(p^{t},p^{T+1},q^{t},q^{T+1}) = \sum_{n=1}^{N}\frac{1}{2}&(s_{T+1,n} + s_{t,n})[(p_{T+1,n}/P(p^{t},p^{T+1},q^{t},q^{T+1})p_{t,n}) + \nonumber \\ & (P(p^{t},p^{T+1},q^{t},q^{T+1})p_{t,n}/p_{T+1,n}) - 2] \end{align}\] where \(P(p^{t},p^{T+1},q^{t},q^{T+1})\) is a superlative index number.
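As an illustration of the three relative measures, the base R sketch below computes the Paasche-Laspeyres spread, the log-quadratic measure and the asymptotically linear measure for one pair of periods, using a Fisher index as the superlative index \(P\). The data are made up, period 0 plays the role of \(t\) and period 1 the role of \(T+1\); this is not the code of `relativeDissimilarity`.

```r
# Made-up prices and quantities for two products
p0 <- c(1.0, 2.0); q0 <- c(10, 5)
p1 <- c(1.5, 2.0); q1 <- c(8, 6)

laspeyres <- sum(p1 * q0) / sum(p0 * q0)
paasche   <- sum(p1 * q1) / sum(p0 * q1)
fisher    <- sqrt(laspeyres * paasche)

# Paasche-Laspeyres spread
plSpread <- abs(log(laspeyres / paasche))

# Average expenditure shares and price relatives deflated by the Fisher index
s0 <- p0 * q0 / sum(p0 * q0)
s1 <- p1 * q1 / sum(p1 * q1)
avgShare <- (s0 + s1) / 2
deflated <- p1 / (fisher * p0)

logQuadratic <- sum(avgShare * log(deflated)^2)
asympLinear  <- sum(avgShare * (deflated + 1 / deflated - 2))
```

All three measures are zero when prices in the two periods are proportional, and grow as relative prices diverge.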

A final measure proposed by Fox, Hill and Diewert (Fox, Hill, and Diewert 2004) is a measure of absolute dissimilarity given by,

\[\begin{equation}
AD(x_{j},x_{k}) = \frac{1}{M+N}\sum_{l=1}^{M+N}\Bigg[ln\Bigg(\frac{x_{kl}}{x_{jl}}\Bigg) - \frac{1}{M+N}\sum_{i=1}^{M+N}ln\Bigg(\frac{x_{ki}}{x_{ji}}\Bigg)\Bigg]^{2} + \Bigg[\frac{1}{M+N}\sum_{i=1}^{M+N}ln\Bigg(\frac{x_{ki}}{x_{ji}}\Bigg)\Bigg]^{2},
\end{equation}\]

where \(M+N\) is the total number of items in the vector and \(x_{j}\) and \(x_{k}\) are the two vectors being compared. The authors use this in the context of detecting outliers, but it can be used to compare the price and quantity vectors of two time periods. One way to do this is to use only price information, or only quantity information. There are two ways to use both price and quantity information: stack the price and quantity vectors for each time period into a single vector and compare the two ‘stacked’ vectors; or calculate separate measures of absolute dissimilarity for prices and quantities before combining these into a single measure. The former method is simple to implement, but augments the price vector with a quantity vector that may be of considerably different magnitude and variance. The latter computes the absolute dissimilarity using prices and quantities separately, then combines them by taking the geometric average.
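The absolute dissimilarity formula and the two ways of combining prices and quantities can be sketched in base R as follows. The function `absDissimilarity` and the data are hypothetical and written only to mirror the equation above; they are not taken from `mixScaleDissimilarity`.

```r
# Hypothetical helper mirroring the AD formula: dispersion of the log ratios
# around their mean, plus the squared mean of the log ratios
absDissimilarity <- function(xj, xk) {
  logRatio <- log(xk / xj)
  avg <- mean(logRatio)
  mean((logRatio - avg)^2) + avg^2
}

# Made-up price and quantity vectors for periods j and k
pj <- c(1, 2, 4);  pk <- c(2, 2, 2)
qj <- c(10, 5, 2); qk <- c(6, 5, 4)

# Option 1: stack prices and quantities into a single vector per period
adStacked <- absDissimilarity(c(pj, qj), c(pk, qk))

# Option 2: separate measures, combined with a geometric average
adPrices     <- absDissimilarity(pj, pk)
adQuantities <- absDissimilarity(qj, qk)
adCombined   <- sqrt(adPrices * adQuantities)
```

The measure is zero when the two vectors are identical, for example `absDissimilarity(pj, pj)`.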

*IndexNumR* provides two functions that together estimate six dissimilarity measures. The first function, `relativeDissimilarity`, calculates the Paasche-Laspeyres spread, log-quadratic and asymptotically linear measures, and the second function, `mixScaleDissimilarity`, computes the mix, scale and absolute measures of dissimilarity. Both functions provide the same output: a data frame with three columns, containing the indices of the pairs of periods being compared in the first two columns and the value of the dissimilarity measure in the third column.

Once these have been computed, the function `maximumSimilarityLinks` can take the output data frame from either of these two functions and compute the maximum similarity linking periods as follows,

- Compute the measure of dissimilarity between all possible combinations of time periods.
- Set the price index to 1 in the first period.
- Compute the price index for the second period and chain it with the first period, \[\begin{equation*} P_{chain}^{2} = P_{chain}^{1} \times P(p^{1},p^{2},q^{1},q^{2}), \end{equation*}\] where \(P(p^{1},p^{2},q^{1},q^{2})\) is any bilateral index number formula.
- For each period \(t\) from \(3,\dots,T\), find the period \(t^{min}\) with the minimum dissimilarity, comparing period \(t\) to all periods \(1, \dots, t-1\).
- Compute the similarity chain-linked index number, \[\begin{equation*} P_{chain}^{t} = P_{chain}^{t^{min}} \times P(p^{t^{min}},p^{t},q^{t^{min}},q^{t}) \end{equation*}\]
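The steps above can be sketched in base R. The dissimilarity matrix and prices below are made up, and a Carli index is used as the bilateral formula only because it needs no quantities; none of this is the internal code of `maximumSimilarityLinks`.

```r
# Made-up symmetric dissimilarity matrix for 4 periods
D <- matrix(c(0.00, 0.10, 0.02, 0.30,
              0.10, 0.00, 0.05, 0.01,
              0.02, 0.05, 0.00, 0.20,
              0.30, 0.01, 0.20, 0.00), nrow = 4, byrow = TRUE)

# Made-up prices: rows are periods, columns are two products
P <- matrix(c(1.0, 1.0,
              1.2, 0.9,
              1.1, 1.0,
              1.3, 0.8), nrow = 4, byrow = TRUE)

# Bilateral Carli index from period i to period j
carli <- function(i, j) mean(P[j, ] / P[i, ])

nPeriods <- nrow(D)
link <- integer(nPeriods)
index <- numeric(nPeriods)

link[1] <- 1; index[1] <- 1                         # index is 1 in period 1
link[2] <- 1; index[2] <- index[1] * carli(1, 2)    # period 2 links to period 1

for (t in 3:nPeriods) {
  # most similar earlier period
  link[t] <- which.min(D[t, 1:(t - 1)])
  # chain through the linking period
  index[t] <- index[link[t]] * carli(link[t], t)
}

data.frame(xt = 1:nPeriods, x0 = link, index = index)
```

In this example period 3 links back to period 1 and period 4 links to period 2, rather than each period linking to its immediate predecessor.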

Using the log-quadratic measure of relative dissimilarity, the dissimilarity between the periods in the `CES_sigma_2` dataset is as follows,

```
lq <- relativeDissimilarity(CES_sigma_2,
                            pvar = "prices",
                            qvar = "quantities",
                            pervar = "time",
                            prodID = "prodID",
                            indexMethod = "fisher",
                            similarityMethod = "logquadratic")
head(lq)
```

```
## period_i period_j dissimilarity
## 1 1 2 0.09726451
## 2 1 3 0.02037395
## 3 1 4 0.04164311
## 4 1 5 0.28078294
## 5 1 6 0.08880177
## 6 1 7 0.08531212
```

The output from estimating the dissimilarity between periods can then be used to estimate the maximum similarity links,

```
## xt x0 dissimilarity
## 1 1 1 0.000000000
## 2 2 1 0.097264508
## 3 3 1 0.020373951
## 4 4 3 0.003832972
## 5 5 4 0.130990853
## 6 6 4 0.008684012
## 7 7 6 0.001122913
## 8 8 2 0.041022738
## 9 9 7 0.001367896
## 10 10 5 0.006962106
## 11 11 9 0.002946674
## 12 12 11 0.003612044
```

To estimate a chained Laspeyres index linking together the periods with maximum similarity as estimated above,

```
priceIndex(CES_sigma_2,
           pvar = "prices",
           qvar = "quantities",
           pervar = "time",
           prodID = "prodID",
           indexMethod = "laspeyres",
           output = "chained",
           chainMethod = "logquadratic")
```

```
## [,1]
## [1,] 1.0000000
## [2,] 0.9673077
## [3,] 1.1000000
## [4,] 1.1406143
## [5,] 1.0639405
## [6,] 1.2190887
## [7,] 1.1617463
## [8,] 1.0551558
## [9,] 1.1357327
## [10,] 1.0928877
## [11,] 1.1732711
## [12,] 1.1835084
```

Multilateral index number methods use data from multiple periods to compute each term in the index. *IndexNumR* provides the function `GEKSIndex` to use the GEKS multilateral index number method.

The GEKS method is attributable to Gini (Gini 1931), Eltetö and Köves (Eltetö and Köves 1964), and Szulc (Szulc 1964) in the cross-sectional context. The idea of adapting the method to the time series context is due to Balk (Balk 1981), and was developed further by Ivancic, Diewert and Fox (Ivancic, Diewert, and Fox 2011).

The user must choose the size of the window over which to apply the GEKS method, typically one or two years of data plus one period to account for seasonality. Denote this as \(w\). The basic method followed by the function `GEKSIndex` is as follows. Choose a period, denoted period \(k\), within the window as the base period. Calculate a bilateral index number between period \(k\) and every other period in the window. Repeat this for all possible choices of \(k\). This gives a matrix of size \(w\times w\) of bilateral indexes between all possible pairs of periods within the window. Then compute the GEKS indexes for the first \(w\) periods as, \[\begin{equation}
\left[ \prod_{k=1}^{w}P^{k,1} \right]^{1/w}, \left[ \prod_{k=1}^{w}P^{k,2} \right]^{1/w}, \cdots, \left[ \prod_{k=1}^{w}P^{k,w} \right]^{1/w},
\end{equation}\] where the term \(P^{k,t}\) is the bilateral index between period \(t\) and base period \(k\). *IndexNumR* offers the Fisher and Tornqvist index number methods for the index \(P\) via the `indexMethod` option. The Tornqvist index method is the default. The \(w\times w\) matrix of bilateral indexes is as follows, \[P =
\begin{pmatrix}
P^{1,1} & \cdots & P^{1,w} \\
\vdots & \ddots & \vdots \\
P^{w,1} & \cdots & P^{w,w}
\end{pmatrix}
\] The first term of the GEKS index is therefore the geometric mean of the elements in the first column of the above matrix, the second term is the geometric mean of the second column, and so on. Note that *IndexNumR* makes use of two facts about the matrix above to speed up computation: it is (inversely) symmetric so that \(P^{j,k} = 1/P^{k,j}\); and the diagonal elements are 1.

The indexes are then normalised by dividing by the first term, to give an index for the first \(w\) periods that starts at 1. If the index only covers \(w\) periods then no further calculation is required. However, if there are \(T>w\) periods in the dataset then the index must be extended.
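The calculation over a single window can be sketched in base R. The example below uses made-up data for a window of \(w = 3\) periods with a Fisher index as the bilateral formula. For clarity it fills the whole matrix directly rather than exploiting the symmetry noted above, so it should be read as an illustration of the method, not as the code of `GEKSIndex`.

```r
# Made-up data: rows are periods 1 to 3, columns are two products
p <- matrix(c(1.0, 2.0,
              1.5, 2.0,
              1.2, 2.2), nrow = 3, byrow = TRUE)
q <- matrix(c(10, 5,
               8, 6,
               9, 5), nrow = 3, byrow = TRUE)
w <- nrow(p)

# Bilateral Fisher index between base period k and period t
fisher <- function(k, t) {
  laspeyres <- sum(p[t, ] * q[k, ]) / sum(p[k, ] * q[k, ])
  paasche   <- sum(p[t, ] * q[t, ]) / sum(p[k, ] * q[t, ])
  sqrt(laspeyres * paasche)
}

# w x w matrix with element [k, t] equal to P^{k,t}
Pmat <- outer(1:w, 1:w, Vectorize(fisher))

# Geometric mean of each column, then normalise so the index starts at 1
geks <- exp(colMeans(log(Pmat)))
geks <- geks / geks[1]
geks
```

The matrix `Pmat` has ones on its diagonal and satisfies `Pmat[j, k] == 1 / Pmat[k, j]` up to floating point error, which are the two facts mentioned above.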

Extending a GEKS index can be done in a multitude of ways. Statistical agencies generally do not revise price indices like the consumer price index, so the methods offered by *IndexNumR* to extend the GEKS index are methods that do not lead to revisions. More specifically, these are called *splicing methods* and the three options available are the *movement*, *window* and *mean splice*. The idea is that we start by moving the window forward by one period and calculate a GEKS index for the new window. There will be \(w-1\) overlapping periods between the initial GEKS index and the GEKS index computed on the window that has been rolled forward one period. Any one of these overlapping periods can be used to extend the GEKS index.

Let \(P_{OLD}\) be the GEKS index computed over periods \(1\) to \(w\) and let \(P_{NEW}\) be the GEKS index computed over the window rolled forward one period, from periods \(2\) to \(w+1\). Let the final GEKS index be \(P_{GEKS}\). For the first \(w\) periods \(P_{GEKS} = P_{OLD}\), then \(P_{GEKS}^{w+1}\) is computed using the splicing methods as follows.

Movement splice (Ivancic, Diewert, and Fox 2011) \[\begin{equation} P_{GEKS}^{w+1} = P_{GEKS}^{w} \times \frac{P_{NEW}^{w+1}}{P_{NEW}^{w}} \end{equation}\] That is, the movement between the final two periods of the GEKS index computed over the new window is used to extend the original index from period \(w\) to \(w+1\).

Window splice (Krsinich 2016) \[\begin{equation} P_{GEKS}^{w+1} = P_{GEKS}^{w} \times \frac{P_{NEW}^{w+1}/P_{NEW}^{2}}{P_{OLD}^{w}/P_{OLD}^{2}} \end{equation}\] In this case, the ratio of the movement between the first and last periods computed using the new window, to the movement between the first and last periods using the old window is used to extend the original index.

Mean splice (Ivancic, Diewert, and Fox 2011) \[\begin{equation} P_{GEKS}^{w+1} = P_{GEKS}^{w} \times \left( \prod_{t=1}^{w-1} \frac{P_{NEW}^{w+1}/P_{NEW}^{t+1}}{P_{OLD}^{w}/P_{OLD}^{t+1}} \right)^{\frac{1}{(w-1)}} \end{equation}\] The mean splice uses the geometric mean of the movements between the last period and every other period in the window to extend the original index.
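The three splicing formulae can be compared side by side with a base R sketch. The two index series below are made up, with a window of \(w = 4\); the accessor functions `old` and `new` are hypothetical helpers that map period numbers to positions in each series.

```r
w <- 4
pOld <- c(1.00, 1.02, 1.05, 1.04)  # GEKS index over periods 1 to w
pNew <- c(1.00, 1.03, 1.02, 1.06)  # GEKS index over periods 2 to w + 1

# Look up each index by period number
old <- function(t) pOld[t]
new <- function(t) pNew[t - 1]

geksW <- old(w)  # the published index in period w

# Movement splice: last movement of the new window
movementSplice <- geksW * new(w + 1) / new(w)

# Window splice: link through the first overlapping period (period 2)
windowSplice <- geksW * (new(w + 1) / new(2)) / (old(w) / old(2))

# Mean splice: geometric mean over all overlapping periods 2, ..., w
overlap <- 2:w
ratios <- (new(w + 1) / new(overlap)) / (old(w) / old(overlap))
meanSplice <- geksW * exp(mean(log(ratios)))

c(movement = movementSplice, window = windowSplice, mean = meanSplice)
```

In this example the movement and window splices are the two extreme link choices, and the mean splice lies between the smallest and largest of the individual links.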

The splicing methods are used in this fashion to extend the series up to the final period in the data.

```
GEKSIndex(CES_sigma_2,
          pvar = "prices",
          qvar = "quantities",
          pervar = "time",
          prodID = "prodID",
          indexMethod = "tornqvist",
          window = 11,
          splice = "mean")
```

```
## [,1]
## [1,] 1.0000000
## [2,] 0.8927314
## [3,] 1.0776386
## [4,] 1.1127724
## [5,] 0.9310834
## [6,] 1.1785361
## [7,] 1.1219447
## [8,] 0.9380228
## [9,] 1.0951667
## [10,] 0.9501914
## [11,] 1.1277725
## [12,] 1.1330748
```

The above index number methods are derived based on a ratio approach, which decomposes the value change from one period to the next into the product of a price index and a quantity index. An alternative approach is to decompose value change into the sum of a price indicator and a quantity indicator. The theory dates back to the 1920s, and an excellent paper on this approach has been written by Diewert (Diewert 2005). There are a number of methods available for computing the indicator, and *IndexNumR* exposes the following, via the `priceIndicator` function:

Laspeyres indicator \[\begin{equation} I(p^{t-1}, p^{t}) = \sum_{n=1}^{N}q_{n}^{t-1}\times(p_{n}^{t}-p_{n}^{t-1}) \end{equation}\]

Paasche indicator \[\begin{equation} I(p^{t-1}, p^{t}) = \sum_{n=1}^{N}q_{n}^{t}\times(p_{n}^{t}-p_{n}^{t-1}) \end{equation}\]

Bennet indicator (Bennet 1920) \[\begin{equation} I(p^{t-1}, p^{t}) = \sum_{n=1}^{N} \frac{(q_{n}^{t}+q_{n}^{t-1})}{2} \times(p_{n}^{t}-p_{n}^{t-1}) \end{equation}\]

Montgomery indicator (Montgomery 1929) \[\begin{equation} I(p^{t-1}, p^{t}) = \sum_{n=1}^{N} \frac{p_{n}^{t}q_{n}^{t}-p_{n}^{t-1}q_{n}^{t-1}}{\log(p_{n}^{t}q_{n}^{t}) - \log(p_{n}^{t-1}q_{n}^{t-1})} \times\log\left(\frac{p_{n}^{t}}{p_{n}^{t-1}}\right) \end{equation}\]
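A base R sketch of the four price indicators for one pair of periods is given below, using made-up data. The Montgomery indicator is written here as the logarithmic mean of the period values multiplied by the log price relative. The sketch is independent of `priceIndicator`.

```r
# Made-up prices and quantities for two products
p0 <- c(1.0, 2.0); q0 <- c(10, 5)  # period t-1
p1 <- c(1.5, 2.0); q1 <- c(8, 6)   # period t

priceChange <- p1 - p0

laspeyresInd <- sum(q0 * priceChange)             # previous period quantities
paascheInd   <- sum(q1 * priceChange)             # current period quantities
bennetInd    <- sum((q0 + q1) / 2 * priceChange)  # average quantities

# Logarithmic mean of the product values in the two periods
# (assumes the value of each product differs between the periods)
v0 <- p0 * q0
v1 <- p1 * q1
logMean <- (v1 - v0) / (log(v1) - log(v0))
montgomeryInd <- sum(logMean * log(p1 / p0))

c(laspeyres = laspeyresInd, paasche = paascheInd,
  bennet = bennetInd, montgomery = montgomeryInd)
```

Unlike an index, each indicator is measured in currency units: it is the part of the change in total value that is attributed to price movements.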

Price indicators for the `CES_sigma_2` dataset are as follows:

```
methods <- c("laspeyres", "paasche", "bennet", "montgomery")
p <- lapply(methods, function(x) {priceIndicator(CES_sigma_2,
                                                 pvar = "prices",
                                                 qvar = "quantities",
                                                 pervar = "time",
                                                 prodID = "prodID",
                                                 method = x)})
as.data.frame(p, col.names = methods)
```

```
## laspeyres paasche bennet montgomery
## 1 NA NA NA NA
## 2 -0.3269231 -3.23451167 -1.78071737 -1.27874802
## 3 4.3441768 1.19889566 2.77153621 2.23764163
## 4 0.4061429 0.33835480 0.37224887 0.37329461
## 5 -0.8066580 -5.65501233 -3.23083515 -2.35138599
## 6 5.8451382 1.89061744 3.86787782 3.23912451
## 7 -0.6114830 -0.72404798 -0.66776546 -0.66571059
## 8 -1.6992746 -4.76683536 -3.23305498 -2.74535253
## 9 4.5203554 1.38856453 2.95445995 2.45168559
## 10 -1.0274652 -4.49985294 -2.76365909 -2.28761791
## 11 4.9221471 1.65051935 3.28633320 2.85483403
## 12 0.1396069 0.02503502 0.08232098 0.08391295
```

Quantity indicators can also be produced using the same methods as outlined above via the `quantityIndicator` function. This allows for the value change from one period to the next to be decomposed into price and quantity movements. To facilitate this, *IndexNumR* contains the `valueDecomposition` function, which can be used as follows to produce a decomposition of the value change for CES_sigma_2 using a Bennet indicator:

```
valueDecomposition(CES_sigma_2,
                   pvar = "prices",
                   qvar = "quantities",
                   pervar = "time",
                   prodID = "prodID",
                   priceMethod = "bennet")
```

```
## price quantity changes values
## 1 NA NA NA NA
## 2 -1.78071737 4.7807174 3 13
## 3 2.77153621 -4.7715362 -2 11
## 4 0.37224887 0.6277511 1 12
## 5 -3.23083515 6.2308351 3 15
## 6 3.86787782 -5.8678778 -2 13
## 7 -0.66776546 1.6677655 1 14
## 8 -3.23305498 6.2330550 3 17
## 9 2.95445995 -4.9544600 -2 15
## 10 -2.76365909 5.7636591 3 18
## 11 3.28633320 -5.2863332 -2 16
## 12 0.08232098 0.9176790 1 17
```

Note that for this decomposition, the method is specified for the price indicator and *IndexNumR* uses the appropriate quantity indicator. For Bennet and Montgomery indicators the same method is used for the quantity indicator as for the price indicator. If a Laspeyres price indicator is requested then the corresponding volume indicator is a Paasche indicator. The reverse is true if the Paasche indicator is used for prices.
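The pairing of price and quantity indicators can be checked numerically. The base R sketch below uses made-up data for one pair of periods and verifies that each pairing adds up to the change in total value; it is an illustration and not the code of `valueDecomposition`.

```r
# Made-up prices and quantities for two products
p0 <- c(1.0, 2.0); q0 <- c(10, 5)  # period t-1
p1 <- c(1.5, 2.0); q1 <- c(8, 6)   # period t

valueChange <- sum(p1 * q1) - sum(p0 * q0)

# Bennet price indicator paired with the Bennet quantity indicator
bennetPrice    <- sum((q0 + q1) / 2 * (p1 - p0))
bennetQuantity <- sum((p0 + p1) / 2 * (q1 - q0))

# Laspeyres price indicator paired with the Paasche quantity indicator
laspeyresPrice  <- sum(q0 * (p1 - p0))
paascheQuantity <- sum(p1 * (q1 - q0))

# Paasche price indicator paired with the Laspeyres quantity indicator
paaschePrice      <- sum(q1 * (p1 - p0))
laspeyresQuantity <- sum(p0 * (q1 - q0))

c(valueChange = valueChange,
  bennet = bennetPrice + bennetQuantity,
  laspeyresPaasche = laspeyresPrice + paascheQuantity,
  paascheLaspeyres = paaschePrice + laspeyresQuantity)
```

All four numbers equal 4 in this example, so each pairing decomposes the value change exactly.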

*IndexNumR* is hosted on GitHub at https://github.com/grahamjwhite/IndexNumR. There users can find instructions to install the development version directly from GitHub, as well as report and view bugs or suggested improvements.

Balk, B M. 1981. “A Simple Method for Constructing Price Indices for Seasonal Commodities.” *Statistische Hefte* 22 (1).

———. 2000. “On Curing the CPI’s Substitution and New Goods Bias.” Research Paper 0005. Statistics Netherlands.

Bennet, T. L. 1920. “The Theory of Measurement of Changes in Cost of Living.” *Journal of the Royal Statistical Society* 83: 455–62.

Carli, G-R. 1804. “Del Valore E Della Proporzione Dei Metalli Monetati.” *Scrittori Classici Italiani Di Economia Politica* 13: 297–366.

Diewert, Walter E. 2002. “Similarity and Dissimilarity Indexes: An Axiomatic Approach.” Discussion Paper 02–10. Department of Economics, University of British Columbia.

———. 2005. “Index Number Theory Using Differences Rather Than Ratios.” *American Journal of Economics and Sociology* 64 (1): 331–60.

Diewert, W. Erwin, and Kevin Fox. 2017. “Substitution Bias in Multilateral Methods for CPI Construction Using Scanner Data.” Discussion Paper 17–02. Department of Economics, University of British Columbia.

Dutot, N. 1738. *Réflections Politiques Sur Les Finances et Le Commerce*. La Haye: Les frères Vaillant et N. Prevost.

Eltetö, Ö, and P Köves. 1964. “On a Problem of Index Number Computation Relating to International Comparisons.” *Statisztikai Szemle* 42: 507–18.

Fisher, I. 1921. “The Best Form of Index Number.” *Journal of the American Statistical Association* 17: 533–37.

Fox, Kevin, Robert Hill, and W. Erwin Diewert. 2004. “Identifying Outliers in Multi-Output Models.” *Journal of Productivity Analysis* 22 (1/2): 73–94.

Gini, C. 1931. “On the Circular Test of Index Numbers.” *International Review of Statistics* 9 (2): 3–25.

Hill, Robert. 2001. “Measuring Inflation and Growth Using Spanning Trees.” *International Economic Review* 42 (1): 167–85. http://dx.doi.org/10.1111/1468-2354.00105.

Ivancic, Loraine, Walter E. Diewert, and Kevin J. Fox. 2010. “Using a Constant Elasticity of Substitution Index to Estimate a Cost of Living Index: From Theory to Practice.” School of Economics Discussion Paper 2010/15. University of New South Wales.

———. 2011. “Scanner Data, Time Aggregation and the Construction of Price Indexes.” *Journal of Econometrics* 161: 24–35. https://doi.org/10.1016/j.jeconom.2010.09.003.

Jevons, W S. 1865. “The Variation in Prices and the Value of the Currency Since 1782.” *Journal of the Statistical Society of London* 28: 294–320. https://doi.org/10.2307/2338419.

Krsinich, F. 2016. “The FEWS Index: Fixed Effects with a Window Splice.” *Journal of Official Statistics* 32: 375–404.

Laspeyres, E. 1871. “Die Berechnung Einer Mittleren Waarenpreissteigerung.” *Jahrbücher Für Nationalökonomie Und Statistik* 16: 296–314.

Lloyd, P J. 1975. “Substitution Effects and Biases in Nontrue Price Indices.” *The American Economic Review* 65 (3): 301–13.

Montgomery, J. K. 1929. “Is There a Theoretically Correct Price Index of a Group of Commodities?” Poliglotta (privately printed paper, 16 pages). Rome: Roma L’Universale Tipogr.

Moulton, B R. 1996. “Constant Elasticity Cost-of-Living Index in Share-Relative Form.” Mimeo. Bureau of Labor Statistics.

Paasche, H. 1874. “Über Die Preisentwicklung Der Letzten Jahre Nach Den Hamburger Börsennotirungen.” *Jahrbücher Für Nationalökonomie Und Statistik* 12: 168–78.

Sato, K. 1976. “The Ideal Log-Change Index Number.” *The Review of Economics and Statistics* 53: 223–28.

Szulc, B J. 1964. “Indices for Multiregional Comparisons.” *Przeglad Statystyczny* 3: 239–54.

Törnqvist, L. 1936. “The Bank of Finland’s Consumption Price Index.” *Bank of Finland Monthly Bulletin* 10: 1–8.

Törnqvist, L, and E Törnqvist. 1937. “Vilket är Förhållandet Mellan Finska Markens Och Svenska Kronans Köpkraft?” *Ekonomiska Samfundets Tidskrift* 39: 121–60.

Vartia, Y O. 1976. “Ideal Log-Change Index Numbers.” *The Scandinavian Journal of Statistics* 3: 121–26.