Introduction to PHEindicatormethods

Georgina Anderson

2020-03-12

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_smr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_isr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop      value    lowercl    uppercl confidence
#> 1 Area1 2015  12 176 0.06818182 0.03942979 0.11538134        95%
#> 2 Area2 2015  11 200 0.05500000 0.03098534 0.09578700        95%
#> 3 Area3 2015  94 110 0.85454545 0.77672261 0.90844078        95%
#> 4 Area4 2015  67 165 0.40606061 0.33409150 0.48230431        95%
#> 5 Area1 2016   5 185 0.02702703 0.01159838 0.06169833        95%
#> 6 Area2 2016  52 161 0.32298137 0.25560301 0.39861020        95%
#> 7 Area3 2016  69 164 0.42073171 0.34783762 0.49725429        95%
#> 8 Area4 2016  95 129 0.73643411 0.65436652 0.80482747        95%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop      value     lowercl    uppercl confidence
#> 1 Area1 2015  12 176 0.06818182 0.029056743 0.15175500      99.8%
#> 2 Area2 2015  11 200 0.05500000 0.022555178 0.12800366      99.8%
#> 3 Area3 2015  94 110 0.85454545 0.722634682 0.92981453      99.8%
#> 4 Area4 2015  67 165 0.40606061 0.296217285 0.52618270      99.8%
#> 5 Area1 2016   5 185 0.02702703 0.007467365 0.09301879      99.8%
#> 6 Area2 2016  52 161 0.32298137 0.221799192 0.44398706      99.8%
#> 7 Area3 2016  69 164 0.42073171 0.309207763 0.54097910      99.8%
#> 8 Area4 2016  95 129 0.73643411 0.603330992 0.83694475      99.8%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop     value   lowercl   uppercl confidence  statistic method
#> 1 Area1 2015  12 176  6.818182  3.942979 11.538134        95% percentage Wilson
#> 2 Area2 2015  11 200  5.500000  3.098534  9.578700        95% percentage Wilson
#> 3 Area3 2015  94 110 85.454545 77.672261 90.844078        95% percentage Wilson
#> 4 Area4 2015  67 165 40.606061 33.409150 48.230431        95% percentage Wilson
#> 5 Area1 2016   5 185  2.702703  1.159838  6.169833        95% percentage Wilson
#> 6 Area2 2016  52 161 32.298137 25.560301 39.861020        95% percentage Wilson
#> 7 Area3 2016  69 164 42.073171 34.783762 49.725429        95% percentage Wilson
#> 8 Area4 2016  95 129 73.643411 65.436652 80.482747        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl   uppercl confidence  statistic
#> 1 Area1 2015  12 176  6.818182  2.9056743 15.175500      99.8% percentage
#> 2 Area2 2015  11 200  5.500000  2.2555178 12.800366      99.8% percentage
#> 3 Area3 2015  94 110 85.454545 72.2634682 92.981453      99.8% percentage
#> 4 Area4 2015  67 165 40.606061 29.6217285 52.618270      99.8% percentage
#> 5 Area1 2016   5 185  2.702703  0.7467365  9.301879      99.8% percentage
#> 6 Area2 2016  52 161 32.298137 22.1799192 44.398706      99.8% percentage
#> 7 Area3 2016  69 164 42.073171 30.9207763 54.097910      99.8% percentage
#> 8 Area4 2016  95 129 73.643411 60.3330992 83.694475      99.8% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop     value    lowercl   uppercl
#> 1 Area1 2015  12 176  6.818182  2.9056743 15.175500
#> 2 Area2 2015  11 200  5.500000  2.2555178 12.800366
#> 3 Area3 2015  94 110 85.454545 72.2634682 92.981453
#> 4 Area4 2015  67 165 40.606061 29.6217285 52.618270
#> 5 Area1 2016   5 185  2.702703  0.7467365  9.301879
#> 6 Area2 2016  52 161 32.298137 22.1799192 44.398706
#> 7 Area3 2016  69 164 42.073171 30.9207763 54.097910
#> 8 Area4 2016  95 129 73.643411 60.3330992 83.694475

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop     value    lowercl    uppercl confidence       statistic
#> 1 Area1 2015  12 176  6818.182  3519.0397  11910.715        95% rate per 100000
#> 2 Area2 2015  11 200  5500.000  2741.8451   9841.619        95% rate per 100000
#> 3 Area3 2015  94 110 85454.545 69054.1930 104575.783        95% rate per 100000
#> 4 Area4 2015  67 165 40606.061 31467.6873  51569.045        95% rate per 100000
#> 5 Area1 2016   5 185  2702.703   877.5602   6307.207        95% rate per 100000
#> 6 Area2 2016  52 161 32298.137 24120.0571  42355.565        95% rate per 100000
#> 7 Area3 2016  69 164 42073.171 32734.0031  53247.072        95% rate per 100000
#> 8 Area4 2016  95 129 73643.411 59580.4102  90026.153        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Exact
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl    uppercl confidence    statistic
#> 1 Area1 2015  12 176  6.818182  2.2729409  15.386909      99.8% rate per 100
#> 2 Area2 2015  11 200  5.500000  1.7241851  12.823260      99.8% rate per 100
#> 3 Area3 2015  94 110 85.454545 60.7669262 116.370993      99.8% rate per 100
#> 4 Area4 2015  67 165 40.606061 26.9701671  58.410443      99.8% rate per 100
#> 5 Area1 2016   5 185  2.702703  0.3996604   8.894457      99.8% rate per 100
#> 6 Area2 2016  52 161 32.298137 20.1884062  48.693791      99.8% rate per 100
#> 7 Area3 2016  69 164 42.073171 28.1261491  60.212611      99.8% rate per 100
#> 8 Area4 2016  95 129 73.643411 52.4689388 100.128698      99.8% rate per 100
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Exact
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl    uppercl
#> 1 Area1 2015  12 176  6.818182  2.2729409  15.386909
#> 2 Area2 2015  11 200  5.500000  1.7241851  12.823260
#> 3 Area3 2015  94 110 85.454545 60.7669262 116.370993
#> 4 Area4 2015  67 165 40.606061 26.9701671  58.410443
#> 5 Area1 2016   5 185  2.702703  0.3996604   8.894457
#> 6 Area2 2016  52 161 32.298137 20.1884062  48.693791
#> 7 Area3 2016  69 164 42.073171 28.1261491  60.212611
#> 8 Area4 2016  95 129 73.643411 52.4689388 100.128698

These functions can also return aggregate data if the input dataframes are grouped:



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1823    290150  637.    607.    668. 95%       
#>  2 Area1  2006 Male         1960    272886  761.    724.    798. 95%       
#>  3 Area1  2007 Fema~        1907    278471  708.    674.    742. 95%       
#>  4 Area1  2007 Male         1755    269277  649.    617.    681. 95%       
#>  5 Area1  2008 Fema~        1799    285167  627.    596.    659. 95%       
#>  6 Area1  2008 Male         1655    272028  744.    706.    783. 95%       
#>  7 Area1  2009 Fema~        2271    283824  838.    802.    876. 95%       
#>  8 Area1  2009 Male         2291    277646  827.    791.    864. 95%       
#>  9 Area1  2010 Fema~        1743    285187  606.    575.    637. 95%       
#> 10 Area1  2010 Male         1660    251262  624.    591.    658. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups:   area, year [20]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <fct> <int> <fct>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1823    290150  637.    607.    668.
#>  2 Area1  2006 Male          1960    272886  761.    724.    798.
#>  3 Area1  2007 Female        1907    278471  708.    674.    742.
#>  4 Area1  2007 Male          1755    269277  649.    617.    681.
#>  5 Area1  2008 Female        1799    285167  627.    596.    659.
#>  6 Area1  2008 Male          1655    272028  744.    706.    783.
#>  7 Area1  2009 Female        2271    283824  838.    802.    876.
#>  8 Area1  2009 Male          2291    277646  827.    791.    864.
#>  9 Area1  2010 Female        1743    285187  606.    575.    637.
#> 10 Area1  2010 Male          1660    251262  624.    591.    658.
#> # ... with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1823    290150  637.    607.    668. 95%       
#>  2 Area1  2006 Male         1960    272886  761.    724.    798. 95%       
#>  3 Area1  2007 Fema~        1907    278471  708.    674.    742. 95%       
#>  4 Area1  2007 Male         1755    269277  649.    617.    681. 95%       
#>  5 Area1  2008 Fema~        1799    285167  627.    596.    659. 95%       
#>  6 Area1  2008 Male         1655    272028  744.    706.    783. 95%       
#>  7 Area1  2009 Fema~        2271    283824  838.    802.    876. 95%       
#>  8 Area1  2009 Male         2291    277646  827.    791.    864. 95%       
#>  9 Area1  2010 Fema~        1743    285187  606.    575.    637. 95%       
#> 10 Area1  2010 Male         1660    251262  624.    591.    658. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1823    290150  637.    607.    668. 95%       
#>  2 Area1  2006 Male         1960    272886  761.    724.    798. 95%       
#>  3 Area1  2007 Fema~        1907    278471  708.    674.    742. 95%       
#>  4 Area1  2007 Male         1755    269277  649.    617.    681. 95%       
#>  5 Area1  2008 Fema~        1799    285167  627.    596.    659. 95%       
#>  6 Area1  2008 Male         1655    272028  744.    706.    783. 95%       
#>  7 Area1  2009 Fema~        2271    283824  838.    802.    876. 95%       
#>  8 Area1  2009 Male         2291    277646  827.    791.    864. 95%       
#>  9 Area1  2010 Fema~        1743    285187  606.    575.    637. 95%       
#> 10 Area1  2010 Male         1660    251262  624.    591.    658. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1604    231777  666.    633.    699. 95%       
#>  2 Area1  2006 Male         1510    214775  760.    720.    800. 95%       
#>  3 Area1  2007 Fema~        1546    214213  730.    693.    768. 95%       
#>  4 Area1  2007 Male         1355    218590  626.    593.    661. 95%       
#>  5 Area1  2008 Fema~        1452    231978  627.    595.    661. 95%       
#>  6 Area1  2008 Male         1462    202204  791.    749.    834. 95%       
#>  7 Area1  2009 Fema~        1916    223715  877.    837.    918. 95%       
#>  8 Area1  2009 Male         1667    221576  792.    754.    831. 95%       
#>  9 Area1  2010 Fema~        1233    232456  547.    515.    579. 95%       
#> 10 Area1  2010 Male         1086    204182  579.    544.    615. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(area, year) %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 x 10
#> # Groups:   area [4]
#>    area   year total_count total_pop value lowercl uppercl confidence statistic
#>    <fct> <int>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#>  1 Area1  2006        3783    563036  697.    674.    721. 95%        dsr per ~
#>  2 Area1  2007        3662    547748  676.    653.    700. 95%        dsr per ~
#>  3 Area1  2008        3454    557195  651.    628.    675. 95%        dsr per ~
#>  4 Area1  2009        4562    561470  820.    795.    845. 95%        dsr per ~
#>  5 Area1  2010        3403    536449  595.    573.    617. 95%        dsr per ~
#>  6 Area2  2006        3906    550714  730.    706.    755. 95%        dsr per ~
#>  7 Area2  2007        3761    577945  685.    662.    709. 95%        dsr per ~
#>  8 Area2  2008        3864    582320  697.    674.    721. 95%        dsr per ~
#>  9 Area2  2009        4243    548600  772.    748.    797. 95%        dsr per ~
#> 10 Area2  2010        3442    590100  560.    540.    581. 95%        dsr per ~
#> 11 Area3  2006        4212    547767  834.    807.    861. 95%        dsr per ~
#> 12 Area3  2007        3686    562217  696.    672.    720. 95%        dsr per ~
#> 13 Area3  2008        4169    562967  758.    733.    782. 95%        dsr per ~
#> 14 Area3  2009        3809    562230  671.    648.    693. 95%        dsr per ~
#> 15 Area3  2010        3792    581921  662.    640.    685. 95%        dsr per ~
#> 16 Area4  2006        3658    575793  642.    619.    664. 95%        dsr per ~
#> 17 Area4  2007        3812    579170  700.    676.    724. 95%        dsr per ~
#> 18 Area4  2008        3412    553905  608.    586.    630. 95%        dsr per ~
#> 19 Area4  2009        3624    573619  641.    619.    663. 95%        dsr per ~
#> 20 Area4  2010        3885    572460  691.    668.    715. 95%        dsr per ~
#> # ... with 1 more variable: method <chr>

Execute phe_smr and phe_isr

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the phe_smr and phe_isr functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

Here are some example code chunks to demonstrate the phe_smr function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1823    2047. 0.890   0.850   0.932 95%       
#>  2 Area1  2006 Male      1960    1888. 1.04    0.993   1.09  95%       
#>  3 Area1  2007 Fema~     1907    1927. 0.990   0.946   1.03  95%       
#>  4 Area1  2007 Male      1755    1875. 0.936   0.893   0.981 95%       
#>  5 Area1  2008 Fema~     1799    1998. 0.900   0.859   0.943 95%       
#>  6 Area1  2008 Male      1655    1883. 0.879   0.837   0.922 95%       
#>  7 Area1  2009 Fema~     2271    1978. 1.15    1.10    1.20  95%       
#>  8 Area1  2009 Male      2291    1948. 1.18    1.13    1.23  95%       
#>  9 Area1  2010 Fema~     1743    1995. 0.874   0.833   0.916 95%       
#> 10 Area1  2010 Male      1660    1759. 0.944   0.899   0.990 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1823    2047. 0.890   0.850   0.932 95%       
#>  2 Area1  2006 Male      1960    1888. 1.04    0.993   1.09  95%       
#>  3 Area1  2007 Fema~     1907    1927. 0.990   0.946   1.03  95%       
#>  4 Area1  2007 Male      1755    1875. 0.936   0.893   0.981 95%       
#>  5 Area1  2008 Fema~     1799    1998. 0.900   0.859   0.943 95%       
#>  6 Area1  2008 Male      1655    1883. 0.879   0.837   0.922 95%       
#>  7 Area1  2009 Fema~     2271    1978. 1.15    1.10    1.20  95%       
#>  8 Area1  2009 Male      2291    1948. 1.18    1.13    1.23  95%       
#>  9 Area1  2010 Fema~     1743    1995. 0.874   0.833   0.916 95%       
#> 10 Area1  2010 Male      1660    1759. 0.944   0.899   0.990 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15559   15559  1       0.984   1.02 
#> 2  2007    14921   15774. 0.946   0.931   0.961
#> 3  2008    14899   15732. 0.947   0.932   0.962
#> 4  2009    16238   15724. 1.03    1.02    1.05 
#> 5  2010    14522   15928. 0.912   0.897   0.927

The phe_isr function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1823    2047.     695.  619.    591.    648. 95%       
#>  2 Area1  2006 Male      1960    1888.     695.  722.    690.    755. 95%       
#>  3 Area1  2007 Fema~     1907    1927.     695.  688.    658.    720. 95%       
#>  4 Area1  2007 Male      1755    1875.     695.  651.    621.    682. 95%       
#>  5 Area1  2008 Fema~     1799    1998.     695.  626.    598.    656. 95%       
#>  6 Area1  2008 Male      1655    1883.     695.  611.    582.    641. 95%       
#>  7 Area1  2009 Fema~     2271    1978.     695.  798.    766.    832. 95%       
#>  8 Area1  2009 Male      2291    1948.     695.  818.    785.    852. 95%       
#>  9 Area1  2010 Fema~     1743    1995.     695.  608.    579.    637. 95%       
#> 10 Area1  2010 Male      1660    1759.     695.  656.    625.    689. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1823    2047.     695.  619.    591.    648. 95%       
#>  2 Area1  2006 Male      1960    1888.     695.  722.    690.    755. 95%       
#>  3 Area1  2007 Fema~     1907    1927.     695.  688.    658.    720. 95%       
#>  4 Area1  2007 Male      1755    1875.     695.  651.    621.    682. 95%       
#>  5 Area1  2008 Fema~     1799    1998.     695.  626.    598.    656. 95%       
#>  6 Area1  2008 Male      1655    1883.     695.  611.    582.    641. 95%       
#>  7 Area1  2009 Fema~     2271    1978.     695.  798.    766.    832. 95%       
#>  8 Area1  2009 Male      2291    1948.     695.  818.    785.    852. 95%       
#>  9 Area1  2010 Fema~     1743    1995.     695.  608.    579.    637. 95%       
#> 10 Area1  2010 Male      1660    1759.     695.  656.    625.    689. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15559   15559      695.  695.    685.    706.
#> 2  2007    14921   15774.     695.  658.    647.    668.
#> 3  2008    14899   15732.     695.  659.    648.    669.
#> 4  2009    16238   15724.     695.  718.    707.    729.
#> 5  2010    14522   15928.     695.  634.    624.    644.