Introduction to halk

The halk package is a suite of functions for estimating the age of organisms (primarily fish) from empirically measured size. Its main feature is the hierarchical age-length key, or HALK.

What the heck is a HALK?

The HALK is a data-borrowing age estimation method used primarily in fisheries ecology. It extends the traditional age-length key (ALK) by borrowing data across time, space, or any other nested level, producing a set of nested ALKs that are used to estimate fish age from measured length. For example, if you have survey data in which lengths were measured but no fish were sub-sampled for age, you can still estimate ages by borrowing age data from the same lake in other years or from nearby lakes.
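To make the starting point concrete, here is a minimal base R sketch of a traditional, non-hierarchical ALK: tabulate aged fish by length and age, then convert each length row into proportions. The toy data frame `aged` is hypothetical, and the code only illustrates the idea; it is not halk's internal implementation.

```r
# Hypothetical aged sub-sample: one row per fish with a length and an age.
aged <- data.frame(
  length = c(3, 3, 4, 4, 4, 4, 5, 5, 5),
  age    = c(1, 1, 1, 1, 1, 2, 2, 2, 3)
)

# Cross-tabulate length against age, then divide each row by its total so
# that every length row holds the proportion of fish at each age.
alk <- prop.table(table(length = aged$length, age = aged$age), margin = 1)
alk
```

Here `alk["4", "1"]` is 0.75 because three of the four aged fish of length 4 are age 1. A HALK builds many such tables, one per group at each nested level.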

Implementing a HALK

A HALK is created by passing paired age-length data to the make_halk function. The function has two main arguments: data, the paired age-length data, and levels, a character vector naming the columns that define the nested levels of the HALK. For example, in the following data, you can pass any combination of spp, county, and waterbody as levels:

#> # A tibble: 6 × 5
#>   spp      county   waterbody   age length
#>   <chr>    <chr>    <chr>     <int>  <dbl>
#> 1 bluegill county_A lake_a        0    1  
#> 2 bluegill county_A lake_a        0    1  
#> 3 bluegill county_A lake_a        0    0.9
#> 4 bluegill county_A lake_a        0    1  
#> 5 bluegill county_A lake_a        0    1.1
#> 6 bluegill county_A lake_a        0    1.1

make_halk fits a HALK at the levels the user specifies. For example, passing spp, county, and waterbody as levels creates an ALK for each waterbody within each county, an ALK for each county, and a species-wide ALK.

spp_county_wb_alk <- make_halk(
  wb_spp_laa_data, 
  levels = c("spp", "county", "waterbody")
)
head(spp_county_wb_alk)
#> # A tibble: 6 × 4
#>   spp      county   waterbody alk            
#>   <chr>    <chr>    <chr>     <list>         
#> 1 bluegill county_A lake_a    <alk [10 × 10]>
#> 2 bluegill county_A lake_b    <alk [12 × 10]>
#> 3 bluegill county_A lake_c    <alk [11 × 10]>
#> 4 bluegill county_A <NA>      <alk [13 × 10]>
#> 5 bluegill county_B lake_a    <alk [11 × 10]>
#> 6 bluegill county_B lake_b    <alk [12 × 10]>
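The nesting described above can be sketched in base R by taking the unique combinations of the first one, two, and three levels and padding the finer level columns with NA, which mirrors the NA rows in the make_halk output. This is a sketch of the grouping structure only: the toy data `laa` and the result `groups` are hypothetical, and the row order differs from that of make_halk.

```r
# Hypothetical toy data with the same columns as wb_spp_laa_data.
laa <- data.frame(
  spp       = "bluegill",
  county    = c("county_A", "county_A", "county_A", "county_B"),
  waterbody = c("lake_a", "lake_a", "lake_b", "lake_a"),
  age       = c(0L, 1L, 2L, 1L),
  length    = c(1.0, 3.2, 5.1, 3.5)
)
lvls <- c("spp", "county", "waterbody")

# For each depth k, keep the unique combinations of the first k levels and
# fill the remaining (finer) level columns with NA.
groups <- do.call(rbind, lapply(seq_along(lvls), function(k) {
  g <- unique(laa[, lvls[seq_len(k)], drop = FALSE])
  for (col in setdiff(lvls, lvls[seq_len(k)])) g[[col]] <- NA_character_
  g[, lvls]
}))
rownames(groups) <- NULL
groups
```

The result has one species-wide row, one row per county, and one row per waterbody within each county; note that lake_a appears under both counties, so waterbody groups are defined within county.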

The returned tibble contains a list-column named alk that stores one ALK for every group at every level given in levels. Rows with NA in a level column hold a broader ALK: for example, the row for county_A with NA in the waterbody column is a county-wide ALK. Each object in this list-column is an ordinary ALK built from all of the data matching the non-NA columns in its row.

# Bluegill ALK for lake_a in county_A, from row #1 above
head(spp_county_wb_alk$alk[[1]])
#> # A tibble: 6 × 10
#>   length  age0   age1   age2   age3  age4  age5  age6  age7  age8
#>    <dbl> <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1      0     1 0      0      0          0     0     0     0     0
#> 2      1     1 0      0      0          0     0     0     0     0
#> 3      3     0 1      0      0          0     0     0     0     0
#> 4      4     0 0.918  0.0816 0          0     0     0     0     0
#> 5      5     0 0.0870 0.870  0.0435     0     0     0     0     0
#> 6      6     0 0      0.756  0.244      0     0     0     0     0
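The pooling rule above (use every row that matches the non-NA columns) can be sketched as follows, assuming a simple proportion-table ALK. The helper `alk_for_key()` and the toy data are hypothetical and not part of halk.

```r
# Hypothetical toy data: all fish are length 4 to keep the tables small.
laa <- data.frame(
  spp       = "bluegill",
  county    = c("county_A", "county_A", "county_A", "county_A", "county_B"),
  waterbody = c("lake_a", "lake_a", "lake_b", "lake_b", "lake_a"),
  age       = c(1L, 1L, 1L, 2L, 3L),
  length    = c(4, 4, 4, 4, 4)
)

# Hypothetical helper: build an ALK from every row matching the non-NA
# entries of `key`. An NA entry means "pool across this level".
alk_for_key <- function(data, key) {
  keep <- rep(TRUE, nrow(data))
  for (col in names(key)) {
    if (!is.na(key[[col]])) keep <- keep & data[[col]] == key[[col]]
  }
  sub <- data[keep, ]
  prop.table(table(length = sub$length, age = sub$age), margin = 1)
}

# Lake-level ALK: only lake_a in county_A.
lake_alk <- alk_for_key(laa, list(spp = "bluegill", county = "county_A", waterbody = "lake_a"))
# County-wide ALK: all lakes in county_A pooled together.
county_alk <- alk_for_key(laa, list(spp = "bluegill", county = "county_A", waterbody = NA))
county_alk
```

In this toy example the lake_a ALK assigns every length-4 fish to age 1, while the county-wide ALK pools lake_b as well and gives 0.75 for age 1 and 0.25 for age 2.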

Estimating ages from a HALK

The halk package makes it easy to assign ages from a HALK using the assign_ages function. Once you have created a HALK, pass it to assign_ages along with the length data you want aged. Make sure that the length data contains every column named in the levels argument of make_halk.

est_ages <- assign_ages(wb_spp_length_data, spp_county_wb_alk)
head(est_ages)
#> # A tibble: 6 × 7
#>   spp      county   waterbody length est.age alk       alk.n
#>   <chr>    <chr>    <chr>      <dbl>   <dbl> <chr>     <int>
#> 1 bluegill county_A lake_a       1         0 waterbody   371
#> 2 bluegill county_A lake_a       1         0 waterbody   371
#> 3 bluegill county_A lake_a       0.9       0 waterbody   371
#> 4 bluegill county_A lake_a       1.1       0 waterbody   371
#> 5 bluegill county_A lake_a       1.1       0 waterbody   371
#> 6 bluegill county_A lake_a       1         0 waterbody   371
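The exact rule assign_ages uses to turn an ALK row into an age is not described here. As an assumption for illustration only, the sketch below shows two common ways to do it: a random draw weighted by the ALK proportions, and the single most probable age. The proportions are copied (rounded) from the length-4 row of the lake_a ALK printed earlier; `draw_age()` is a hypothetical helper.

```r
# Proportions for length 4 from the lake_a ALK above (rounded as printed).
alk_row <- c(age1 = 0.918, age2 = 0.0816)
ages    <- c(1L, 2L)

# Option 1 (assumed rule): draw one age at random, weighted by the ALK row.
# sample.int() rescales `prob`, so the rounded values need not sum to 1.
draw_age <- function(ages, probs) {
  ages[sample.int(length(ages), size = 1, prob = probs)]
}

set.seed(1)
est <- vapply(seq_len(1000), function(i) draw_age(ages, alk_row), integer(1))
table(est)  # mostly age 1, occasionally age 2

# Option 2 (assumed rule): always take the most probable age.
modal_age <- ages[which.max(alk_row)]
```

Random draws preserve the spread of ages in the key across many fish, whereas the modal rule always returns the same age for a given length.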

Notice that est_ages contains lakes that were not present in the length-at-age data used to create spp_county_wb_alk. Ages for fish from these lakes are estimated with the county-wide ALK, as recorded in the alk column:

head(est_ages[est_ages$waterbody == "lake_x", ])
#> # A tibble: 6 × 7
#>   spp      county   waterbody length est.age alk    alk.n
#>   <chr>    <chr>    <chr>      <dbl>   <dbl> <chr>  <int>
#> 1 bluegill county_A lake_x       1         0 county  1088
#> 2 bluegill county_A lake_x       1.3       0 county  1088
#> 3 bluegill county_A lake_x       0.8       0 county  1088
#> 4 bluegill county_A lake_x       1         0 county  1088
#> 5 bluegill county_A lake_x       1.1       0 county  1088
#> 6 bluegill county_A lake_x       1.2       0 county  1088
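The fallback seen here can be sketched as a search from the finest level to the coarsest that stops at the first level with an ALK. The helper `pick_level()` and the lookup vector `available` are hypothetical, and the sketch assumes that a fish from a county with no ALK would fall back to the species-wide ALK.

```r
# Hypothetical lookup of groups that have an ALK, keyed as
# "spp/county/waterbody" with finer levels dropped for broader ALKs.
available <- c("bluegill", "bluegill/county_A", "bluegill/county_A/lake_a")
lvls <- c("spp", "county", "waterbody")

# Walk from the finest level to the coarsest and return the name of the
# first level whose key has an ALK; NA if no level matches.
pick_level <- function(fish, lvls, available) {
  for (k in rev(seq_along(lvls))) {
    key <- paste(unlist(fish[lvls[seq_len(k)]]), collapse = "/")
    if (key %in% available) return(lvls[k])
  }
  NA_character_
}

pick_level(list(spp = "bluegill", county = "county_A", waterbody = "lake_a"), lvls, available)  # "waterbody"
pick_level(list(spp = "bluegill", county = "county_A", waterbody = "lake_x"), lvls, available)  # "county"
pick_level(list(spp = "bluegill", county = "county_C", waterbody = "lake_y"), lvls, available)  # "spp"
```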