Premium Pattern

Suppose we have some monthly exposures that we would like to add premium data to.

exposures_PM <- addExposures(records, type = "PM")
head(exposures_PM)
key duration policy_month start_int end_int exposure
B10251C8 1 1 2010-04-10 2010-05-09 0.08214
B10251C8 1 2 2010-05-10 2010-06-09 0.08487
B10251C8 1 3 2010-06-10 2010-07-09 0.08214
B10251C8 1 4 2010-07-10 2010-08-09 0.08487
B10251C8 1 5 2010-08-10 2010-09-09 0.08487
B10251C8 1 6 2010-09-10 2010-10-09 0.08214

The package includes simulated premium transaction data in the data frame trans.

head(trans)
key trans_date amt
B10251C8 2012-12-04 199
B10251C8 2013-12-28 197
B10251C8 2015-12-30 177
B10251C8 2019-05-07 192
B10251C8 2012-04-15 206
B10251C8 2019-04-02 220

The addStart function adds the start date of the appropriate exposure interval to the transactions.

trans_with_interval <- addStart(trans, exposures_PM)
head(trans_with_interval)
start_int key trans_date amt
2010-05-10 B10251C8 2010-05-28 190
2010-06-10 B10251C8 2010-07-04 189
2010-11-10 B10251C8 2010-11-21 179
2011-04-10 B10251C8 2011-05-08 210
2011-07-10 B10251C8 2011-07-12 198
2012-01-10 B10251C8 2012-01-14 194
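
To make the matching concrete, here is a minimal sketch of the same idea using small made-up data. It assumes a transaction belongs to the interval of the same key for which start_int <= trans_date <= end_int, and it drops transactions that fall outside every interval. This illustrates the concept only; it is not the package's implementation of addStart, whose handling of unmatched transactions may differ. The toy_exposures and toy_trans frames are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: two monthly exposure intervals for one policy.
toy_exposures <- data.frame(
  key       = c("A", "A"),
  start_int = as.Date(c("2010-04-10", "2010-05-10")),
  end_int   = as.Date(c("2010-05-09", "2010-06-09"))
)

# Hypothetical transactions; the last one falls outside every interval.
toy_trans <- data.frame(
  key        = c("A", "A", "A"),
  trans_date = as.Date(c("2010-04-15", "2010-05-28", "2011-01-01")),
  amt        = c(100, 190, 50)
)

# Pair each transaction with every interval of the same policy, then keep
# only the pairing whose interval contains the transaction date.
toy_matched <- merge(toy_trans, toy_exposures, by = "key") %>%
  filter(trans_date >= start_int, trans_date <= end_int) %>%
  select(start_int, key, trans_date, amt) %>%
  arrange(trans_date)

toy_matched
```

Pairing every transaction with every interval of its policy is simple but can be memory hungry on large data, so treat this as a way to reason about the result rather than a production approach.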

Grouping by key and start_int and summing amt gives one row per policy and interval, so each row corresponds to at most one interval in exposures_PM.

trans_to_join <- trans_with_interval %>% group_by(start_int, key) %>% summarise(premium = sum(amt))
head(trans_to_join)
start_int key premium
2005-06-01 D68554D5 97
2005-10-01 D68554D5 169
2005-12-01 D68554D5 96
2006-01-01 D68554D5 193
2006-02-01 D68554D5 107
2006-03-01 D68554D5 119

Then we can join this to the exposures using a left join without duplicating any exposures.

premium_study <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(premium_study, 10)
key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 NA
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 NA
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 NA
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 NA
B10251C8 1 7 2010-10-10 2010-11-09 0.08487 NA
B10251C8 1 8 2010-11-10 2010-12-09 0.08214 179
B10251C8 1 9 2010-12-10 2011-01-09 0.08487 NA
B10251C8 1 10 2011-01-10 2011-02-09 0.08487 NA
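
The claim that the join does not duplicate exposures depends on the aggregated transactions having one row per key and start_int. The following sketch, on hypothetical toy data, compares joining raw transactions with joining aggregated ones and checks the row counts.

```r
library(dplyr)

# Hypothetical toy data: one policy, two intervals, two payments in the first.
toy_exposures <- tibble(
  key       = c("A", "A"),
  start_int = as.Date(c("2010-04-10", "2010-05-10")),
  exposure  = c(0.08214, 0.08487)
)
toy_trans_with_interval <- tibble(
  start_int = as.Date(c("2010-04-10", "2010-04-10")),
  key       = c("A", "A"),
  amt       = c(100, 90)
)

# Joining the raw transactions repeats the first exposure row.
joined_raw <- left_join(toy_exposures, toy_trans_with_interval,
                        by = c("key", "start_int"))

# Aggregating first yields one row per (key, start_int), so rows are preserved.
toy_to_join <- toy_trans_with_interval %>%
  group_by(start_int, key) %>%
  summarise(premium = sum(amt), .groups = "drop")
joined_agg <- left_join(toy_exposures, toy_to_join, by = c("key", "start_int"))

nrow(joined_raw)  # 3: the first interval's exposure would be counted twice
nrow(joined_agg)  # 2: same as nrow(toy_exposures)
```

Comparing nrow() before and after the join is a cheap safeguard worth keeping in a real study.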

Replace the NA values produced by the join with zeros using if_else.

premium_study <- premium_study %>% mutate(premium = if_else(is.na(premium), 0, premium))
head(premium_study, 10)
key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 0
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 0
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 0
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 0
B10251C8 1 7 2010-10-10 2010-11-09 0.08487 0
B10251C8 1 8 2010-11-10 2010-12-09 0.08214 179
B10251C8 1 9 2010-12-10 2011-01-09 0.08487 0
B10251C8 1 10 2011-01-10 2011-02-09 0.08487 0
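
As an alternative way to write the same step, dplyr::coalesce returns the first non-missing value at each position. A minimal sketch on a hypothetical toy frame:

```r
library(dplyr)

# Hypothetical toy frame with missing premiums, as produced by a left join.
toy_study <- tibble(
  policy_month = 1:3,
  premium      = c(NA, 190, NA)
)

# Missing premiums become 0; existing values are left untouched.
toy_study <- toy_study %>% mutate(premium = coalesce(premium, 0))

toy_study$premium  # 0 190 0
```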

Now we are free to do any calculations we want. As a simple example, we calculate the average premium in each of the first two policy months. Refer to the section on adding additional information for more creative policy splits.

premium_study %>% filter(policy_month %in% c(1,2)) %>% group_by(policy_month) %>% summarise(avg_premium = mean(premium))
policy_month avg_premium
1 60.46
2 66.88
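
Note that this average includes the intervals where the premium was set to zero. The sketch below, on hypothetical toy data, shows how that differs from the average over intervals in which a premium was actually paid, and adds the share of intervals with a payment. The column names avg_when_paid and share_paying are made up for this illustration.

```r
library(dplyr)

# Hypothetical toy data: three policies observed for two policy months.
toy_study <- tibble(
  key          = c("A", "B", "C", "A", "B", "C"),
  policy_month = c(1, 1, 1, 2, 2, 2),
  premium      = c(0, 100, 200, 0, 0, 90)
)

toy_summary <- toy_study %>%
  group_by(policy_month) %>%
  summarise(
    avg_premium   = mean(premium),               # zeros included
    avg_when_paid = mean(premium[premium > 0]),  # paying intervals only
    share_paying  = mean(premium > 0)            # fraction of intervals with a payment
  )

toy_summary
```

Which of these is appropriate depends on the question being asked; the zero-inclusive mean is the one that corresponds to the calculation above.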

Other Uses for addStart

Suppose that, for a predictive analytics project, we want to know the most recent premium paid by each policy as of each interval. Again we left join the premium to the exposure frame.

previous_premium_unfilled <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(previous_premium_unfilled)
key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 NA
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 NA
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 NA
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 NA

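This time we fill NA values with the most recent premium paid instead of 0. We group by key first so that a premium is never carried over from one policy into the next. The first interval remains NA because no premium precedes it.

previous_premium <- previous_premium_unfilled %>%
group_by(key) %>% tidyr::fill(premium, .direction = "down") %>% ungroup()
head(previous_premium)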
key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 NA
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 189
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 189
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 189
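
When the frame holds more than one policy, it matters whether the downward fill is applied within each key. A minimal sketch on a hypothetical two-policy frame compares an ungrouped fill with a fill grouped by key:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: policy A followed by policy B, both starting with NA.
toy_unfilled <- tibble(
  key          = c("A", "A", "A", "B", "B"),
  policy_month = c(1, 2, 3, 1, 2),
  premium      = c(NA, 190, NA, NA, 75)
)

# Ungrouped: values are carried down the whole column, across policies.
ungrouped <- toy_unfilled %>% fill(premium, .direction = "down")

# Grouped: values are carried down within each policy only.
grouped <- toy_unfilled %>%
  group_by(key) %>%
  fill(premium, .direction = "down") %>%
  ungroup()

ungrouped$premium  # NA 190 190 190 75: B's first month inherits A's premium
grouped$premium    # NA 190 190 NA 75
```

The fill also relies on row order, so the frame should be sorted by policy month within each key before filling.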

Similar data manipulations can be used to engineer features from anything that varies over time: account values, guarantees, planned premiums, etc.