Prevalence is the total number of people with an ongoing health-related event, such as a medical condition or medication use, at a particular time or during a given period divided by the population at risk. In the previous vignettes we have seen how we can identify a denominator population and define and instantiate an outcome cohort. Prevalence then can be calculated to describe the proportion of people in the denominator population who are in the outcome cohort at a specified time point (point prevalence) or over a given time interval (period prevalence).
In the first plot below, we can see at time t+2 that 2 out of 5 people were in an outcome cohort, giving a point prevalence of 40%. In the second figure, period prevalence between t+2 and t+3 was also 40%. However, for period prevalence between t and t+1, what do we do with those people who only contributed some time during the period? If we include them we'll have a period prevalence of 20%, whereas if we require that everyone is observed for the full period to contribute then we'll have a period prevalence of 33%.
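The arithmetic behind these percentages can be sketched in base R. The counts are assumptions read off the description above (five people in total, one of whom has the outcome between t and t+1, and three of whom are observed for the whole of that period). The object names are purely illustrative and are not part of the IncidencePrevalence package.

```r
# Toy counts assumed from the worked example (not produced by the package)
point_cases <- 2 # people in the outcome cohort at t+2
point_pop <- 5   # people observed at t+2
point_prev <- point_cases / point_pop

# Period between t and t+1: one person has the outcome
period_cases <- 1
pop_any_time <- 5    # everyone observed for at least part of the period
pop_full_period <- 3 # only those observed for the whole period

prev_partial <- period_cases / pop_any_time
prev_full <- period_cases / pop_full_period

# Percentages, rounded to whole numbers
round(100 * c(point = point_prev, partial = prev_partial, full = prev_full))
```

The numerator is the same in both period prevalence estimates; only the choice of who counts towards the denominator changes the result.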
General information on how to define outcome cohorts can be found in the vignette “Creating outcome cohorts”. The most important recommendations for defining an outcome cohort for calculating prevalence are:
generateDenominatorCohortSet().
Adequate use of the first two features above needs to reflect the nature of the proposed outcome (e.g., whether it is an acute or chronic condition) and the research question being investigated.
estimatePointPrevalence() and estimatePeriodPrevalence() are the functions we use to estimate prevalence. To demonstrate their use, let's load the IncidencePrevalence package (along with a couple of packages to help with subsequent plots) and generate 50,000 example patients using the mockIncidencePrevalenceRef() function, from whom we'll create a denominator population without adding any restrictions other than a study period.
library(IncidencePrevalence)
library(dplyr)
library(tidyr)
cdm <- mockIncidencePrevalenceRef(
  sampleSize = 50000,
  outPre = 0.5
)
cdm <- generateDenominatorCohortSet(
  cdm = cdm, name = "denominator",
  cohortDateRange = c(as.Date("2008-01-01"), as.Date("2012-01-01")),
  ageGroup = list(c(0, 150)),
  sex = "Both",
  daysPriorHistory = 0,
  temporary = FALSE
)
#> Creating denominator cohorts
#> Time taken to get cohorts: 0 min and 2 sec
cdm$denominator %>%
  glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB 0.8.1 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id <chr> "2", "3", "4", "6", "7", "8", "12", "13", "19", "…
#> $ cohort_start_date <date> 2008-01-01, 2009-12-20, 2011-04-26, 2011-10-13, …
#> $ cohort_end_date <date> 2008-08-03, 2011-10-16, 2011-07-16, 2012-01-01, …
Let's first calculate point prevalence on a yearly basis.
prev <- estimatePointPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Years",
  minCellCount = 0,
  temporary = FALSE
)
prev %>%
  glimpse()
#> Rows: 5
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1", "1"
#> $ prevalence_start_date <date> 2008-01-01, 2009-01-01, 2010-0…
#> $ prevalence_end_date <date> 2008-01-01, 2009-01-01, 2010-…
#> $ n_cases <int> 26, 36, 39, 29, 31
#> $ n_population <int> 4442, 4646, 4655, 4664, 4667
#> $ prevalence <dbl> 0.005853219, 0.007748601, 0.0…
#> $ prevalence_95CI_lower <dbl> 0.003997601, 0.005602378, 0.00…
#> $ prevalence_95CI_upper <dbl> 0.008562779, 0.010708169, 0.01…
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1", "1"
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_lookback_days <dbl> 0, 0, 0, 0, 0
#> $ analysis_type <chr> "point", "point", "point", "po…
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_time_point <chr> "start", "start", "start", "st…
#> $ analysis_full_contribution <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_min_cell_count <dbl> 0, 0, 0, 0, 0
#> $ denominator_cohort_id <int> 1, 1, 1, 1, 1
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"…
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA, NA
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA, NA
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ cdm_name <chr> "test_database", "test_databas…
plotPrevalence(prev, ylim = c(0, NA))
We can also calculate point prevalence by calendar month.
prev <- estimatePointPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Months",
  minCellCount = 0,
  temporary = FALSE
)
prev %>%
  glimpse()
#> Rows: 49
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1", "1", "1", …
#> $ prevalence_start_date <date> 2008-01-01, 2008-02-01, 2008-…
#> $ prevalence_end_date <date> 2008-01-01, 2008-02-01, 2008-…
#> $ n_cases <int> 26, 22, 37, 32, 26, 31, 31, 30…
#> $ n_population <int> 4442, 4450, 4459, 4486, 4495, …
#> $ prevalence <dbl> 0.005853219, 0.004943820, 0.00…
#> $ prevalence_95CI_lower <dbl> 0.003997601, 0.003267167, 0.00…
#> $ prevalence_95CI_upper <dbl> 0.008562779, 0.007474450, 0.01…
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1", "1", "1", …
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_lookback_days <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ analysis_type <chr> "point", "point", "point", "po…
#> $ analysis_interval <chr> "months", "months", "months", …
#> $ analysis_complete_database_intervals <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_time_point <chr> "start", "start", "start", "st…
#> $ analysis_full_contribution <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_min_cell_count <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ denominator_cohort_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"…
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ cdm_name <chr> "test_database", "test_databas…
plotPrevalence(prev, ylim = c(0, NA))
With the estimatePointPrevalence() function, we can further specify, through the timePoint argument, where in each time interval point prevalence is computed (start, middle, or end). By default, this parameter is set to start, but we can use middle instead like so:
prev <- estimatePointPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Years",
  timePoint = "middle",
  minCellCount = 0,
  temporary = FALSE
)
prev %>%
  glimpse()
#> Rows: 4
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1"
#> $ prevalence_start_date <date> 2008-07-01, 2009-07-01, 2010-0…
#> $ prevalence_end_date <date> 2008-07-01, 2009-07-01, 2010-…
#> $ n_cases <int> 31, 35, 41, 26
#> $ n_population <int> 4555, 4573, 4752, 4716
#> $ prevalence <dbl> 0.006805708, 0.007653619, 0.0…
#> $ prevalence_95CI_lower <dbl> 0.004798807, 0.005508445, 0.00…
#> $ prevalence_95CI_upper <dbl> 0.009643779, 0.010625271, 0.01…
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1"
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_lookback_days <dbl> 0, 0, 0, 0
#> $ analysis_type <chr> "point", "point", "point", "po…
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> FALSE, FALSE, FALSE, FALSE
#> $ analysis_time_point <chr> "middle", "middle", "middle", …
#> $ analysis_full_contribution <lgl> FALSE, FALSE, FALSE, FALSE
#> $ analysis_min_cell_count <dbl> 0, 0, 0, 0
#> $ denominator_cohort_id <int> 1, 1, 1, 1
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE
#> $ cdm_name <chr> "test_database", "test_databas…
plotPrevalence(prev, ylim = c(0, NA))
To calculate period prevalence by year (i.e. each period is a calendar year):
prev <- estimatePeriodPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Years",
  minCellCount = 0,
  temporary = FALSE
)
prev %>%
  glimpse()
#> Rows: 4
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1"
#> $ prevalence_start_date <date> 2008-01-01, 2009-01-01, 2010-0…
#> $ prevalence_end_date <date> 2008-12-31, 2009-12-31, 2010-…
#> $ n_cases <int> 1696, 1739, 1715, 1734
#> $ n_population <int> 7926, 7953, 8060, 8016
#> $ prevalence <dbl> 0.2139793, 0.2186596, 0.21277…
#> $ prevalence_95CI_lower <dbl> 0.2050903, 0.2097124, 0.203982…
#> $ prevalence_95CI_upper <dbl> 0.2231454, 0.2278785, 0.221849…
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1"
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_lookback_days <dbl> 0, 0, 0, 0
#> $ analysis_type <chr> "period", "period", "period", …
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> TRUE, TRUE, TRUE, TRUE
#> $ analysis_time_point <chr> "start", "start", "start", "st…
#> $ analysis_full_contribution <lgl> FALSE, FALSE, FALSE, FALSE
#> $ analysis_min_cell_count <dbl> 0, 0, 0, 0
#> $ denominator_cohort_id <int> 1, 1, 1, 1
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE
#> $ cdm_name <chr> "test_database", "test_databas…
plotPrevalence(prev, ylim = c(0.1, 0.3))
To calculate period prevalence by month (i.e. each period is a calendar month):
prev <- estimatePeriodPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Months",
  minCellCount = 0,
  temporary = FALSE
)
prev %>%
  glimpse()
#> Rows: 48
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1", "1", "1", …
#> $ prevalence_start_date <date> 2008-01-01, 2008-02-01, 2008-…
#> $ prevalence_end_date <date> 2008-01-31, 2008-02-29, 2008-…
#> $ n_cases <int> 154, 147, 167, 181, 178, 152, …
#> $ n_population <int> 4733, 4706, 4761, 4759, 4797, …
#> $ prevalence <dbl> 0.03253750, 0.03123672, 0.0350…
#> $ prevalence_95CI_lower <dbl> 0.02784982, 0.02663631, 0.0302…
#> $ prevalence_95CI_upper <dbl> 0.03798338, 0.03660180, 0.0406…
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1", "1", "1", …
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_lookback_days <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ analysis_type <chr> "period", "period", "period", …
#> $ analysis_interval <chr> "months", "months", "months", …
#> $ analysis_complete_database_intervals <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, …
#> $ analysis_time_point <chr> "start", "start", "start", "st…
#> $ analysis_full_contribution <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_min_cell_count <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ denominator_cohort_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"…
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ cdm_name <chr> "test_database", "test_databas…
plotPrevalence(prev, ylim = c(0, NA))
When using the estimatePeriodPrevalence() function, we can set the fullContribution parameter to decide whether individuals are required to be present in the database throughout the interval of interest in order to be included (fullContribution=TRUE). If not, individuals will only be required to be present for one day of the interval to contribute (fullContribution=FALSE), which would be specified like so:
prev <- estimatePeriodPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Months",
  fullContribution = FALSE,
  minCellCount = 0,
  temporary = FALSE
)
prev %>%
  glimpse()
#> Rows: 48
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1", "1", "1", …
#> $ prevalence_start_date <date> 2008-01-01, 2008-02-01, 2008-…
#> $ prevalence_end_date <date> 2008-01-31, 2008-02-29, 2008-…
#> $ n_cases <int> 154, 147, 167, 181, 178, 152, …
#> $ n_population <int> 4733, 4706, 4761, 4759, 4797, …
#> $ prevalence <dbl> 0.03253750, 0.03123672, 0.0350…
#> $ prevalence_95CI_lower <dbl> 0.02784982, 0.02663631, 0.0302…
#> $ prevalence_95CI_upper <dbl> 0.03798338, 0.03660180, 0.0406…
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1", "1", "1", …
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_lookback_days <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ analysis_type <chr> "period", "period", "period", …
#> $ analysis_interval <chr> "months", "months", "months", …
#> $ analysis_complete_database_intervals <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, …
#> $ analysis_time_point <chr> "start", "start", "start", "st…
#> $ analysis_full_contribution <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_min_cell_count <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ denominator_cohort_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"…
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ cdm_name <chr> "test_database", "test_databas…
plotPrevalence(prev,
ylim = c(0, 0.07)
)
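To make the effect of fullContribution on the denominator concrete, here is a small base R sketch. The five observation windows are invented for illustration, and the two logical tests only mirror the description given above (present for at least one day versus present throughout the interval). It is not the package's internal implementation.

```r
# Hypothetical observation windows for five people (illustrative only)
obs <- data.frame(
  id = 1:5,
  obs_start = as.Date(c(
    "2008-01-01", "2008-01-01", "2008-01-10", "2008-01-01", "2008-01-20"
  )),
  obs_end = as.Date(c(
    "2008-12-31", "2008-01-15", "2008-12-31", "2008-03-31", "2008-01-25"
  ))
)

# The interval of interest: January 2008
interval_start <- as.Date("2008-01-01")
interval_end <- as.Date("2008-01-31")

# Observed for at least one day of the interval (as with fullContribution = FALSE)
any_day <- obs$obs_start <= interval_end & obs$obs_end >= interval_start

# Observed for every day of the interval (as with fullContribution = TRUE)
full <- obs$obs_start <= interval_start & obs$obs_end >= interval_end

sum(any_day) # size of the denominator when partial contribution is allowed
sum(full)    # size of the denominator when full contribution is required
```

In this toy data everyone overlaps January 2008, but only the two people observed from the first to the last day of the month would remain once full contribution is required.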
For both functions, we can also specify a look-back window (outcomeLookbackDays) so that an outcome is considered prevalent if it was ongoing within some number of days before the current time point or period. If NULL, any prior outcome will be considered prevalent. If 0, only ongoing outcomes will be considered prevalent. This can be a useful option if, for example, outcome cohorts include people only on the day on which a relevant code was seen and prevalence is to be based on some prior time window (e.g. counting outcomes as prevalent if they were seen in the last 30 days).
prev <- estimatePointPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Years",
  outcomeLookbackDays = c(0, 30),
  minCellCount = 0,
  temporary = FALSE
)
prev %>%
  glimpse()
#> Rows: 10
#> Columns: 30
#> $ analysis_id <chr> "1", "1", "1", "1", "1", "2", …
#> $ prevalence_start_date <date> 2008-01-01, 2009-01-01, 2010-…
#> $ prevalence_end_date <date> 2008-01-01, 2009-01-01, 2010-…
#> $ n_cases <int> 26, 36, 39, 29, 31, 136, 145, …
#> $ n_population <int> 4442, 4646, 4655, 4664, 4667, …
#> $ prevalence <dbl> 0.005853219, 0.007748601, 0.00…
#> $ prevalence_95CI_lower <dbl> 0.003997601, 0.005602378, 0.00…
#> $ prevalence_95CI_upper <dbl> 0.008562779, 0.010708169, 0.01…
#> $ cohort_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ result_obscured <chr> "FALSE", "FALSE", "FALSE", "FA…
#> $ outcome_cohort_id <chr> "1", "1", "1", "1", "1", "1", …
#> $ outcome_cohort_name <chr> "cohort_1", "cohort_1", "cohor…
#> $ analysis_outcome_lookback_days <dbl> 0, 0, 0, 0, 0, 30, 30, 30, 30,…
#> $ analysis_type <chr> "point", "point", "point", "po…
#> $ analysis_interval <chr> "years", "years", "years", "ye…
#> $ analysis_complete_database_intervals <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_time_point <chr> "start", "start", "start", "st…
#> $ analysis_full_contribution <lgl> FALSE, FALSE, FALSE, FALSE, FA…
#> $ analysis_min_cell_count <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
#> $ denominator_cohort_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ denominator_cohort_name <chr> "Denominator cohort 1", "Denom…
#> $ denominator_age_group <chr> "0 to 150", "0 to 150", "0 to …
#> $ denominator_sex <chr> "Both", "Both", "Both", "Both"…
#> $ denominator_days_prior_history <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
#> $ denominator_start_date <date> 2008-01-01, 2008-01-01, 2008-0…
#> $ denominator_end_date <date> 2012-01-01, 2012-01-01, 2012-0…
#> $ denominator_strata_cohort_definition_id <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
#> $ denominator_strata_cohort_name <lgl> NA, NA, NA, NA, NA, NA, NA, N…
#> $ denominator_closed_cohort <lgl> FALSE, FALSE, FALSE, FALSE, F…
#> $ cdm_name <chr> "test_database", "test_databas…
plotPrevalence(prev,
colour = "analysis_outcome_lookback_days",
colour_name = "Outcome lookback days",
ylim = c(0, NA)
)
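The look-back logic described above can be illustrated with a short base R sketch. The outcome records and the helper is_prevalent() are hypothetical, and the exact boundary handling inside the package may differ. The sketch only encodes the three cases named in the text (0, a positive number of days, and NULL).

```r
# Hypothetical single-day outcome records (illustrative only)
outcomes <- data.frame(
  subject_id = c(1, 2, 3),
  outcome_start_date = as.Date(c("2009-01-01", "2008-12-15", "2008-06-01")),
  outcome_end_date = as.Date(c("2009-01-01", "2008-12-15", "2008-06-01"))
)
time_point <- as.Date("2009-01-01")

# Hypothetical helper: is each outcome prevalent at time_point?
is_prevalent <- function(outcomes, time_point, lookback_days) {
  started <- outcomes$outcome_start_date <= time_point
  if (is.null(lookback_days)) {
    # NULL: any outcome that started on or before the time point counts
    return(started)
  }
  # Otherwise the outcome must have been ongoing within the look-back window
  started & outcomes$outcome_end_date >= time_point - lookback_days
}

is_prevalent(outcomes, time_point, 0)    # only outcomes ongoing on the day
is_prevalent(outcomes, time_point, 30)   # also outcomes seen in the last 30 days
is_prevalent(outcomes, time_point, NULL) # any prior outcome
```

With single-day outcome records like these, widening the window increases the number of prevalent cases, which is the pattern seen when comparing the estimates for 0 and 30 look-back days above.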
In the examples above, we have calculated prevalence by months and years, but it can also be calculated by weeks, quarters, or for the entire time period observed (overall). In addition, the user can decide whether to include time intervals that are not fully captured in the database (e.g., having data up to June for the last study year when computing period prevalence rates). By default, prevalence will only be estimated for those intervals where the database captures all the interval (completeDatabaseIntervals=TRUE).
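As a rough illustration of what a complete interval means, the base R sketch below flags which calendar years are fully covered when data are only available up to June of the last study year. The dates and object names are assumptions made for this example, and the check is a simplification rather than the package's own logic.

```r
# Hypothetical yearly intervals for a 2008 to 2011 study period
interval_start <- seq(as.Date("2008-01-01"), by = "year", length.out = 4)
interval_end <- seq(as.Date("2009-01-01"), by = "year", length.out = 4) - 1

# Assume the database only captures data up to the end of June 2011
db_end <- as.Date("2011-06-30")

# An interval is complete if the database covers it up to its last day
complete <- interval_end <= db_end

data.frame(interval_start, interval_end, complete)
```

Under these assumptions the 2011 interval is flagged as incomplete, so it would be dropped when only complete database intervals are kept.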
Given that we can set estimatePointPrevalence() and estimatePeriodPrevalence() to exclude individuals based on certain parameters (e.g., fullContribution), it is important to note that the denominator population used to compute prevalence rates might differ from the one calculated with generateDenominatorCohortSet().
The user can also set the minimum number of events to be reported, below which results will be obscured. By default, results with <5 occurrences are blinded, but if minCellCount=0, all results will be reported. 95 % confidence intervals are calculated using the Wilson Score method. In addition, we can set verbose=TRUE to report progress as code is running. By default, no progress is reported (verbose=FALSE).
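For readers who want to see how a Wilson score interval is obtained, the following base R sketch applies the textbook formula to the first yearly point prevalence estimate shown earlier (26 cases among 4442 people). The function wilson_ci() is written for this illustration and is not a function exported by the package. Its result should be close to the prevalence_95CI_lower and prevalence_95CI_upper values printed for that row (roughly 0.0040 and 0.0086).

```r
# Wilson score interval for a binomial proportion (textbook formula);
# wilson_ci() is a hypothetical helper, not part of IncidencePrevalence
wilson_ci <- function(n_cases, n_population, level = 0.95) {
  z <- stats::qnorm(1 - (1 - level) / 2)
  p <- n_cases / n_population
  denom <- 1 + z^2 / n_population
  centre <- (p + z^2 / (2 * n_population)) / denom
  half_width <- z *
    sqrt(p * (1 - p) / n_population + z^2 / (4 * n_population^2)) / denom
  c(lower = centre - half_width, upper = centre + half_width)
}

# 26 cases among 4442 people, as in the first yearly point prevalence row
wilson_ci(26, 4442)
```

Unlike the simple normal approximation, the Wilson interval is not symmetric around the point estimate and stays within 0 and 1, which matters for the small proportions seen in these examples.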
estimatePointPrevalence() and estimatePeriodPrevalence() will generate a table with point and period prevalence rates, respectively, for each of the time intervals studied and for each combination of the parameters set. Similar to the output obtained by generateDenominatorCohortSet(), the table generated will also include attributes, including tibbles with information on settings and attrition.
prev <- estimatePeriodPrevalence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = "Years",
  fullContribution = c(TRUE, FALSE),
  minCellCount = 0,
  temporary = FALSE, # returnParticipants can only be TRUE with temporary = FALSE
  returnParticipants = TRUE
)
prevalenceAttrition(prev)
#> # A tibble: 22 × 27
#> analysis_id number_records number_subjects reason_id reason excluded_records
#> <chr> <dbl> <dbl> <dbl> <glue> <dbl>
#> 1 1 50000 50000 1 Starti… NA
#> 2 1 50000 50000 2 Missin… 0
#> 3 1 50000 50000 3 Missin… 0
#> 4 1 50000 50000 4 Cannot… 0
#> 5 1 18018 18018 5 No obs… 31982
#> 6 1 18018 18018 6 Doesn'… 0
#> 7 1 18018 18018 7 Prior … 0
#> 8 1 18018 18018 10 No obs… 0
#> 9 1 18018 18018 11 Starti… NA
#> 10 1 18014 18014 14 Not ob… 4
#> # ℹ 12 more rows
#> # ℹ 21 more variables: excluded_subjects <dbl>, outcome_cohort_id <chr>,
#> # outcome_cohort_name <chr>, analysis_outcome_lookback_days <dbl>,
#> # analysis_type <chr>, analysis_interval <chr>,
#> # analysis_complete_database_intervals <lgl>, analysis_time_point <chr>,
#> # analysis_full_contribution <lgl>, analysis_min_cell_count <dbl>,
#> # denominator_cohort_id <int>, denominator_cohort_name <chr>, …
In addition, if we set returnParticipants to TRUE as above, we can identify the individuals who contributed to the prevalence analysis by using participants(). For example, we can identify those people contributing to analysis 1 by running:
participants(prev, analysisId = 1) %>%
glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB 0.8.1 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ subject_id <chr> "3", "24", "30", "36", "145", "184", "263", "271", …
#> $ cohort_start_date <date> 2009-12-20, 2010-10-31, 2009-09-16, 2008-01-01, 20…
#> $ cohort_end_date <date> 2011-10-16, 2012-01-01, 2011-06-24, 2010-01-07, 20…
#> $ outcome_start_date <date> 2011-02-22, 2011-11-29, 2010-04-01, 2009-12-25, 20…
As we've used permanent tables for this example, we can drop these after running our analysis (note that the tables created will start with the write_prefix specified when creating the cdm reference).
CDMConnector::listTables(attr(cdm, "dbcon"), schema = attr(cdm, "write_schema"))
#> [1] "denominator_attrition" "denominator_set"
#> [3] "denominator_count" "denominator"
#> [5] "period_prev_participants1" "person"
#> [7] "observation_period" "strata"
#> [9] "outcome" "cdm_source"
#> [11] "vocabulary" "strata_set"
#> [13] "strata_count" "outcome_set"
#> [15] "outcome_count"
# drop tables created when instantiating denominator cohorts
CDMConnector::dropTable(
  cdm = cdm,
  name = dplyr::starts_with(paste0(
    attr(cdm, "write_prefix"),
    "denominator"
  ))
)
# drop table with study participants when returnParticipants = TRUE
CDMConnector::dropTable(
  cdm = cdm,
  name = paste0(
    attr(cdm, "write_prefix"),
    "period_prev_participants1"
  )
)