bp: Blood Pressure Analysis in R

John Schwenck

Texas A&M University


In an effort to better understand the factors that influence hypertension and cardiovascular disease, visualization tools and metrics are often employed. However, these tools typically exist in silos through proprietary software. Until now, there has yet to be a comprehensive open-source R package that provides the necessary tools for analyzing such data in one place. The bp package provides an extensive framework for researchers to analyze both ABPM and non-ABPM blood pressure data in R through a variety of statistical methods and metrics from the literature as well as various data visualizations, with minimal code necessary to do so. This paper illustrates the main features of the bp package by analyzing both a single-subject and multi-subject dataset.


Despite the tremendous progress in the medical field, cardiovascular disease (CVD) remains the leading cause of death worldwide. Hypertension, specifically, affects over 1.1 billion people annually according to the American Heart Association [9]. This package serves to visualize and quantify various aspects of hypertension in a more digestible format using various metrics proposed in the literature.

Blood pressure data can be analyzed at varying degrees of granularity depending on the reading frequency, the presence of a sleep indicator, and whether or not the temporal structure is accounted for. These factors almost always depend on the type of device used: ABPM monitors are predominantly used for the short term (within 24 hours), while home monitoring devices or office readings are used for measuring variability over the medium and long term (day-to-day, visit-to-visit, etc.) [11]. Unlike continuous heart rate monitors or continuous glucose monitors, there are currently no commercially available continuous blood pressure monitors for the medium to long term, posing a unique challenge for research.

Of primary concern to researchers is the ability to accurately quantify blood pressure variability (BPV). BPV has been shown to be an important factor in predicting cardiovascular events and sudden death, especially during susceptible periods such as the first two hours of waking up [10]. There have been many proposed methods for characterizing this variability; this package seeks to incorporate as many of these metrics as possible.

We introduce the first comprehensive open-source R package, bp, that both analyzes and visualizes blood pressure data. In an effort to help clinicians make sense of their patients’ data without requiring multiple software platforms for data processing, bp uses only a minimal amount of code to do so and offers additional capabilities beyond the traditional proprietary counterparts. At the time of writing, to the best of the authors’ knowledge, there are currently no other available software packages through the Comprehensive R Archive Network (CRAN) dedicated to blood pressure analysis.

In this paper, we demonstrate the main functionality of the bp package by exploring and analyzing both a single-subject pilot study [8] and a multi-subject study, HYPNOS [11], to illustrate the differences between dataset structures and to explain how to adjust the settings within the R package to accommodate either.

Blood Pressure Monitoring Overview

Blood pressure monitoring devices work by measuring the pressure of the artery’s restricted blood flow; for digital devices, the vibrations are translated into electrical signals. Unlike home monitors that only take readings upon the subject’s initiation, ambulatory blood pressure monitoring (ABPM) devices take automatic readings at pre-specified intervals over a 24-hour period or longer.

ABPM allows medical professionals to analyze blood pressure during sleep, which has been shown to be a more accurate predictor of cardiovascular events than daytime blood pressure. ABPM also allows researchers to distinguish true hypertension from “white coat” hypertension observed in an office or laboratory setting. Because of the burden of assembling the device (and because of the lack of a commercial-grade alternative), ABPM measurements are intended for the short term of 24 hours to a few days.

Home monitoring, on the other hand, offers individuals the ability to record their blood pressure at will, and readings can be tracked easily using mobile apps over the long term of weeks, months, or years. However, because the user has to initiate the recording, readings cannot be taken during sleep.

As the nature of the two devices inhibits certain functionality depending on which device is used, we outline how to effectively analyze data for both types of devices.

According to the American Heart Association, there are currently six blood pressure stages that correspond to the readings from the monitoring devices: Low (Hypotension), Normal, Elevated, Stage 1 Hypertension, Stage 2 Hypertension, and Hypertensive Crisis. Below is a table outlining the categories according to their definitions. Note that because of the ambiguity between Normal, Elevated, and Stage 1 diastolic blood pressure readings (because of the similar thresholds), this package splits the difference and sets a default threshold for Elevated DBP from 80 - 85 and for Stage 1 Hypertension from 85 - 90. These thresholds can be adjusted by the user where applicable.

Blood Pressure Category | Systolic (mmHg) |        | Diastolic (mmHg)
Low (Hypotension)       | Less than 100   | and    | Less than 60
Normal                  | 100 - 120       | and    | 60 - 80
Elevated                | 120 - 129       | and    | 60 - 80
Stage 1 Hypertension    | 130 - 139       | or     | 80 - 89
Stage 2 Hypertension    | 140 - 180       | or     | 90 - 120
Hypertensive Crisis     | Higher than 180 | and/or | Higher than 120
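To make the thresholds concrete, the following is a minimal base R sketch of how SBP and DBP readings could be assigned to stages separately. It is not the bp package's own implementation: the helper names classify_sbp and classify_dbp are hypothetical, and treating each interval as right-closed (so that a reading of exactly 130 falls in Elevated) is an assumption. The DBP breaks follow the package defaults described above (Elevated 80 - 85, Stage 1 85 - 90).

```r
# Hypothetical helpers for illustration only; not the bp package source.
# Assumption: intervals are right-closed, e.g. (120, 130] is "Elevated" for SBP.
stage_labels <- c("Low", "Normal", "Elevated", "Stage 1", "Stage 2", "Crisis")

classify_sbp <- function(sbp) {
  breaks <- c(-Inf, 100, 120, 130, 140, 180, Inf)
  as.character(cut(sbp, breaks = breaks, labels = stage_labels, right = TRUE))
}

classify_dbp <- function(dbp) {
  # Default split described in the text: Elevated 80 - 85, Stage 1 85 - 90
  breaks <- c(-Inf, 60, 80, 85, 90, 120, Inf)
  as.character(cut(dbp, breaks = breaks, labels = stage_labels, right = TRUE))
}

# Five example readings (SBP / DBP pairs)
sbp <- c(132, 126, 128, 130, 134)
dbp <- c(80, 77, 76, 81, 83)

data.frame(SBP = sbp, SBP_CATEGORY = classify_sbp(sbp),
           DBP = dbp, DBP_CATEGORY = classify_dbp(dbp))
```

With these assumed boundaries, the five readings are labelled the same way as the first five rows of the processed bp_jhs data shown later in this paper. Other boundary conventions are equally defensible for readings that sit exactly on a threshold.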

The bp Package

The general workflow of the bp package consists of 1) a data processing stage and 2) an analysis stage, ideally in as little as two lines of code. The processing stage formats the user’s supplied input data so that it conforms to what the rest of the bp functions expect. The analysis stage uses the processed data to quantify various attributes of the blood pressure relationships or to provide various visualizations. One of the key features of the bp package is the bp_report function, which generates a report that combines such visualization plots into one easily digestible summary for clinicians or researchers to interpret an individual’s (or multiple individuals’) blood pressure results. We will walk through each of these stages in the subsequent sections.

Data Processing with the process_data function

Before any analysis can be done, the user-supplied data set must first be processed into the proper format using process_data so that it adheres to the package's data structure requirements and naming conventions. This function ensures that user-supplied data columns aren’t double counted or missed, since blood pressure data are often inconsistent and come in a wide variety of formats. While a tedious initial step, it will save time in the long run: the resulting processed data require no further specification and can be plugged directly into the analysis functions. It is worth noting that if the user-supplied data set already adheres to the column naming conventions and data types, then the process_data function is not strictly necessary. However, it is still good practice to use it as a sanity check to verify all available variables.

The basic workflow is to load in the user-supplied unprocessed raw data, process it with the process_data function and save to a new dataframe. Note that the capitalization does not matter when specifying the columns.

## Load the sample hypnos_data
## In this scenario, the hypnos_data acts as the "user-supplied" data that is to be processed
library(bp)
data(hypnos_data)

## Assign the output of the process_data function to a new dataframe object
hypnos_proc <- process_data(hypnos_data,
                     sbp = 'syst',
                     dbp = 'diast',
                     bp_datetime = 'date.time',
                     id = 'id',
                     visit = 'visit',
                     hr = 'hr',
                     wake = 'wake',
                     pp = 'pp',
                     map = 'map',
                     rpp = 'rpp')

Notice how the column names of the original hypnos_data changed in the processed data. Notably, SYST became SBP, DIAST became DBP, and DATE.TIME became DATE_TIME.

names(hypnos_data)
#>  [1] "NR."       "DATE.TIME" "SYST"      "MAP"       "DIAST"     "HR"       
#>  [7] "PP"        "RPP"       "WAKE"      "ID"        "VISIT"     "DATE"
names(hypnos_proc)
#>  [1] "ID"           "DATE"         "DATE_TIME"    "VISIT"        "WAKE"        
#>  [6] "SBP"          "DBP"          "MAP"          "PP"           "HR"          
#> [11] "RPP"          "NR."          "TIME_OF_DAY"  "DAY_OF_WEEK"  "SBP_CATEGORY"
#> [16] "DBP_CATEGORY"
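The renaming step can be imitated in a few lines of base R. The sketch below is only an illustration of case-insensitive column matching followed by upper-casing. The helper rename_column and the toy data frame raw are hypothetical, and the actual internals of process_data may differ.

```r
# Hypothetical helper for illustration; not part of the bp package.
# Finds a column regardless of capitalization, renames it, and upper-cases
# all column names.
rename_column <- function(data, from, to) {
  idx <- match(toupper(from), toupper(names(data)))
  if (is.na(idx)) stop("Column not found: ", from)
  names(data)[idx] <- to
  names(data) <- toupper(names(data))
  data
}

# Toy "user-supplied" data with mixed-case column names
raw <- data.frame(Syst = c(120, 118),
                  Diast = c(80, 76),
                  Date.Time = c("2019-01-01 08:00:00", "2019-01-01 09:00:00"))

proc <- rename_column(raw, "syst", "SBP")
proc <- rename_column(proc, "diast", "DBP")
proc <- rename_column(proc, "date.time", "DATE_TIME")
names(proc)
```

Matching on upper-cased names is what allows the user to write 'syst' even though the column is stored with different capitalization.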

While the results seem to be trivial at first glance, let’s see what happens when we use a much different data set with a completely different naming convention: the bp_jhs data set. Unlike hypnos_data which has all of the available columns needed in the process_data function with multiple subjects, bp_jhs is a single-subject data set without many of the multi-subject identifiers such as ID, VISIT, or WAKE (as it is non-ABPM data). Further, there is no MAP or PP column, but these (as we will see) can be automatically created.

## Load the sample bp_jhs data set
## As before, this is what will be referred to as the "user-supplied" data set
data(bp_jhs)

## Assign the output of the process_data function to a new dataframe object
jhs_proc <- process_data(bp_jhs,
                     sbp = 'sys.mmhg.',
                     dbp = 'dias.mmhg.',
                     bp_datetime = 'datetime',
                     hr = 'PULSE.BPM.')
#> No PP column found. Automatically generated from SBP and DBP columns.
#> No RPP column found. Automatically generated from SBP and HR columns.
#> No MAP column found. Automatically generated from SBP and DBP columns.
#> NOTE: Created DATE column from DATE_TIME column
head(jhs_proc, 5)
#>         DATE           DATE_TIME SBP DBP       MAP PP HR   RPP MONTH DAY YEAR
#> 1 2019-08-01 2019-08-01 09:15:54 132  80  97.33333 52 79 10428     8   1 2019
#> 2 2019-07-31 2019-07-31 11:39:59 126  77  93.33333 49 62  7812     7  31 2019
#> 3 2019-07-31 2019-07-31 11:38:07 128  76  93.33333 52 60  7680     7  31 2019
#> 4 2019-07-30 2019-07-30 13:47:46 130  81  97.33333 49 63  8190     7  30 2019
#> 5 2019-07-30 2019-07-30 13:46:15 134  83 100.00000 51 62  8308     7  30 2019
#>   DAYOFWK     TIME HOUR MEAL_TIME BPDELTA TIME_OF_DAY DAY_OF_WEEK ID
#> 1     Thu 09:15:54    9 Breakfast      52     Morning         Thu  1
#> 2     Wed 11:39:59   11 Breakfast      49     Morning         Wed  1
#> 3     Wed 11:38:07   11 Breakfast      52     Morning         Wed  1
#> 4     Tue 13:47:46   13     Lunch      49   Afternoon         Tue  1
#> 5     Tue 13:46:15   13     Lunch      51   Afternoon         Tue  1
#>   SBP_CATEGORY DBP_CATEGORY
#> 1      Stage 1       Normal
#> 2     Elevated       Normal
#> 3     Elevated       Normal
#> 4     Elevated     Elevated
#> 5      Stage 1     Elevated

After a quick inspection of the original bp_jhs data set and the newly processed data set, it should be evident that there was a lot going on “under the hood” of the process_data function. As we can see from the column names, the awkward sys.mmhg., dias.mmhg., pulse.bpm., and datetime have been replaced with the more concise SBP, DBP, HR, and DATE_TIME, respectively. In addition, MAP, PP, RPP, SBP_CATEGORY, and DBP_CATEGORY were all calculated as new columns that did not previously exist in the data. Finally, if the supplied data has a column in a “date/time” format, the columns TIME_OF_DAY and DAY_OF_WEEK are also created for convenience.
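The automatically generated columns can be checked by hand. The sketch below recomputes PP, MAP, and RPP in base R for the first three readings of the processed output. The formulas (PP as SBP minus DBP, MAP as one third of SBP plus two thirds of DBP, RPP as SBP times HR) are inferred from the printed values and the console messages rather than taken from the package source, so they should be read as assumptions.

```r
# Illustrative re-computation in base R; formulas are inferred from the
# printed output, not copied from the bp package.
readings <- data.frame(SBP = c(132, 126, 128),
                       DBP = c(80, 77, 76),
                       HR  = c(79, 62, 60))

readings$PP  <- readings$SBP - readings$DBP            # pulse pressure
readings$MAP <- (readings$SBP + 2 * readings$DBP) / 3  # mean arterial pressure
readings$RPP <- readings$SBP * readings$HR             # rate pressure product

readings
```

These values agree with the PP, MAP, and RPP columns of the first three rows printed by head(jhs_proc, 5).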

names(bp_jhs)
#>  [1] "DateTime"   "Month"      "Day"        "Year"       "DayofWk"   
#>  [6] "Time"       "Hour"       "Meal_Time"  "Sys.mmHg."  "Dias.mmHg."
#> [11] "bpDelta"    "Pulse.bpm."
names(jhs_proc)
#>  [1] "DATE"         "DATE_TIME"    "SBP"          "DBP"          "MAP"         
#>  [6] "PP"           "HR"           "RPP"          "MONTH"        "DAY"         
#> [11] "YEAR"         "DAYOFWK"      "TIME"         "HOUR"         "MEAL_TIME"   
#> [16] "BPDELTA"      "TIME_OF_DAY"  "DAY_OF_WEEK"  "ID"           "SBP_CATEGORY"
#> [21] "DBP_CATEGORY"

NOTE: For consistency, process_data will coerce all column names to upper-case.

Blood Pressure Metrics

After the data have been processed, we can utilize the built-in metrics from the literature to characterize the blood pressure variability. The following metrics are currently offered through the bp package:

Function | Metric Name                            | Source
arv      | Average Real Variability               | Mena et al. (2005)
bp_mag   | Blood Pressure Magnitude               | Muntner et al. (2011)
bp_range | Blood Pressure Range                   | Levitan et al. (2013)
cv       | Coefficient of Variation               | Muntner et al. (2011)
sv       | Successive Variation                   | Muntner et al. (2011)
dip_calc | Nocturnal Dipping % and Classification | Ohkubo et al. (1997)

Time-Dependent Dispersion Metrics

  • arv - Average Real Variability
    • A measure of dispersion using the sum of absolute differences in successive observations
  • sv - Successive Variation
    • A measure of dispersion using the sum of squared differences in successive observations
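As a rough illustration of the two time-dependent metrics, the base R sketch below computes ARV as the mean absolute successive difference and SV as the square root of the mean squared successive difference. The helper names arv_sketch and sv_sketch are hypothetical, the readings are made up, and the SV definition in particular is an assumption that may differ in detail from the package's sv function.

```r
# Hypothetical helpers; definitions are assumptions, not the bp source code.
arv_sketch <- function(x) mean(abs(diff(x)))    # average real variability
sv_sketch  <- function(x) sqrt(mean(diff(x)^2)) # successive variation

sbp <- c(120, 130, 125, 135)  # made-up readings in time order

arv_sketch(sbp)  # (10 + 5 + 10) / 3
sv_sketch(sbp)   # sqrt((100 + 25 + 100) / 3)

# Re-ordering the same readings changes ARV but leaves the standard
# deviation untouched, which is what "time-dependent" means here.
arv_sketch(sort(sbp))
c(sd(sbp), sd(sort(sbp)))
```

Because both metrics are built from successive differences, the order of the readings matters, whereas the standard deviation depends only on the set of values.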

Time-Independent Dispersion Metrics

  • bp_mag - Blood Pressure Magnitude (peak and trough)
    • Peak measures the distance from the average value to the maximum value
    • Trough measures the distance from the minimum value to the average value
  • bp_range - Blood Pressure Range
    • Range measures the distance from the minimum value to the maximum value
  • cv - Coefficient of Variation
    • Coefficient of Variation is a ratio of the standard deviation / average
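The time-independent metrics can be sketched in base R as follows. The readings are made up and the helper name dispersion_sketch is hypothetical. Whether the package reports the coefficient of variation as a proportion or as a percentage is not specified here, so the sketch returns the raw ratio and leaves any scaling by 100 to the reader.

```r
# Hypothetical helper for illustration; not the bp package implementation.
dispersion_sketch <- function(x) {
  c(peak   = max(x) - mean(x),  # distance from average to maximum
    trough = mean(x) - min(x),  # distance from minimum to average
    range  = max(x) - min(x),   # distance from minimum to maximum
    cv     = sd(x) / mean(x))   # standard deviation / average (raw ratio)
}

sbp <- c(118, 125, 131, 122, 129)  # made-up readings; the mean is 125
dispersion_sketch(sbp)
```

Note that peak and trough always sum to the range, so reporting them separately mainly adds information about which side of the average the extremes fall on.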

Sleep-dependent Metrics

  • dip_calc - Nocturnal Dipping % and Classification
    • Nocturnal dipping percentage is the % drop in blood pressure while asleep compared with awake. Requires an indication of when a subject is asleep to know how to calculate: 1 - (avg sleep BP / avg daytime BP). The severity of the dipping percentage indicates the corresponding classification of that individual (dipper, non-dipper, reverse dipper).
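The dipping calculation can be sketched in base R as shown below. The helper names, the made-up readings, the coding of WAKE (1 for awake, 0 for asleep), and the 10% cut-off between dipper and non-dipper are all assumptions made for illustration. The package's dip_calc function may use different defaults or additional categories.

```r
# Hypothetical helpers; thresholds and WAKE coding are assumptions.
dip_pct <- function(bp, wake) {
  # wake: 1 = awake, 0 = asleep (assumed coding)
  1 - mean(bp[wake == 0]) / mean(bp[wake == 1])
}

classify_dip <- function(dip, dip_thresh = 0.10) {
  if (dip < 0) {
    "reverse dipper"   # sleep BP is higher than awake BP
  } else if (dip < dip_thresh) {
    "non-dipper"       # sleep BP drops by less than the threshold
  } else {
    "dipper"           # sleep BP drops by at least the threshold
  }
}

sbp  <- c(130, 128, 132, 110, 112, 108)  # made-up readings
wake <- c(1, 1, 1, 0, 0, 0)

dip <- dip_pct(sbp, wake)  # 1 - 110 / 130
classify_dip(dip)
```

A negative dipping percentage means blood pressure was higher during sleep than while awake, which is why it is separated out as its own class.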

Let’s say we are working with the hypnos_data and would like to compare the time-dependent nature of the arv with the sv for each subject.

head(arv(hypnos_proc))
#> # A tibble: 6 x 6
#> # Groups:   ID, VISIT [3]
#>   <int> <int> <int>   <dbl>   <dbl> <int>
#> 1 70417     1     0    7.67    5.5      7
#> 2 70417     1     1   10.2     6       23
#> 3 70417     2     0   17.7     7.57     8
#> 4 70417     2     1   11.2     8.12    17
#> 5 70422     1     0   10.5     4        5
#> 6 70422     1     1   14.9     6.62    17
head(sv(hypnos_proc))
#> # A tibble: 6 x 6
#> # Groups:   ID, VISIT [3]
#>   <int> <int> <int>  <dbl>  <dbl> <int>
#> 1 70417     1     0   8.49   5.76     7
#> 2 70417     1     1  12.1    7.95    23
#> 3 70417     2     0  18.9    8.22     8
#> 4 70417     2     1  13.0   10.2     17
#> 5 70422     1     0  11.2    5.74     5
#> 6 70422     1     1  19.1    8.80    17

Comparing vertically can be challenging, so with some help from dplyr we can obtain the following:

head(dplyr::left_join(arv(hypnos_proc), cv(hypnos_proc)))
#> Joining, by = c("ID", "VISIT", "WAKE", "N")
#> # A tibble: 6 x 10
#> # Groups:   ID, VISIT [3]
#>   <int> <int> <int>   <dbl>   <dbl> <int>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 70417     1     0    7.67    5.5      7   4.87   8.65   5.71   4.82
#> 2 70417     1     1   10.2     6       23   6.98   8.74   9.03   5.88
#> 3 70417     2     0   17.7     7.57     8   9.46  12.3   12.9    7.43
#> 4 70417     2     1   11.2     8.12    17   8.44  11.5   11.5    7.56
#> 5 70422     1     0   10.5     4        5   5.21   7.02   7.22   4.09
#> 6 70422     1     1   14.9     6.62    17   9.86  11.5   14.9    7.59

Note that this is possible thanks to the work we did in standardizing column names from the processing step.

Suppose instead we wanted to look at peaks and troughs of the single-subject data set bp_jhs. We would then call the bp_mag function on our data.

bp_mag(jhs_proc)
#> # A tibble: 1 x 6
#>      ID Peak_SBP Peak_DBP Trough_SBP Trough_DBP     N
#>   <dbl>    <dbl>    <dbl>      <dbl>      <dbl> <int>
#> 1     1     20.3     15.1       19.7       17.9   222

Here, we notice something different. Because there weren’t ID, VISIT, or WAKE columns, the bp_mag function aggregated everything together. This is technically correct, but suppose we wanted to glean more information by breaking our data down by DATE instead; we would then need to pass the optional argument inc_date = TRUE to the function.

tail(bp_mag(jhs_proc, inc_date = TRUE))
#> # A tibble: 6 x 7
#> # Groups:   ID [1]
#>      ID DATE       Peak_SBP Peak_DBP Trough_SBP Trough_DBP     N
#>   <dbl> <date>        <dbl>    <dbl>      <dbl>      <dbl> <int>
#> 1     1 2019-07-26      9.8     4.80       6.20       5.2      5
#> 2     1 2019-07-28      5.5     3          4.5        3        4
#> 3     1 2019-07-29      7       2.75       9          2.25     4
#> 4     1 2019-07-30      2       1          2          1        2
#> 5     1 2019-07-31      1       0.5        1          0.5      2
#> 6     1 2019-08-01      0       0          0          0        1

Interpretation: While it may not seem obvious at first glance, the blood pressure magnitude (whether a peak or a trough) is calculated as \(peak = max(BP) - mean(BP)\) and \(trough = mean(BP) - min(BP)\), where BP could correspond to either SBP or DBP. If we manually inspect the data from 2019-07-31, we see that N = 2 and that the two measurements in the bp_jhs data set are 126 and 128 for SBP and 77 and 76 for DBP. Then \(\bar{x}_{SBP} = \frac{(126+128)}{2} = 127\) and \(\bar{x}_{DBP} = \frac{(77+76)}{2} = 76.5\), so the peak and trough values in our output (1 for SBP and 0.5 for DBP) make sense.
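The same check can be carried out for several dates at once. The base R sketch below groups the five most recent bp_jhs readings (taken from the processed output printed earlier) by DATE and applies the peak and trough formulas to each group. It only mimics the grouping that inc_date = TRUE performs and is not the package's implementation. The helper names peak and trough are introduced here for illustration.

```r
# Illustrative re-computation; readings copied from the processed bp_jhs output.
readings <- data.frame(
  DATE = as.Date(c("2019-08-01", "2019-07-31", "2019-07-31",
                   "2019-07-30", "2019-07-30")),
  SBP  = c(132, 126, 128, 130, 134),
  DBP  = c(80, 77, 76, 81, 83)
)

peak   <- function(x) max(x) - mean(x)
trough <- function(x) mean(x) - min(x)

# tapply() returns one value per date, ordered chronologically
by_date <- data.frame(
  DATE       = sort(unique(readings$DATE)),
  Peak_SBP   = as.numeric(tapply(readings$SBP, readings$DATE, peak)),
  Peak_DBP   = as.numeric(tapply(readings$DBP, readings$DATE, peak)),
  Trough_SBP = as.numeric(tapply(readings$SBP, readings$DATE, trough)),
  Trough_DBP = as.numeric(tapply(readings$DBP, readings$DATE, trough)),
  N          = as.integer(tapply(readings$SBP, readings$DATE, length))
)
by_date
```

The three resulting rows (2019-07-30, 2019-07-31, and 2019-08-01) agree with the last three rows of the bp_mag output above, including the zero peak and trough on the day with a single reading.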


So far, we have processed the original data and run a couple of metrics to get a clearer picture of the variability. Now let’s visualize it. Though the processed data can easily be incorporated into other visualization packages or code (such as ggplot, or base R graphics as demonstrated in the first example below with bp_mag), the following visuals are currently included with the bp package:

Function      | Visual
bp_hist       | Blood Pressure Stage Histograms
bp_scatter    | Blood Pressure Stage Scatter Plot (American Heart Association)
dow_tod_plots | Day of Week / Time of Day chart
bp_report     | Exportable Blood Pressure Report

Continuing with our previous example using the bp_jhs data set, let’s suppose we wanted to explore how the peaks and troughs of systolic blood pressure changed over time. Note that there is a subtle assumption here that we have multiple measurements for a given day; otherwise, a single value will be both the peak and the trough. The plot still works regardless.

viz_data <- bp_mag(jhs_proc, inc_date = TRUE)
## Keep only the days with more than one reading and a non-zero peak
viz_sub <- viz_data[which(viz_data$Peak_SBP > 0 & viz_data$N > 1), ]
plot(viz_sub$DATE, viz_sub$Peak_SBP, type = 'l', col = "red", xlab = "DATE", ylab = "Magnitude")
lines(viz_sub$DATE, viz_sub$Trough_SBP, col = "darkgreen")
legend("topright", legend = c("Peak", "Trough"), col = c("red","darkgreen"), lty =1)

From the above time series chart, notice that the values are absolute magnitudes for both peak and trough. So, when peak exceeds trough, as was evident during late May and early June, the interpretation is that blood pressure rose more on average than it fell. In other words, the variability of the blood pressure data is right-skewed toward the high end. In contrast, in the very beginning the variability was more left-skewed, favoring the low end of the spectrum, since the trough magnitude exceeded the peak magnitude. We can verify this by looking at the very first day of measurements on 2019-04-16 as shown below:

head(viz_data[which(viz_data$Peak_SBP > 0 & viz_data$N > 1),])
#> # A tibble: 6 x 7
#> # Groups:   ID [1]
#>      ID DATE       Peak_SBP Peak_DBP Trough_SBP Trough_DBP     N
#>   <dbl> <date>        <dbl>    <dbl>      <dbl>      <dbl> <int>
#> 1     1 2019-04-16     6.25     8         12.8        6        4
#> 2     1 2019-04-20     5.67     3.33       5.33       3.67     3
#> 3     1 2019-04-21     6        0          6          0        2
#> 4     1 2019-04-22     2        0.5        2          0.5      2
#> 5     1 2019-04-26     6.33     1.67       5.67       2.33     3
#> 6     1 2019-04-27     1.5      3          1.5        3        2

Recall that in the processing stage, there were additional columns that were automatically created. We will now visualize two of these, SBP_CATEGORY and DBP_CATEGORY, through the bp_hist and bp_scatter functions, and the TIME_OF_DAY column through the dow_tod_plots function.

The bp_hist function returns three histograms of all readings, corresponding to the total number within each stage, the frequency of SBP readings, and the frequency of DBP readings. Furthermore, it colors the data according to the blood pressure stage each reading falls under:


The above plots show a worryingly high frequency of Elevated and Stage 1 readings for SBP, but the DBP readings fare better, falling mostly in the Normal and Elevated stages.

Let’s now suppose that we wish to break down our readings by Time of Day and Day of Week. For this, we can use the dow_tod_plots function. Because this function is mainly used as a helper function for the bp_report function, we need to add a couple of steps using the gridExtra package:

bptable_ex <- dow_tod_plots(jhs_proc)
gridExtra::grid.arrange(bptable_ex[[1]], bptable_ex[[2]], bptable_ex[[3]], bptable_ex[[4]], nrow = 2)
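The kind of breakdown that underlies such a chart can be approximated in base R with a simple cross-tabulation. In the sketch below, the three timestamps are taken from the processed bp_jhs output, while the time-of-day bucket boundaries (Night 0 - 5, Morning 6 - 11, Afternoon 12 - 17, Evening 18 - 23) are an assumption chosen for illustration and may not match the package's defaults.

```r
# Illustrative cross-tabulation; bucket boundaries are an assumption.
dt <- as.POSIXct(c("2019-08-01 09:15:54",
                   "2019-07-31 11:39:59",
                   "2019-07-30 13:47:46"), tz = "UTC")
lt <- as.POSIXlt(dt)

# Day of week from the numeric weekday (0 = Sunday), independent of locale
day_names   <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day_of_week <- factor(day_names[lt$wday + 1], levels = day_names)

# Time of day from the hour of the reading
time_of_day <- cut(lt$hour,
                   breaks = c(-1, 5, 11, 17, 23),
                   labels = c("Night", "Morning", "Afternoon", "Evening"))

table(day_of_week, time_of_day)
```

For these three readings, the derived labels agree with the TIME_OF_DAY and DAY_OF_WEEK values shown in the processed data.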

As a final step before returning to our other example, let’s compile everything that we have done so far into a more compact and digestible report that visualizes everything simultaneously. To do so, we will rely on the bp_report function, which will generate a report in PDF (although other formats such as PNG are available):


Suppose now that we turn our attention back to the hypnos_data example where we joined the ARV and CV metrics together. We would now like to visualize these for all of the subjects to see if we can discern any patterns. From the first scatterplot matrix we see that the between-subject values differ and from the second scatterplot matrix we see that there is a very stark contrast between ARV and CV during sleep vs awake.

viz_arv_cv <- dplyr::left_join(arv(hypnos_proc), cv(hypnos_proc))
#> Joining, by = c("ID", "VISIT", "WAKE", "N")
pairs(viz_arv_cv[,4:(ncol(viz_arv_cv)-1)], upper.panel = NULL, col = factor(viz_arv_cv$ID))

pairs(hypnos_proc[,6:11], upper.panel = NULL, col = factor(hypnos_proc$WAKE))

Future Directions

As our understanding of cardiovascular disease continues to grow, this package will remain an ongoing project. As such, collaboration is highly encouraged. Corrections to existing metrics, extensions or new method proposals and visualizations, and code optimization are all welcome.

In the short term, the following new features are to be incorporated with the next release of the package:


References

  1. Mancia, G., Di Rienzo, M., & Parati, G. (1993). Ambulatory blood pressure monitoring use in hypertension research and clinical practice. Hypertension, 21(4), 510-524.

  2. Levitan, E., Kaciroti, N., Oparil, S. et al. Relationships between metrics of visit-to-visit variability of blood pressure. J Hum Hypertens 27, 589–593 (2013). doi: 10.1038/jhh.2013.19

  3. Muntner P, Joyce C, Levitan EB, Holt E, Shimbo D, Webber LS, Oparil S, Re R, Krousel-Wood M. Reproducibility of visit-to-visit variability of blood pressure measured as part of routine clinical care. Journal of Hypertension. 2011 Dec;29(12):2332-2338. doi: 10.1097/HJH.0b013e32834cf213

  4. O’Brien E, Sheridan J, O’Malley K. Dippers and non-dippers. Lancet 1988; 2: 397.

  5. Ohkubo T, Imai Y, Tsuji I, Nagai K, Watanabe N, Minami N, Kato J, Kikuchi N, Nishiyama A, Aihara A, Sekino M, Satoh H, Hisamichi S. Relation between nocturnal decline in blood pressure and mortality. The Ohasama Study. Am J Hypertens. 1997 Nov;10(11):1201-7. doi: 10.1016/s0895-7061(97)00274-4. PMID: 9397237.

  6. Mena L, Pintos S, Queipo NV, Aizpúrua JA, Maestre G, Sulbarán T. A reliable index for the prognostic significance of blood pressure variability. J Hypertens. 2005 Mar;23(3):505-11. doi: 10.1097/01.hjh.0000160205.81652.5a. PMID: 15716690.

  7. Holt-Lunstad J, Jones BQ, Birmingham W. The influence of close relationships on nocturnal blood pressure dipping. Int J Psychophysiol. 2009 Mar;71(3):211-7. doi: 10.1016/j.ijpsycho.2008.09.008. Epub 2008 Oct 5. PMID: 18930771.

  8. Schwenck J. Riding for Research: A 5,775-mile Cycling Journey Across North America. Harvard Dataverse https://dataverse.harvard.edu/dataverse/r4r

  9. Webb S. AHA 2019 Heart Disease and Stroke Statistics. American College of Cardiology. https://www.acc.org/latest-in-cardiology/ten-points-to-remember/2019/02/15/14/39/aha-2019-heart-disease-and-stroke-statistics

  10. Bilo G, Grillo A, Guida V, Parati G. Morning blood pressure surge: pathophysiology, clinical relevance and therapeutic aspects. Integr Blood Press Control. 2018;11:47-56. Published 2018 May 24. https://doi.org/10.2147/IBPC.S130277

  11. Irina Gaynanova, Naresh Punjabi, Ciprian Crainiceanu, Modeling continuous glucose monitoring (CGM) data during sleep, Biostatistics, 2020. https://doi.org/10.1093/biostatistics/kxaa023