TL;DR

First, pick an outlet location and download some data.

# Uncomment to install!
# install.packages("nhdplusTools")

library(nhdplusTools)
#> USGS Support Package: https://owi.usgs.gov/R/packages.html#support
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.1, PROJ 5.2.0

start_point <- st_sfc(st_point(c(-89.362239, 43.090266)), crs = 4269)
start_comid <- discover_nhdplus_id(start_point)

flowline <- navigate_nldi(list(featureSource = "comid", 
                               featureID = start_comid), 
                          mode = "upstreamTributaries", 
                          data_source = "")

subset_gpkg <-subset_nhdplus(comids = flowline$nhdplus_comid,
                             output_file = tempfile(fileext = ".gpkg"),
                             nhdplus_data = "download")
#> All intersections performed in latitude/longitude.
#> Reading NHDFlowline_Network
#> Writing NHDFlowline_Network
#> Reading CatchmentSP
#> Writing CatchmentSP

flowline <- sf::read_sf(subset_gpkg, "NHDFlowline_Network")
catchment <- sf::read_sf(subset_gpkg, "CatchmentSP")
waterbody <- sf::read_sf(subset_gpkg, "NHDWaterbody")

plot(sf::st_geometry(flowline), col = "blue")
plot(start_point, cex = 1.5, lwd = 2, col = "red", add = TRUE)
plot(sf::st_geometry(catchment), add = TRUE)
plot(sf::st_geometry(waterbody), col = rgb(0, 0, 1, alpha = 0.5), add = TRUE)

plot of chunk tldr

Read on to see how NHDPlusTools will help you index data to the network you just retrieved and refactor (split, collapse, and aggregate) the catchments into a different set of catchments. Please consider registering issues and feature suggestions on github.

Introduction

The nhdplusTools package is intended to provide a reusable set of tools to subset, relate data to, and refactor (collapse, split, and aggregate) NHDPlus data. It implements a data model consistent with both the NHDPlus and HY_Features. The package aims to provide a set of tools with minimal dependencies that can be used to build workflows using NHDPlus data.

The package has three types of functionality:

  1. Discovery and Subsetting
  2. Indexing or Referencing
  3. Generalization and Refactoring

This introduction gives an overview of the basic package setup and an brief demonstration of the three types of functionality. Detailed documentation of all the package functions can be found at the Referece page

Installation

The easiest way to install nhdplusTools is with the devtools package like this:

# install.packages("devtools")
# devtools::install_github("usgs-r/nhdplusTools")

Then you can load up nhdplusTools:

library(nhdplusTools)

Data and Package Basics

The first thing you are going to need to do is go get some data to work with. nhdplusTools provides the ability to download small subsets of the NHDPlus as described in the Discovery and Subsetting section. For large subsets, greater than a few thousand square kilometers, you can download the National Seamless database at this web page. You will need 7z or the archive package to extract it.

If you are working with the whole National Seamless database, nhdplusTools has some convenience functions you should be aware of. Once you have it downloaded and extracted, you can tell the nhdplusTools package where it is with the nhdplus_path() function.

nhdplus_path(file.path(temp_dir, "natseamless.gpkg"))

nhdplus_path()
#> [1] "/var/folders/p6/kr6wq6bj7wvdz1qywzhksz5h001t8d/T//RtmpG864Ir/natseamless.gpkg"

If you are going to be loading and reloading the flowlines, flowline attributes, or catchments, repeatadly, the stage_national_data() function will speed things up a bit. It creates three staged files that are quicker for R to read at the path you tell it. If you call it and its output files exist, it won't overwrite and just return the paths to your staged files.

staged_data <- stage_national_data(output_path = tempdir())

str(staged_data)
#> List of 3
#>  $ attributes: chr "/var/folders/p6/kr6wq6bj7wvdz1qywzhksz5h001t8d/T//RtmpG864Ir/nhdplus_flowline_attributes.rds"
#>  $ flowline  : chr "/var/folders/p6/kr6wq6bj7wvdz1qywzhksz5h001t8d/T//RtmpG864Ir/nhdplus_flowline.rds"
#>  $ catchment : chr "/var/folders/p6/kr6wq6bj7wvdz1qywzhksz5h001t8d/T//RtmpG864Ir/nhdplus_catchment.rds"

As you can see, stage_national_data() assumes you want to stage data in the same folder as the nhdplus_path database and returns a list of .rds files that can be read with readRDS. The flowlines and catchments are sf data.frames and attributes is a plain data.frame with the attributes from flowline. Note that this introduction uses a small subset of the national seamless database as shown in the plot.

flowline <- readRDS(staged_data$flowline)
names(flowline)[1:10]
#>  [1] "COMID"      "FDATE"      "RESOLUTION" "GNIS_ID"    "GNIS_NAME" 
#>  [6] "LENGTHKM"   "REACHCODE"  "FLOWDIR"    "WBAREACOMI" "FTYPE"

library(sf)
plot(sf::st_geometry(flowline))

plot of chunk staged_data

NHDPlus HiRes

(6/16/2019) NHDPlus HiRes is an in-development dataset that introduces much more dense flowlines and catchments. In the long run, nhdplusTools will have complete support for NHDPlus HiRes. So far, nhdplusTools will help download and interface NHDPlus HiRes data with existing nhdplusTools functionality. It's important to note that nhdplusTools was primarily implemented using NHDPlusV2 and any use of HiRes (which is still “beta data” as of writing this) should be subject to significant scruitiny. Never the less, here's a short summary of how to work with NHDPlus HiRes.

For the demo below, a small sample of HiRes data that has been loaded into nhdplusTools is used. The first line shows how you can download additional data (just change download_files to TRUE).

download_nhdplushr(nhd_dir = "download_dir", 
                   hu_list = c("0101"), # can mix hu02 and hu04 codes.
                   download_files = FALSE) # TRUE will download files.
#> [1] "https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHDPlusHR/Beta/GDB/NHDPLUS_H_0101_HU4_GDB.zip"

hr_data <- get_nhdplushr(nhd_dir, 
                         out_gpkg = file.path(nhd_dir, "nhd_hr.gpkg"))
(layers <- st_layers(hr_data))
#> Driver: GPKG 
#> Available layers:
#>         layer_name geometry_type features fields
#> 1      NHDFlowline   Line String     2691     57
#> 2 NHDPlusCatchment Multi Polygon     2603      7
unlink(hr_data)

hr_data <- get_nhdplushr(nhd_dir, 
                         out_gpkg = file.path(nhd_dir, "nhd_hr.gpkg"), 
                         layers = NULL)
(layers <- st_layers(hr_data))
#> Driver: GPKG 
#> Available layers:
#>         layer_name geometry_type features fields
#> 1      NHDFlowline   Line String     2691     57
#> 2 NHDPlusCatchment Multi Polygon     2603      7
#> 3     NHDWaterbody       Polygon     1044     15
#> 4          NHDArea       Polygon       10     14
#> 5          NHDLine   Line String      142     12
#> 6      NHDPlusSink         Point        1     10
#> 7         NHDPoint      3D Point        7     10

Other functionality in the package, such as the get_UT/UM/DM/DD functions, subsetting, indexing, etc. also work now or will soon! Stay tuned for a dedicated NHDPlus HiRes vignette and submit issues as you find them!

Discovery and Subsetting

One of the primary workflows nhdplusTools is designed to accomplish can be described in three steps:

  1. what NHDPlus catchment is at the outlet of a watershed,
  2. figure out what catchments are up or downstream of that catchment, and
  3. create a stand alone subset for that collection of catchments.

Say we want to get a subset of the NHDPlus upstream of a given location. We can start with discover_nhdplus_id() First, let's look at a given point location. Then see where it is relative to our flowlines.

lon <- -89.362239
lat <- 43.090266

start_point <- sf::st_sfc(sf::st_point(c(lon, lat)),
                          crs = 4269)

plot(sf::st_geometry(flowline))
plot(start_point, cex = 1.5, lwd = 2, col = "red", add = TRUE)

plot of chunk point

OK, so we have a point location near a river and we want to figure out what catchment is at its outlet. We can use the discover_nhdplus_id() function which calls out to a web service and returns an NHDPlus catchment identifier, typically called a COMID.

start_comid <- discover_nhdplus_id(start_point)
start_comid
#> [1] 13293750

If you have the whole National Seamless database and want to work at regional to national scales, skip down the the Local Data Subsetting section.

Web Service Data Subsetting

nhdplusTools supports discovery and data subsetting using web services made available through the Network Linked Data Index (NLDI) and the National Water Census Geoserver. The code below shows how to use the NLDI functions to build a dataset upstream of our start_comid that we found above.

The NLDI can be queried with any set of watershed outlet locations that it has in its index. We call these “featureSources”. We can query the NLDI for an identifier of a given feature from any of its “featureSources” and find out what our navigation options are as shown below.

discover_nldi_sources()$source
#> [1] "comid"     "huc12pp"   "npdes_rad" "nwissite"  "wqp"

nldi_feature <- list(featureSource = "comid", featureID = start_comid)
discover_nldi_navigation(nldi_feature)
#> $upstreamMain
#> [1] "https://cida.usgs.gov/nldi/comid/13293750/navigate/UM"
#> 
#> $upstreamTributaries
#> [1] "https://cida.usgs.gov/nldi/comid/13293750/navigate/UT"
#> 
#> $downstreamMain
#> [1] "https://cida.usgs.gov/nldi/comid/13293750/navigate/DM"
#> 
#> $downstreamDiversions
#> [1] "https://cida.usgs.gov/nldi/comid/13293750/navigate/DD"

The discover_nldi_navigation function is a handy way to make sure the featureID is available for the chosen “featureSource” as well as find valid navigation modes to be used with navigate_nldi. Now that we know the NLDI has our comid, we can use the “upstreamTributaries” navigation option to get all the flowlines upstream or all the features from any of the “featureSources” as shown below.


flowline_nldi <- navigate_nldi(nldi_feature, 
                               mode = "upstreamTributaries", 
                               data_source = "")

plot(sf::st_geometry(flowline), lwd = 3, col = "black")
plot(sf::st_geometry(flowline_nldi), lwd = 1, col = "red", add = TRUE)

plot of chunk navigate_nldi

What is not shown here is that the NLDI only provided geometry and a comid for each of the flowlines. The subset_nhdplus function has a “download” option that allows us to download four layers and all attributes as shown below.

output_file_download <- file.path(temp_dir, "subset_download.gpkg")

output_file_download <-subset_nhdplus(comids = flowline_nldi$nhdplus_comid,
                             output_file = output_file_download,
                             nhdplus_data = "download")
#> All intersections performed in latitude/longitude.
#> Reading NHDFlowline_Network
#> Writing NHDFlowline_Network
#> Reading CatchmentSP
#> Writing CatchmentSP

sf::st_layers(output_file_download)
#> Driver: GPKG 
#> Available layers:
#>            layer_name        geometry_type features fields
#> 1 NHDFlowline_Network 3D Multi Line String      168     91
#> 2         CatchmentSP        Multi Polygon      167      8
#> 3             NHDArea        Multi Polygon        1     16
#> 4        NHDWaterbody        Multi Polygon       90     23

flowline_download <- sf::read_sf(file.path(temp_dir, "subset_download.gpkg"), 
                                 "NHDFlowline_Network")

plot(sf::st_geometry(dplyr::filter(flowline_download, 
                                   streamorde > 2)), 
     lwd = 7, col = "darkgrey")
plot(sf::st_geometry(flowline_nldi), 
     lwd = 3, col = "red", add = TRUE)

plot of chunk subset_nhdplus_download

This plot illustrates the kind of thing that's possible (filtering to specific stream orders) using the attributes that are downloaded.

Notice that the data downloaded above only has four layers where the subset we build below has more. This functionality should be considered beta in nature, but may be useful for some applications so has been included.

Before moving on, one more demonstration of what can be done using the NLDI. Say we knew the USGS gage ID that we want NHDPlus data upstream of. We can use the NLDI to navigate from the gage the same as we did our comid. We can also get back all the nwis sites the NLDI knows about upstream of the one we chose!

nldi_feature <- list(featureSource = "nwissite", featureID = "USGS-05428500")

flowline_nldi <- navigate_nldi(nldi_feature, 
                               mode = "upstreamTributaries", 
                               data_source = "")

output_file_nwis <- file.path(temp_dir, "subset_download_nwis.gpkg")

output_file_nwis <-subset_nhdplus(comids = flowline_nldi$nhdplus_comid,
                                  output_file = output_file_nwis,
                                  nhdplus_data = "download")
#> All intersections performed in latitude/longitude.
#> Reading NHDFlowline_Network
#> Writing NHDFlowline_Network
#> Reading CatchmentSP
#> Writing CatchmentSP

sf::st_layers(output_file_download)
#> Driver: GPKG 
#> Available layers:
#>            layer_name        geometry_type features fields
#> 1 NHDFlowline_Network 3D Multi Line String      168     91
#> 2         CatchmentSP        Multi Polygon      167      8
#> 3             NHDArea        Multi Polygon        1     16
#> 4        NHDWaterbody        Multi Polygon       90     23

flowline_nwis <- sf::read_sf(output_file_nwis, 
                                 "NHDFlowline_Network")

upstream_nwis <- navigate_nldi(nldi_feature,
                               mode = "upstreamTributaries",
                               data_source = "nwissite")

plot(sf::st_geometry(flowline_nwis), 
     lwd = 3, col = "blue")
plot(sf::st_geometry(upstream_nwis), 
     cex = 1, lwd = 2, col = "red", add = TRUE)

plot of chunk nldi_nwissite

Local Data Subsetting

With the starting COMID we found with discover_nhdplus_id above, we can use one of the network navigation functions, get_UM, get_UT, get_DM, or get_DD to retrieve a collection of comids along the upstream mainstaem, upstream with tributaries, downstream mainstem, or downstream with diversions network paths. Here we'll use upstream with tributaries.

UT_comids <- get_UT(flowline, start_comid)
UT_comids
#>   [1]  13293750  13293504  13294134  13294128  13294394  13293454  13293430
#>   [8]  13293424  13294110  13293398  13293392  13293388  13293384  13293380
#>  [15]  13293576  13294288  13294284  13293506  13294280  13294290  13294298
#>  [22]  13294304  13294310  13294312  13293696  13293694  13294264  13293676
#>  [29]  13293620  13293612  13293584  13294166  13293554  13293540  13294282
#>  [36]  13293520  13293480  13294132  13293588  13293550  13293574  13293508
#>  [43]  13293530  13293526  13293524  13294138  13293496  13293488  13293484
#>  [50]  13293474  13294118  13293440  13293426  13293458  13294382  13294274
#>  [57]  13293422  13293416  13293390  13293382  13293386  13293376  13293396
#>  [64]  13293394  13293406  13293404  13294268  13294366  13293400  13293432
#>  [71]  13293452  13293456  13293492  13294158  13294286  13293634  13294368
#>  [78]  13294124 937090090 937090091  13293464  13293444  13293446  13293434
#>  [85]  13293542  13294154  13293536  13294292  13294294  13294300  13294308
#>  [92]  13294314  13294272  13294276  13294384  13294278  13294386  13293494
#>  [99]  13294130  13294306  13294184  13293690  13293692  13293586  13293614
#> [106]  13293624  13293678  13293672  13293674  13294176  13294168  13293578
#> [113]  13293564  13293548  13293478  13293476  13293450  13293442  13293518
#> [120]  13293472  13293572  13293568  13293556  13293558  13293552  13293514
#> [127]  13293522  13294144  13293532  13294150  13294148  13294140  13293516
#> [134]  13293502  13293498  13293460  13294122  13293468  13294112  13293512
#> [141]  13293486  13293378  13293462  13293428  13293420  13293412  13293438
#> [148]  13293490  13293436  13294152  13294296  13294270  13302588  13302590
#> [155]  13293688  13293646  13294174  13294178  13293562  13293600  13293448
#> [162]  13294116  13294120  13293570  13294114  13293418  13293410  13293590

If you are familiar with the NHDPlus, you will recognize that now that we have this list of COMIDs, we could go off and do all sorts of things with the various flowline attributes. For now, let's just use the COMID list to filter our fline sf data.frame and plot it with our other layers.

plot(sf::st_geometry(flowline))
plot(start_point, cex = 1.5, lwd = 2, col = "red", add = TRUE)
plot(sf::st_geometry(dplyr::filter(flowline, COMID %in% UT_comids)),
     add=TRUE, col = "red", lwd = 2)

plot of chunk plot_fline_subset

Say you want to save the network subset for later use in R or in some other GIS. The subset_nhdplus() function is your friend. If you have the whole national seamless database downloaded, you can pull out large subsets of it like shown below. If you don't have the whole national seamless, look at the second example in this section.

output_file <- file.path(temp_dir, "subset.gpkg")

output_file <-subset_nhdplus(comids = UT_comids,
                             output_file = output_file,
                             nhdplus_data = nhdplus_path())
#> All intersections performed in latitude/longitude.
#> Reading NHDFlowline_Network
#> Writing NHDFlowline_Network
#> Reading CatchmentSP
#> Writing CatchmentSP
#> Reading Gage
#> Writing Gage
#> Reading Sink
#> No features to write in Sink
#> Reading NHDArea
#> Writing NHDArea
#> Reading NHDWaterbody
#> Writing NHDWaterbody
#> Reading NHDFlowline_NonNetwork
#> Writing NHDFlowline_NonNetwork

sf::st_layers(output_file)
#> Driver: GPKG 
#> Available layers:
#>               layer_name     geometry_type features fields
#> 1    NHDFlowline_Network Multi Line String      168    136
#> 2            CatchmentSP     Multi Polygon      167      6
#> 3                   Gage             Point       33     19
#> 4                NHDArea     Multi Polygon        1     14
#> 5           NHDWaterbody     Multi Polygon       90     21
#> 6 NHDFlowline_NonNetwork Multi Line String       45     12

Now we have an output geopackage that can be used later. It contains the network subset of catchments and flowlines as well as a spatial subset of other laters as shown in the status output above. To complete the demonstration, here are a couple more layers plotted up.

catchment <- sf::read_sf(output_file, "CatchmentSP")
waterbody <- sf::read_sf(output_file, "NHDWaterbody")

plot(sf::st_geometry(flowline))
plot(start_point, cex = 1.5, lwd = 2, col = "red", add = TRUE)
plot(sf::st_geometry(dplyr::filter(flowline, COMID %in% UT_comids)),
     add=TRUE, col = "red", lwd = 2)
plot(sf::st_geometry(catchment), add = TRUE)
plot(sf::st_geometry(waterbody), col = rgb(0, 0, 1, alpha = 0.5), add = TRUE)

plot of chunk plot_result

Indexing

Expect more in this space as nhdplustTools progresses. Right now, one indexing method has been implemented. Using the data above, we can use the get_flowline_index() function to get the comid, reachcode, and measure of our starting point like this.

get_flowline_index(flowline, start_point)
#>      COMID      REACHCODE REACH_meas       offset
#> 1 13293750 07090002007373   53.13861 0.0004206437

get_flowline_index() will work with a list of points too. For demonstration purposes, we can use the gages in our subset from above.

gage <- sf::read_sf(output_file, "Gage")

get_flowline_index(flowline, sf::st_geometry(gage), precision = 10)
#>       COMID      REACHCODE  REACH_meas        offset
#> 1  13293744 07090002007743  28.6505173 0.00003582516
#> 2  13294276 07090002008387  14.8355105 0.00001231939
#> 3  13294264 07090002007650  56.4438696 0.00002092419
#> 4  13293750 07090002007373  42.3672160 0.00005246302
#> 5  13294312 07090002008383   1.5531989 0.00005255738
#> 6  13294264 07090002007650  40.9505131 0.00002540663
#> 7  13294264 07090002007650   2.0475893 0.00002058628
#> 8  13293688 07090002007660 100.0000000 0.02092936324
#> 9  13294300 07090002008379  85.5611861 0.00004163882
#> 10 13293690 07090002007648   0.7642491 0.00001402610
#> 11 13294264 07090002007650  71.4706457 0.00820480751
#> 12 13294264 07090002007650  72.1081966 0.01994262871
#> 13 13294176 07090002007664   6.1514825 0.00004691321
#> 14 13294264 07090002007650  29.9935507 0.01484440458
#> 15 13294290 07090002008374  88.0212984 0.00002798504
#> 16 13294138 07090002007709  73.4347808 0.00004405764
#> 17 13293614 07090002007657 100.0000000 0.01760769899
#> 18 13293486 07090002007724   8.8530113 0.00001242611
#> 19 13293512 07090002007723   5.9867618 0.00001048150
#> 20 13294176 07090002007664  17.0534369 0.00001298149
#> 21 13293892 07090002007769  76.2138002 0.03342495879
#> 22 13293474 07090002007713  72.2322147 0.00003951737
#> 23 13293520 07090002007676  18.1942138 0.00003026372
#> 24 13293876 07090002007627  60.0662615 0.00004452760
#> 25 13293826 07090002007632 100.0000000 0.00475024166
#> 26 13293512 07090002007723  68.1656477 0.00002298130
#> 27 13294264 07090002007650  97.1005808 0.00002457517
#> 28 13294344 07090002007629   6.7739083 0.00004859613
#> 29 13293452 07090002007737   0.0000000 0.00048689525
#> 30 13294344 07090002007629  27.6310290 0.00004144844
#> 31 13293512 07090002007723  67.5924810 0.00003644438
#> 32 13294344 07090002007629   6.7739083 0.00004859613
#> 33 13294288 07090002008236   4.3378700 0.00027538956

For more info about get_flowline_index() see the article vignette("point_indexing") about it or the reference page that describes it.

Refactoring

The NHDPlus tools package has been developed in support of an experimental NHDPlus refactoring workflow to normalize the size of catchments and resolve particular network locations. If this work is of interest, it can be found in the network_refactor branch of this repository.