Introducing stplanr

Robin Lovelace

2017-11-23

Introduction

stplanr was developed to solve a real world problem: how to convert official data on travel behaviour into geographic objects that can be plotted on a map and analysed using methods from geographical information systems (GIS)? Specifically, we wanted to visualise and investigate the spatial distribution origin-destination (OD) data such as the open datasets provided by the UK Data Services WICID portal (see wicid.ukdataservice.ac.uk/) to explore cycling potential (Lovelace et al. 2017). Since this basic functionality has been implemented (with the function od2line()) there have been many further developments in stplanr, some of which are described in a longer vignette. The purpose of this vignette is to get you up-to-speed with the basics and provide useful links for doing transport research with R.

Installing stplanr

If you’re new to programming and transport data, we recommend using stplanr interactively in an Integrated Development Environment (IDE) such as RStudio to make life easier. Steps to set-up a suitable R/RStudio environment are described in sections 2.3 and 2.5 of the book Efficient R Programming (Gillespie and Lovelace 2016).

Once you have an R set-up you are happy with, the latest version can be installed from CRAN in the usual way:

install.packages("stplanr")

The development version can be installed with the devtools package as follows:

devtools::install_github("ropensci/stplanr")

Load the package as follows:

library(stplanr)

OD data to desire lines and routes

Transport data can take many forms. R is an appropriate language for handling transport data, as it can read-in data in such a wide range of formats, e.g. with packages such as haven and foreign. This section focusses on OD datasets, and their conversion to desire lines and routes because these are foundational data types for many transport research applications. (stplanr also contains functions for: the analysis of road traffic casualty data, interfacing with various routing APIs, ‘travel watershed’ analyis and access to Google’s Travel Matrix API.)

Origin-destination (OD) data is simply data in the following form:

od_eg = read.csv(text = 
  "origin, destination, V1, V2
  1, 2, 100, 3
  1, 3, 50, 5"
  )
knitr::kable(od_eg)
origin destination V1 V2
1 2 100 3
1 3 50 5

What this example OD table means is that 100 units of ‘V1’ and 3 units of V2 travel between zone 1 and zone 2. There is also movement represented between Zone 2 and 3.

This dataset can also be represent as an ‘od matrix’, where rows represent the origins and columns destinations. However, for multiple variables (e.g. modes of transport) and to prevent giant and unwieldy sparse matrices, the ‘long’ form represented above is much more common.

Now, imagine that V1 represents the total number of people travelling between the origin and destination and that V2 represents the number who regularly cycle. From this we can get a good indication of where people cycle at the desire line level. (Note: a good source of open OD data has been made available from the wicid.ukdataservice.ac.uk website).

To extract useful information from this OD dataset, we need to be able to place the lines on the map. What kind of place does a desire line originate from? What about the destination? What is the environment like that it passes through? To answer all these questions we need a geographic representation of the OD table illustrated above.

Converting OD data to desire lines with R

One problem with OD data is that the rows do not tend to have geography inherently built in. They could contain a variables called lat_origin, lon_origin, lat_destination and lon_destination. But generally they only contain the IDs of geographic zones.

Work is needed to convert the OD data into ‘desire lines’. Desire lines are straight lines between the origin and destination and represent where people would go if they were not constrained by the route network (see Figure 3 from this paper).

To show how these desire lines are created, we’ll switch to using real OD data provided by stplanr. The first three of these is shown below:

data("flow") # load the 'flow' dataset from the stplanr package
head(flow[c(1:3, 12)])
##        Area.of.residence Area.of.workplace All Bicycle
## 920573         E02002361         E02002361 109       2
## 920575         E02002361         E02002363  38       0
## 920578         E02002361         E02002367  10       0
## 920582         E02002361         E02002371  44       3
## 920587         E02002361         E02002377  34       0
## 920591         E02002361         E02002382   7       0

This shows that, between zone E02002361 and E02002361 (i.e. intrazonal flow) there were 109 people travelling to work by all modes in the 2011 census. 2 of them cycled. The equivalent numbers for the OD pair E02002361 to E02002371 were 44 and 3. But how to make this data geographical?

For that we need another dataset, also provided by stplanr:

data("cents") # load the 'cents' dataset
head(cents)
## class       : SpatialPointsDataFrame 
## features    : 6 
## extent      : -1.550806, -1.511861, 53.8041, 53.82887  (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 4
## names       :  geo_code,  MSOA11NM, percent_fem,  avslope 
## min values  : E02002361, Leeds 032,    0.408759, 2.284782 
## max values  : E02002393, Leeds 064,    0.591141, 5.091685

The cents dataset is spatial, illustrated by it’s class: a SpatialPointsDataFrame, from the sp package. Because stplanr loads sp, the dataset will be plotted as a map by default, as illustrated below:

library(tmap)
tmap_mode("view")
## tmap mode set to interactive viewing
qtm(cents, symbols.size = 5)

stplanr creates desire lines using the od2line() function, which links geographical and non-geographical datasets together. In this case, it will join the non-geographical flow data with the geographical cents data plotted above. Let’s take a single OD pair, E02002361 to E02002371, the fourth row represented in the table above, to see how this works:

flow_single_line = flow[4,] # select only the first line
desire_line_single = od2line(flow = flow_single_line, zones = cents)

This can be plotted as follows:

qtm(desire_line_single, lines.lwd = 5)

Note that the R function od2line() is generic in the sense that it will work the same if you give it a single OD pair or a table representing thousands of desire lines. The following command creates desire lines for all OD pairs stored — omitting ‘internal flows’ via the sel object belwo — represented in the dataset flowlines:

l = od2line(flow = flow, zones = cents)
# remove 'internal flows'
sel = l$Area.of.residence == l$Area.of.workplace
l = l[!sel, ]

This creates the geographic data object l, which can be visualised as follows:

qtm(l)

Now the data is set-up, we can change the visual appearance of the desire lines with a single extra argument passed to the plotting function. Let’s make width depend on the total number of people travelling along the desire line:

qtm(l, lines.lwd = "All", scale = 10)
## Legend for line widths not available in view mode.

Another useful visulisation involves setting the colour relative to the number of people cycling:

tm_shape(l) + tm_lines(lwd = "All", scale = 10, col = "Bicycle")
## Legend for line widths not available in view mode.

Finally, we can convert these desire lines into routes as follows (other route_* functions can be used, but may require API keys to work - see ?route_cyclestreets for details):

# if the next line returns FALSE the code will not run
curl::has_internet() 
## [1] TRUE
r = line2route(l, route_fun = route_osrm)

These routes contain the same information on origin and destination, but have additional spatial information about the route network. The routes can be plotted in the same way as the desire lines were plotted:

r@data = cbind(r@data, l@data)
tm_shape(r) + tm_lines(lwd = "All", scale = 10, col = "Bicycle")
## Legend for line widths not available in view mode.

The next stage is to aggregate these lines together to create a ‘route network’. This, and many other functions, are described in the stplanr-paper vignette.

Context and discussion

This section outlines some of the wider motivations underlying the package.

As settlements worldwide have grown and become more complex, the process of planning has had to adapt. Planners today are specialists, in sub-fields such as Emergency, Logistics, Healthcare, Urban and Transport Planning. And the ‘art’ of planning has become more of a science, with its own array of specialist hardware and software.

The process of Transport Planning has undergone a particularly dramatic revolution. Transport interventions such as new bridges, ports and active travel routes are no longer decided based on the intuition of public sector or political authorities. Decisions are now the result of a long socio-technical process involving public consultation, cost-benefit analyses and computer modelling and visualisation. With the ongoing digital revolution, the importance of this last stage has grown, to the point where transport planning is now a highly technical process, employing dozens of software developers in large planning organisations. There is now a multi-billion pound global transport planning consultancy industry, to support the decision-making process. Yet the fruits of all this labour are unavailable to the vast majority of citizens worldwide, and transport planning decisions which go against the best available evidence keep getting made.

This is the context which motivated the development of stplanr. Its aim is simple: to provide an accessible toolbox for transport planning. It is hoped that it will be useful for practitioners and researchers alike as part of the transition to open source software taking place in the tech industry, which is gradually filtering down into other sectors of the economy, notably ‘Big Data’ in consultancies.

A further motivation is that the best available evidence suggests the future of civilisation depends on our ability to transition away from fossile fuels. The transport sector is the fastest growing source of emissions by sector, and represents a major roadblock in the path towards a zero-carbon economy. Transport systems are also a major cause of ill health, by enabling sedentary lifestyles and causing numerous road traffic casualties. Knowledge of these impacts motivated the word ‘sustainable’ in the package’s name: by focussing on active travel and public transport modes, stplanr encourages the design of transport interventions that reduce dependence on fossil fuels.

Contributing

We welcome your contributions, whether it’s filing a bug or feature request in the issue tracker, putting in a pull request to improve performance or documentation, or simply letting us know how you’re using stplanr in your work by citing it or dropping us an email.

References

Bivand, Roger S, Edzer J Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. Vol. 747248717. Springer.

Gillespie, Colin, and Robin Lovelace. 2016. Efficient R Programming: A Practical Guide to Smarter Programming. O’Reilly Media. https://csgillespie.github.io/efficientR/.

Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). doi:10.5198/jtlu.2016.862.