1 Introduction

The geometr package provides tools that generate and process easily accessible and tidy geometric shapes (of class geom). Moreover, geometr aims to improve the interoperability of geometric classes. One could argue that spatial classes are merely a special case of geometric classes, in which the points’ coordinates refer to real locations on the surface of the earth, specified in further detail by the coordinate reference system (crs). For ordinary geometric shapes (such as squares or circles), the coordinate (reference) system is the Cartesian coordinate system. geometr generalises this view and treats all geometric and spatial classes in the same way; both are therefore termed geometric objects/classes here.

Geometric classes typically contain a collection of points that outline the geometric shapes or features. A feature in geometr is defined as a set of points that form no more than one single unit of a given feature type (point, line or polygon); in contrast to the simple features standard, there are no multi-* features. Sets of geometric objects that belong together beyond their geometric connectedness are assigned a common group, which can have its own group attributes (more on this in the chapter Attributes of a geom). Features are characterised by a location, a coordinate (reference) system, and various other properties or metadata. Most geometric classes are conceptually quite similar, yet there is no common, interoperable standard for accessing and modifying features, their points or their metadata.
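As an illustration of features versus groups, the following base-R sketch tabulates two separate polygon features that share one group, roughly what the simple features standard would store as a single MULTIPOLYGON. The table and column names (`points`, `features`, `groups`, `fid`, `gid`, `name`) are hypothetical and chosen for this example only; they are not claimed to match the internal layout of a geom.

```r
# Hypothetical tables illustrating features vs. groups (not geometr's internal layout)

# Points of two separate polygon features, each a closed path (first point repeated)
points <- data.frame(
  x   = c(0, 1, 1, 0, 0,   3, 4, 4, 3, 3),
  y   = c(0, 0, 1, 1, 0,   0, 0, 1, 1, 0),
  fid = c(rep(1, 5), rep(2, 5))   # feature ID of each point
)

# Each feature is one single polygon; both belong to the same group (gid = 1)
features <- data.frame(fid = c(1, 2), gid = c(1, 1))

# Group attributes are stored once per group, not once per feature
groups <- data.frame(gid = 1, name = "island pair")

# Joining the tables shows which group attribute applies to every point
pts_full <- merge(merge(points, features, by = "fid"), groups, by = "gid")
```

Because every feature is a single unit, a "multi" object is simply several rows in the feature table that share a `gid`.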

This vignette first outlines in detail how geometr improves interoperability. It then describes the data structure of a geom (the geometric class that comes with geometr), how different feature types are cast into one another and how geometric objects can be visualised, and finally gives a short introduction to the tools that come with this first version of geometr.

2 Interoperability

Interoperable software can easily exchange information with other software, which can be achieved by providing the output of functionally similar operations in a common arrangement or format. This principle applies not only to software written in different languages, but also to packages within the R ecosystem. R is an open-source environment, which means that no single package or class will ever be the sole source of a particular data structure; this is also the case for spatial and other geometric data.

Interoperable data is data that has a common arrangement and that uses terms from the same ontology, ideally resulting in semantic interoperability. As an example, consider the extent of a geometric object. An extent reports the minimum and maximum value of all dimensions an object resides in. There are, however, several ways in which even this simple information can be reported, for example as a vector or as a table, and with or without names. Moreover, different workflows arrange data differently, so the same information is not at the same location or under the same name in all structures; e.g., the minimum value of the x dimension is not always the first element and is not always called ‘xmin’.

The following code chunk exemplifies this by showing various functions, which are all considered standard in R to date, that derive an extent from specific spatial objects:

st_bbox() (package sf) returns the information as a named vector that lists first the minimum and then the maximum values of both dimensions; bbox() (package sp) returns a matrix with the minimum and maximum values in columns; and extent() (package raster) returns an S4 object that lists first the x and then the y values. Neither the data structures nor the names or positions of the information are comparable.
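To make the mismatch tangible without loading sf, sp or raster, the sketch below rebuilds the three layouts described above from plain base-R objects with made-up values. The object names are hypothetical, and a list stands in for the S4 object returned by extent(); the point is only that the same value has to be retrieved differently in each case.

```r
# One bounding box (xmin = 1, ymin = 2, xmax = 3, ymax = 4) in three mock layouts
bb_vector <- c(xmin = 1, ymin = 2, xmax = 3, ymax = 4)         # st_bbox()-like
bb_matrix <- matrix(c(1, 2, 3, 4), nrow = 2,
                    dimnames = list(c("x", "y"), c("min", "max")))  # bbox()-like
bb_extent <- list(xmin = 1, xmax = 3, ymin = 2, ymax = 4)      # stands in for extent()

# The same value, the minimum x coordinate, needs three different access patterns
xmin_a <- bb_vector[["xmin"]]
xmin_b <- bb_matrix["x", "min"]
xmin_c <- bb_extent[["xmin"]]

# Positions differ, too: the maximum x is element 3 in one layout, element 2 in another
which(names(bb_vector) == "xmax")  # 3
which(names(bb_extent) == "xmax")  # 2
```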

For a human user the structure of that information might not matter, because in most cases we intuitively recognise where a given piece of information is to be found in a data structure. In the above case it is easy to see how the combination of column and row names (of bbox()) corresponds to the already combined names (of st_bbox() or extent()). However, software only has this human capacity to recognise information from its context if it is explicitly programmed in. Think, for example, of a new custom function that is designed to extract and process information from an arbitrary spatial input, i.e., without knowing in advance which spatial class the user will provide. Such a function would require extensive code logic to handle all possible input formats, further complicated by classes that may only become available in the future.
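A minimal sketch of such a custom function, assuming bounding boxes arrive either as a named vector, as a matrix with `x`/`y` rows and `min`/`max` columns, or as a list. The helper `get_xmin()` is hypothetical; it needs one branch per layout and fails on anything it has not been taught about.

```r
# Hypothetical helper: extract the minimum x value from an arbitrary
# bounding-box representation; every supported layout needs its own branch
get_xmin <- function(bb) {
  if (is.matrix(bb) && all(c("x", "min") %in% unlist(dimnames(bb)))) {
    bb["x", "min"]                         # matrix layout (rows x/y, columns min/max)
  } else if (is.list(bb) && !is.null(bb[["xmin"]])) {
    bb[["xmin"]]                           # list layout
  } else if (is.numeric(bb) && "xmin" %in% names(bb)) {
    bb[["xmin"]]                           # named-vector layout
  } else {
    # an S4 object or any future class would need yet another branch here
    stop("unknown bounding-box layout; a new branch is needed")
  }
}
```

Every additional class multiplies these branches, which is exactly the maintenance burden described above.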

geometr improves interoperability in R for geometric and thus spatial classes by following the Bioconductor standard for S4 classes. Here, getters and setters are used as accessor functions and as the pathway to extract or modify information of a given data structure. geometr thus provides getters that return information in an identical arrangement from a wide range of classes, and likewise setters that modify different classes in the same way, even though those classes typically require differently formatted input, arguments and functions. The following code chunk shows how different input classes yield the same output object.

The output of the getters provided by geometr is tidy, i.e., it provides variables in columns and observations in rows, and it is interoperable, i.e., it provides the same information in the same location of the output object, with the same names. Amongst other advantages, this ensures that a custom function that processes geometric information needs only a single, simple line of code to extract that information from a potentially wide range of distinct classes.
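The idea of a getter that returns the same tidy arrangement for every input class can be sketched with an S3 generic (used here only for brevity; the text describes geometr in terms of S4). Everything below is hypothetical: `tidy_extent()`, the mock classes `vec_box` and `mat_box`, and the column layout of the result are illustrative assumptions, not geometr's actual getters.

```r
# Hypothetical S3 generic: one getter, one output layout, whatever the input
tidy_extent <- function(obj) UseMethod("tidy_extent")

# Method for a mock class storing a named vector (xmin, ymin, xmax, ymax)
tidy_extent.vec_box <- function(obj) {
  v <- unclass(obj)
  data.frame(x = c(v[["xmin"]], v[["xmax"]]),
             y = c(v[["ymin"]], v[["ymax"]]))
}

# Method for a mock class storing a matrix (rows x/y, columns min/max)
tidy_extent.mat_box <- function(obj) {
  m <- unclass(obj)
  data.frame(x = unname(m["x", c("min", "max")]),
             y = unname(m["y", c("min", "max")]))
}

# Two inputs with different internal layouts but the same extent
a <- structure(c(xmin = 1, ymin = 2, xmax = 3, ymax = 4), class = "vec_box")
b <- structure(matrix(c(1, 2, 3, 4), nrow = 2,
                      dimnames = list(c("x", "y"), c("min", "max"))),
               class = "mat_box")

# A custom function now needs a single line to get the extent of either input
x_range <- function(obj) diff(tidy_extent(obj)$x)
```

Supporting a new class then means adding one method; functions such as `x_range()` stay untouched.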

3 Description of the class geom

geometr comes with the S4 class geom. geom is a geometric (spatial) class that has been developed primarily with interoperability and easy access in mind.

Here, too, this means that all objects of this class are structurally identical, that no slots are removed or added when an object is modified, and that all properties are labelled with the same terms in every object of that class. This holds for objects representing point, line or polygon features, for objects that contain a single feature or several features, and for objects that are either merely geometric or indeed spatial/geographic because they contain a coordinate reference system. Moreover, a geom contains only direct information, i.e., information that cannot be derived from its other information. A prominent example is the extent: it is stored in many other spatial classes in R, but not in a geom, because it can easily be derived from the coordinate values of the points that make up the geometry.
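The following sketch shows why storing the extent is unnecessary: it can be recomputed from a table of point coordinates whenever it is needed, so it can never be out of date after the points are modified. The point table and `derive_extent()` are hypothetical illustrations, not part of geometr.

```r
# Hypothetical point table of one polygon (closed path: first point repeated)
pts <- data.frame(x = c(40, 70, 70, 50, 40),
                  y = c(40, 40, 60, 70, 40))

# The extent is not stored; it is derived from the coordinates on demand
derive_extent <- function(points) {
  data.frame(x = range(points$x), y = range(points$y),
             row.names = c("min", "max"))
}

ext <- derive_extent(pts)
ext["min", "x"]  # 40
ext["max", "y"]  # 70

# After moving a point, the derived extent is automatically up to date
moved <- pts
moved$x[2] <- 90
derive_extent(moved)["max", "x"]  # 90
```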

3.1 Create a geom

A geom can be created either by converting an object of another class (any class for which a method has been defined) or by using one of the geometry shape functions, whose names follow the pattern gs_* in geometr.

From these examples we learn more about objects of class geom. nc_geom is made up of 108 polygon features (with 2529 points), has a coordinate reference system (crs) and a set of (feature) attributes. The print method of a geom does not show the attribute values, which keeps the printed summary compact. Moreover, a “tiny map” shows where the points of the respective geom are concentrated, which gives a rough but quick overview of the shape of the object. A section of the map that contains less than 1/16 of all points is shown as ◌, one with more than 1/16 but less than 1/8 as ○, one with more than 1/8 but less than 1/4 as ◎, and one with more than 1/4 of the points as ◉.
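The symbol rule of the tiny map can be reproduced in a few lines. This is a hypothetical re-implementation for illustration only: the grid size `n`, the assignment of values that fall exactly on a threshold to the higher class, and the function name `tiny_map()` are assumptions, not geometr's print method. It also assumes the points span a non-zero range in both x and y.

```r
# Hypothetical sketch of the tiny-map symbol rule (grid size n is an assumption)
tiny_map <- function(x, y, n = 3) {
  # assign each point to a column and a row section of an n x n grid
  col <- cut(x, breaks = seq(min(x), max(x), length.out = n + 1),
             include.lowest = TRUE, labels = FALSE)
  row <- cut(y, breaks = seq(min(y), max(y), length.out = n + 1),
             include.lowest = TRUE, labels = FALSE)
  # share of all points per section; rows ordered from top (high y) to bottom
  share <- table(factor(row, levels = n:1), factor(col, levels = 1:n)) / length(x)
  # < 1/16 -> dotted circle, < 1/8 -> white circle, < 1/4 -> bullseye, else fisheye
  symbols <- c("\u25cc", "\u25cb", "\u25ce", "\u25c9")
  matrix(symbols[findInterval(as.vector(share), c(1/16, 1/8, 1/4)) + 1], nrow = n)
}

# Most points sit in the bottom-left corner, two in the top-right corner
m <- tiny_map(c(rep(0.5, 8), 2.5, 0, 3), c(rep(0.5, 8), 2.5, 0, 3))
print(noquote(m))
```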

aPoly is made up of only one feature with 5 points and has a Cartesian coordinate system. In fact, any geom without an assigned crs is assumed to be a purely geometric object whose coordinate values are valid in a Cartesian coordinate system.

3.2 How are polygons handled?

You might wonder why aPoly shows 5 points, although only 4 have been defined. This is due to how polygons are stored in a geom. A polygon is by definition a two-dimensional shape that encloses an area, in contrast to a line, which has only one dimension (its length), and a point, which is dimensionless. A polygon and a line can be made up of the same points, and a polygon is indeed nothing more than a sequence of lines (a path) that outlines its shape. To distinguish a line from a polygon with the same points, a polygon is defined to have identical start and end points, which constitutes a closed path; this is why the first point of aPoly is stored a second time as its fifth point.
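A minimal sketch of the closing rule, assuming points are held in a data frame with `x` and `y` columns; `close_path()` and the anchor coordinates are hypothetical. Four defined points become five stored points, and closing an already closed path changes nothing.

```r
# Hypothetical helper: close a path by repeating its first point at the end,
# unless the path is already closed
close_path <- function(points) {
  first <- points[1, ]
  last  <- points[nrow(points), ]
  if (first$x != last$x || first$y != last$y) {
    points <- rbind(points, first)
  }
  rownames(points) <- NULL
  points
}

# Four hypothetical anchor points of a polygon
anchors <- data.frame(x = c(40, 70, 70, 50), y = c(40, 40, 60, 70))
poly <- close_path(anchors)
nrow(poly)              # 5
nrow(close_path(poly))  # still 5: the path is already closed
```

A line made of the same four anchors would keep 4 points, so the duplicated end point alone tells the two feature types apart.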

Polygons may also have holes (and islands therein), for example a park with a pond that has a little island in the middle. Such cases are of course also possible with a geom; the only thing to consider is that the outer (closed) ring must be given as the first ring. All rings that are supposed to be nested within this ring must themselves be closed paths, but their order does not matter. Moreover, when building a polygon with a hole in geometr, the winding direction of the points (clockwise or counter-clockwise) does not matter. Whether part of a polygon is “inside”, and thus whether a closed path describes a hole or not, is determined by the code logic of the functions processing polygons.
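Since the inside test is left to the processing functions, here is one standard way to make it independent of ring orientation: the even-odd (ray-casting) rule, applied to all rings of a polygon at once. This is a generic sketch, not necessarily the logic geometr uses; the helper names and the park/pond/island coordinates are hypothetical.

```r
# Count how often a horizontal ray from (px, py) to the right crosses a closed ring
crossings <- function(px, py, ring) {
  n <- nrow(ring) - 1                    # closed ring: last point equals first
  count <- 0
  for (i in seq_len(n)) {
    x1 <- ring$x[i];     y1 <- ring$y[i]
    x2 <- ring$x[i + 1]; y2 <- ring$y[i + 1]
    # edge straddles the ray's y value (horizontal edges never do)
    if ((y1 > py) != (y2 > py)) {
      xi <- x1 + (py - y1) * (x2 - x1) / (y2 - y1)
      if (xi > px) count <- count + 1    # intersection lies to the right
    }
  }
  count
}

# Even-odd rule: inside if the total number of crossings over all rings is odd
is_inside <- function(px, py, rings) {
  total <- sum(vapply(rings, function(r) crossings(px, py, r), numeric(1)))
  total %% 2 == 1
}

# Closed square ring, optionally listed clockwise
square <- function(lo, hi, clockwise = FALSE) {
  x <- c(lo, hi, hi, lo, lo); y <- c(lo, lo, hi, hi, lo)
  if (clockwise) { x <- rev(x); y <- rev(y) }
  data.frame(x = x, y = y)
}

# Outer ring first (park), then a hole (pond) and an island; orientations are mixed
park <- list(square(0, 10), square(3, 7, clockwise = TRUE), square(4.5, 5.5))
is_inside(1, 1, park)    # TRUE  (in the park)
is_inside(3.5, 5, park)  # FALSE (in the pond)
is_inside(5, 5, park)    # TRUE  (on the island)
```

Because only the parity of crossings counts, reversing any ring gives the same result, which is why the winding direction can be ignored under this rule.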