Archeofrag: Tools for Refitting and Stratigraphical Analysis in Archaeology

Sébastien Plutniak


The development version can be installed from GitHub with:

# install.packages("devtools")

For an interactive demonstration, see also the Shiny application.



On the field, archaeologists distinguish between two main categories of facts: layers and archaeological objects. Layers are spatially ordered masses of matter that archaeologists address using the principles of stratigraphy. Layers are considered as containers: they can include the so-called archaeological objects (bones, lithic pieces, pottery sherds, etc.). Whereas most objects have clear boundaries, layers have fuzzy boundaries. In addition, a distinction between two layers does not imply that their contents are also different. The relationships between layers and those between their contents are independent, due to the post-depositionnal processes: two different layers from a sedimentological perspective can contain objects related to the same chronological period; to the contrary, post-depositionnal processes can mix two different sets of objects into a single layer. Consequently, a dialectical study of the containers and their contents is necessary.

The fragments of an object can be concentrated into a single layer or, to the contrary, scattered into two or more layers. Consequently, studying the location and relationships between fragments is worthwhile to assess the relevance of the boundaries between layers. The so-called “refitting analysis” serves this purpose: archaeologists try to identify fragments with matching edges, proving that they were parts of the same initial object, in which they had adjacent locations.

The Archeofrag package includes a set of tools for this purpose. The approach implemented in this package relies on graphs to model the relationships between fragments and layers. Fragments are represented by vertices, which are connected by an edge if they have “connecting” edges. The information about the layer in which the fragments are located is integrated as a vertex attribute. Two main questions are addressed:

  1. measuring the cohesion of the layers and, when applicable, their admixture;
  2. determining the topological properties of the relationships between fragments within a given layer.

Building fragmentation graphs

There are two ways to input data in Archeofrag:

Using empirical data

The package comes with an example data set, called “Liang Abu” (from the name of a cave in Borneo). It contains three tables:

The make_frag_object function builds objects with the class “frag”. The use of this builder is not a requirement of the other Archeofrag functions, however, it ensures that the data are suitable for the next steps of the analysis. The make_cr_graph is then used to build an igraph graph object representing the connection relationships.

Generating artificial data

The other way to get data to use with Archeofrag is the frag.simul.process function to generate artificial cases. The next command generates a situation in which 20 initial objects (corresponding to the “connected components” of a graph) are fragmented into a total of 50 pieces and spread between two layers. The frag.simul.process function has many options but, at this step, we use it in its simplest form.

Manipulating fragmentation graphs

Once we have a fragmentation graph, several Archeofrag functions are convenient for a first examination of the data. First, the frag.graph.plot function generates a visual representation of the graph, here for the Liang Abu data:

frag.graph.plot(abu.g, "layer")

We can then plot the artificial fragmentation graph:

frag.graph.plot(simul.g, "layer")

Note that this artificial graph has two layers, whereas the Liang Abu graph includes fragments from three layers. The frag.get.layers.pair function is a convenient function to extract a pair of layers.

abu.g12 <- frag.get.layers.pair(abu.g, "layer", c("1", "2"))

Let’s now plot the connected fragments from layer 1 and 2 of Liang Abu:

frag.graph.plot(abu.g12, "layer")

In addition, the frag.get.layers enables to extract as much layers as needed, for example the first layer of our artificial graph:

frag.get.layers(simul.g, "layer", sel.layers = "1")
#> $`1`
#> IGRAPH ecd3ff9 UN-- 25 18 -- 
#> + attr: frag_type (g/c), name (v/c), (v/n), layer (v/c)
#> + edges from ecd3ff9 (vertex names):
#>  [1] 1 --2  3 --4  5 --6  7 --8  9 --10 11--12 13--14 15--16 17--18 19--20
#> [11] 7 --21 8 --21 2 --22 8 --23 21--23 18--24 19--25 20--25

Measuring the admixture of two layers

In this section, we are interested in determining numerical values for the cohesion (i.e. self-adhesion) of a layer and, conversely, the admixture of two layers.

The function enables a first appreciation by returning the count of intra-layers and inter-layers relationships., "layer")
#>      1  2
#>   1 18  0
#>   2  0 17

The diagonal of the matrix contain the number of intra-layers relationships and the other values refer to inter-layers relationships. Note that our simulated graph does not has connection relationships between layers 1 and 2. It is time to introduce a new parameter of the frag.simul.process, namely its disturbance argument. It determines the proportion of fragments which will be randomly displaced from a layer to the other one. The default value of this parameter is 0, what explains why our simul.g graph has no inter-layers relationships. Let’s create a new artificial graph:

simul2.g <-frag.simul.process(n.components=20, vertices=50, disturbance=.1), "layer")
#>      1  2
#>   1 14  5
#>   2  5 14

As expected, this graph has inter-layers connections.

In addition, the frag.get.parameters function reports general information about a fragmentation graph:

frag.get.parameters(abu.g12, "layer")
#> $n.components
#> [1] 28
#> $vertices
#> [1] 72
#> $edges
#> [1] 52
#> $balance
#> [1] 0.3188406
#> $components.balance
#> [1] 0.29
#> $disturbance
#> [1] 0.04166667
#> $aggreg.factor
#> [1] 0.7031073
#> $planar
#> [1] TRUE

Compare, for example, the properties of the two simulated graphs:

  frag.get.parameters(simul.g, "layer"),
  frag.get.parameters(simul2.g, "layer")
#>                    [,1]     [,2]     
#> n.components       20       20       
#> vertices           50       50       
#> edges              35       33       
#> balance            0.5      0.4888889
#> components.balance 0.5      0.47     
#> disturbance        0        0.06     
#> aggreg.factor      0.527864 0.4666934
#> planar             TRUE     TRUE

The Archeofrag package also implements a more sophisticated method to measure the cohesion and admixture of two layers. The first step is to weight the edges of the graph. This is done with the frag.edges.weighting function:

simul.g <- frag.edges.weighting(simul.g, "layer")
#>  [1] 2.154776 1.436517 1.436517 6.856106 6.856106 7.757194 1.436517 1.436517
#>  [9] 1.436517 1.436517 2.154776 5.746069 5.746069 5.746069 6.856106 6.856106
#> [17] 2.154776 2.154776 1.430406 2.145608 5.721623 5.721623 5.721623 1.430406
#> [25] 1.430406 2.145608 2.145608 1.430406 1.430406 5.721623 5.721623 5.721623
#> [33] 1.430406 2.860811 2.145608

Note that the weighting of the edges is mandatory, otherwise an error is raised.

Then, the frag.layers.cohesion function is used to calculate the cohesion value of each layer:

frag.layers.cohesion(simul.g, "layer")
#>         1         2 
#> 0.5439696 0.4560304

Compare with the second artificial graph:

simul2.g <- frag.edges.weighting(simul2.g, "layer")
frag.layers.cohesion(simul2.g, "layer")
#>         1         2 
#> 0.4772866 0.4666964

These values tell how much each layer is cohesive (self-adhesive).

In complement, the frag.layers.admixture function returns a value quantifying the mixture between two layers:

frag.layers.admixture(simul.g, "layer")
#> [1] 0
frag.layers.admixture(simul2.g, "layer")
#> [1] 0.05601695

Characterising fragmentation within a layer

The second dimension of the method implemented in Archaeofrag is about characterising the layers based on the topological properties of the relationships between the fragments they contain. Several functions are provided for this purpose. The archaeological interpretation of the numerical values depends on the type of material (lithic, pottery, etc.) and the completeness or incompleteness of the object.

Let’s start by using igraph’s function to create a graph for each layer of our mixed artificial graph:

simul.l1.g <- induced_subgraph(simul2.g, V(simul2.g)[V(simul2.g)$layer==1])
simul.l2.g <- induced_subgraph(simul2.g, V(simul2.g)[V(simul2.g)$layer==2])

In a graph, a cycle is a path in which the only repeated vertices are the first and last vertices. The frag.cycles function search for cycles in a graph and returns the number of cycles found for different cycle lengths. The kmax parameter determines the maximal length of the cycles to look for. Let’s compare the cycles found in two layers of the artificial graph:

frag.cycles(simul.l1.g, kmax=5)
#> 3-cycles 4-cycles 5-cycles 
#>        1        0        0
frag.cycles(simul.l2.g, kmax=5)
#> 3-cycles 4-cycles 5-cycles 
#>        1        0        0

The frag.path.lengths function the distribution of the path lengths in the graph. It returns a vector whose first first element is the frequency of the paths of length 1, the second element is the frequency of the paths of length 2, etc. If the cumulative parameter is set to TRUE, the function returns the cumulative relative frequency of the path lengths.

#> [1] 14  4
frag.path.lengths(simul.l2.g, cumulative=T)
#> [1] 1.0000000 0.1428571

In a graph, the shortest path between two vertices is the path including the less number of edges. The diameter of a graph is its longest shortest path. The frag.diameters function calculates the diameter of each component of the graph and returns the frequency of the values. If the cumulative parameter is set to TRUE, the function returns the cumulative relative frequency of the diameters.

#> 1 2 
#> 8 3
#>  1  2 
#> 11  2

More on artificial graphs


The frag.simul.process function offers several other options to control the features of the simulation.

initial.layers is a crucial parameter. It determines the method used to construct the graph and, accordingly, the underlying formation process hypothesis. It takes as value 2 (default value) or 1. If the value is 2, then

  1. the initial objects (components) are placed in two different layers,
  2. these objects are fragmented,
  3. some fragments are moved as determined by the value disturbance parameter.

If the value of initial.layersis 1, then

  1. the initial objects are considered to belong to a unique layer,
  2. the fragmentation process is applied,
  3. layers are attributed to the fragments,
  4. disturbance is applied.

The vertices and edges parameters are related: at least one of them must be set, or both (only if initial.layers is set to 1). Note that using both parameters at the same time increases the constraints and reduces the number of possible solutions to generate the graph. When there is no solution, an error is raised and a message suggests how to change the parameters.

The balance argument determines the number of fragments in the smaller layer (before the application of the disturbance process). The components.balance also determines the contents of the two layers by affecting the distribution of the initial objects (components). Note that this argument is used only when initial.layers is set to 2.

The aggreg.factor parameter affects the distribution of the sizes of the components: this distribution tends to be more unequal when aggreg.factor has values close to 1.

Finally, the planar argument determines if the graph generated has to be planar or not (a graph is planar when it can be drawn on a plane, without edges crossing). An example of a complete setting of the function is:

Hypotheses testing

Two methods are possible (set by the initial.layers parameter) to enable to test empirical results against simulated results generated based from two different hypotheses. The most likely scenario is identified from the simulated results closer to the empirical results.

Note that the frag.get.parameters returns a list whose elements are named with the parameters names, what is a handy way to set the simulator.

As much graphs with these settings can be generated, to compare the distribution of their properties with the empirical values. The simulator is embedded into a function with its parameters set for the “two initial layers” hypothesis:

The function is then run a sufficient number of times:

We can then compare the distribution of, for instance, the number of edges to the empirical value observed at Liang Abu (red line):