SEAHORS: Spatial Exploration of ArcHaeological Objects in R Shiny

This paper presents SEAHORS, an R Shiny application available as an R package, dedicated to the intra-site spatial analysis of piece-plotted archaeological remains. This open-source script generates 2D and 3D scatter and density plots for archaeological objects located with Cartesian coordinates. Many GIS programs already exist for this purpose, but most require specific skills and training and are rarely designed for the particular needs of archaeological applications. The goal of SEAHORS is to make two- and three-dimensional intra-site spatial exploration of archaeological data as user-friendly as possible, giving researchers unfamiliar with GIS and R software the opportunity to use such approaches. SEAHORS has an easily accessible interface and can import data from text and Excel files (.csv and .xls/.xlsx respectively) without preformatting. The application includes functions to concatenate columns and to merge databases.


Introduction
Archaeological excavation is a practice that, by its very nature, destroys the context of the studied material as it is undertaken, preventing any replication of field observations. To minimize such losses, many archaeologists since the 19th century have promoted the recording of 3D coordinates (X, Y and Z) of archaeological objects during excavation (cf. Plutniak 2021a for a review). Using these three-dimensional (3D) spatial data to virtually reconstruct the excavated deposits offers the opportunity to investigate the spatial (horizontal and vertical) organization of plotted archaeological objects. Such exploration makes it possible to attest, confirm or invalidate stratigraphic and spatial observations made in the field, and to generate post-excavation stratigraphies and assemblages (see Discamps et al. 2023 for a recent review of such use).
Following early calls for the development of spatial analyses of archaeological objects (e.g. Laplace & Méroc 1954), this practice was renewed when computer power became increasingly accessible to archaeologists at the end of the 1980s. Software dedicated to spatial analysis was developed in the following decades: Esri ARC/INFO in 1982 (Morehouse 1985); Newplot, developed since 1986 (Dibble & McPherron 1988); Paleo III (by Pierre-Alain Gillioz, see application in Chadelle et al. 1996); FrAcTool (Houllier & Arnoux 2001); the adaptation of DataDesk to archaeological applications (Lacrampe-Cuyaubère 1997); QGIS in 2002 (QGIS.org 2023); and the web application and R package archeoViz (Plutniak 2023a; Plutniak 2023b). More recently, web applications dedicated to specific archaeological sites have been developed, such as Plotit for Combe-Capelle and other sites (R Shiny application; https://oldstoneage.com), PyCoCu for Combe-Cullier (Python script; Sécher et al. 2020), Virtual Poeymaü for the Poeymaü Cave (R Shiny application; Plutniak 2021b) and an R Shiny application for the La Roche à Pierrot site (Couillet et al. 2022). However, using these programs to analyze the spatial distribution of archaeological remains can be challenging for several reasons, such as complex interfaces, difficulties in adapting the software to specific needs or the databases to the required format, lack of user training, cost, or the absence of a support community. This paper presents SEAHORS (Spatial Exploration of ArcHaeological Objects in R Shiny), a free and open-source R Shiny application that allows easy and quick exploration of the spatial distribution of archaeological objects. SEAHORS is available as an R package on the CRAN web server (https://cran.r-project.org/package=SEAHORS). A web application is deployed on the shinyapps.io platform and is available here: https://aurelienroyer.shinyapps.io/Seahors/.

SEAHORS overview
SEAHORS is an R Shiny application available as an R package, distributed under the GNU General Public License (GPL) v3.0. R is a free and open-source statistical environment with a large and supportive community (R Core Team 2022). However, using a programming language can be challenging for untrained archaeologists. The Shiny package makes it possible to build a point-and-click interactive interface, providing a more user-friendly experience. SEAHORS also relies on the Plotly package (Sievert et al. 2017), which offers interactive data visualization by displaying the information associated with each point directly on the plots.
To showcase SEAHORS' main features, spatial data from the Paleolithic site of Cassenade (Dordogne, France; Discamps et al. 2019) are used as an example. In addition, a tutorial video is available at https://nakala.fr/10.34847/nkl.3fdd6h8j. At Cassenade, post-excavation analysis combining spatial and taphonomic data allowed the authors to achieve a better understanding of the site's occupation history (Discamps et al. 2019). The spatial distribution of remains accumulated by hominids and carnivores shows two superimposed assemblages that could not be distinguished during excavation. We show here how SEAHORS can be used to reproduce such observations.

Data import: supported files
SEAHORS requires tabular data and supports Excel spreadsheets (.xls/.xlsx) and comma-separated values (.csv) formats. We define a "variable" as the information stored in one column of the table, an "object" as the information recorded in one row of the table, and "modalities" as the values of a variable (i.e. all the forms that a variable can take in the column). Variables can thus be numeric (e.g. XYZ coordinates) or categorical (e.g. stratigraphic unit or type of find, with modalities such as "flint", "fauna" and "other").
SEAHORS requires a table with at least two of the X, Y and Z coordinate columns (for two-dimensional exploration) or all three (for three-dimensional exploration).

Data import: the Load data panel
The Load data panel allows the import of different types of data for spatial exploration. This panel is divided into five subpanels.
The first subpanel, Import XYZ data, is the most important and is a mandatory step before spatial exploration. It allows the user to set the four basic variables to import, namely the X, Y and Z coordinates and the ID of objects. If the three coordinate columns are named "X", "Y" and "Z" in the imported data frame, the script loads them automatically. Otherwise, the column names can be set manually. Note that only rows with numeric values for all three X, Y and Z coordinates are taken into account by SEAHORS. The values of the X, Y and Z coordinates can be inverted (i.e. multiplied by -1) by checking the boxes in front of these variables: this is useful, for instance, when Z values were recorded in the field as depth rather than altitude. The ID of objects must be unique (i.e. the variable dedicated to identifying the objects has a distinct modality for each object), and a warning message is displayed if that is not the case. Although non-unique IDs do not prevent analysis, unique IDs are required for some data manipulations, notably to avoid errors when merging the XYZ coordinates with additional data or when modifying point size options. In the bottom part of this subpanel, the user can select the names of the variables corresponding to the excavation year (numeric format), the sectors, the levels, the type of objects, and any other categorical variables to be displayed in the quick selection sidebar on the left (Figure 1, see below). None of these variables is mandatory. The names listed as "default names" are only indicative: columns bearing these names are loaded automatically when the data frame is imported, but any other column name can be used without affecting the program. The box in front of each variable allows the user to select the corresponding column of the imported sheet.
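The import rules described above (numeric-only coordinates, optional inversion, unique-ID warning) can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not SEAHORS's own R code; the sample table, the column name "ID" and the helper names `to_float`, `load_xyz` and `duplicated_ids` are assumptions made for illustration, as is the choice to check IDs after filtering.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for an imported .csv file.
SAMPLE = """ID,X,Y,Z,Type
C50-1,100.2,50.1,1.35,flint
C50-2,100.8,50.4,,fauna
C50-3,101.1,n/a,1.60,fauna
C50-1,101.5,50.9,1.72,other
"""

def to_float(text):
    """Return text as a float, or None when it is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def load_xyz(handle, invert=("Z",)):
    """Keep rows with numeric X, Y and Z; multiply the inverted axes by -1."""
    kept = []
    for row in csv.DictReader(handle):
        coords = {axis: to_float(row.get(axis)) for axis in ("X", "Y", "Z")}
        if any(value is None for value in coords.values()):
            continue  # row ignored: at least one coordinate is not numeric
        for axis in invert:
            coords[axis] = -coords[axis]  # e.g. depth recorded instead of altitude
        kept.append({**row, **coords})
    return kept

def duplicated_ids(rows, id_column="ID"):
    """List the IDs that occur more than once (these would call for a warning)."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(object_id for object_id, n in counts.items() if n > 1)

rows = load_xyz(io.StringIO(SAMPLE))
print(len(rows), "rows kept")
print("duplicated IDs:", duplicated_ids(rows))
```

In this sample, the two rows with a missing or non-numeric coordinate are dropped, and the ID "C50-1" is reported because it appears twice among the remaining rows.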
The second subpanel, entitled Merge additional data, offers the possibility to import supplementary data from a second dataset (i.e. a second Excel or text file). To do this, unique object IDs must first be defined in the Import XYZ data subpanel. These unique IDs are used to merge the XYZ dataset and the supplementary dataset; they must therefore be present in both files. After merging, a report displayed in the bottom part of the subpanel describes the result by listing: 1) the IDs of objects that are not unique in the XYZ dataset, 2) the IDs of objects that are not unique in the supplementary dataset, 3) the objects present in the XYZ dataset only (i.e. with no corresponding object ID in the supplementary dataset), and 4) the objects from the supplementary dataset that were not merged because of a lack of correspondence in the XYZ dataset.
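The four lists of the merge report can be sketched as simple set and count operations on the two ID columns. This is an illustrative Python sketch, not the application's R implementation; the function name `merge_report`, the dictionary keys and the sample IDs are hypothetical.

```python
from collections import Counter

def merge_report(xyz_ids, extra_ids):
    """Summarise an ID-based merge as the four lists described in the text."""
    xyz_counts, extra_counts = Counter(xyz_ids), Counter(extra_ids)
    return {
        # 1) IDs that are not unique in the XYZ dataset
        "not_unique_in_xyz": sorted(i for i, n in xyz_counts.items() if n > 1),
        # 2) IDs that are not unique in the supplementary dataset
        "not_unique_in_extra": sorted(i for i, n in extra_counts.items() if n > 1),
        # 3) objects present in the XYZ dataset only
        "only_in_xyz": sorted(set(xyz_ids) - set(extra_ids)),
        # 4) supplementary objects without a match in the XYZ dataset
        "only_in_extra": sorted(set(extra_ids) - set(xyz_ids)),
    }

report = merge_report(
    xyz_ids=["C50-1", "C50-2", "C50-3", "C50-3"],
    extra_ids=["C50-2", "C50-3", "D51-7"],
)
for name, ids in report.items():
    print(name, ids)
```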
The third subpanel, Import orthophoto, allows the user to load an orthophotographic image that can then be used in the 2D plot and Density plot panels. The image should be obtained from a georeferenced model using the same excavation grid as the XYZ dataset. It must be in .TIFF format and smaller than 150 MB.
The fourth subpanel, Import refit data, is used to import refit data. Users can either import a separate file that lists refit data or select the appropriate column from the main dataset (the one already imported using the Import XYZ data subpanel). The refit data should be organized as a spreadsheet listing refitted objects (one row per object), with one column for the object's unique ID and one column for the ID of the refit group to which the object belongs. A refit group can link two or more objects. For instance, if three objects (unique IDs 1300, 1502 and 1603) refit together, they belong to a single refit group, which we name "refit_group_1". The refit database should then consist of two columns (unique ID, refit group ID) and three rows, with "1300", "1502" and "1603" in the first column and "refit_group_1" in each row of the second column.
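As a sketch of the refit table layout described above, the following Python snippet stores the example as (unique ID, refit group ID) rows and groups the objects by refit group. Deriving one link per pair of objects within a group is an assumption made here for illustration; the text does not specify how SEAHORS draws the links, and the names `group_members` and `pairwise_links` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# One row per refitted object: (unique object ID, refit group ID).
refit_rows = [
    ("1300", "refit_group_1"),
    ("1502", "refit_group_1"),
    ("1603", "refit_group_1"),
]

def group_members(rows):
    """Map each refit group ID to the list of object IDs it contains."""
    groups = defaultdict(list)
    for object_id, group_id in rows:
        groups[group_id].append(object_id)
    return dict(groups)

def pairwise_links(groups):
    """Assumed convention: one link for every pair of objects in a group."""
    return [
        pair
        for members in groups.values()
        for pair in combinations(sorted(members), 2)
    ]

groups = group_members(refit_rows)
links = pairwise_links(groups)
print(groups)
print(links)
```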
The last subpanel, Concatenate two columns, joins the values of two columns into a new one. This is useful, for example, to combine the square (e.g. C50) and the object number (e.g. 3) into a unique object ID (e.g. C50-3).
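The concatenation example above amounts to joining two values with a separator. A minimal Python sketch follows; the column names "Square" and "Number", the "-" separator as a default and the function name `concatenate` are assumptions for illustration.

```python
def concatenate(square, number, separator="-"):
    """Join two column values into one identifier, e.g. C50 and 3 -> C50-3."""
    return f"{square}{separator}{number}"

# Hypothetical rows with a square column and an object number column.
table = [
    {"Square": "C50", "Number": 3},
    {"Square": "C50", "Number": 4},
    {"Square": "D51", "Number": 3},
]
for row in table:
    row["ID"] = concatenate(row["Square"], row["Number"])

ids = [row["ID"] for row in table]
print(ids)
```

Object numbers that repeat from one square to the next (3 in both C50 and D51 here) become distinct once combined with the square.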

The "Quick sidebar" panel
The "Quick sidebar" panel allows the user to easily subset the dataset to facilitate visualization, as well as to modify point size, color or shape in the different plots. Changing the sliders and check-boxes immediately updates the objects included in the projections. This panel is subdivided into five subpanels, easily accessible with buttons (Figure 2). The first subpanel is dedicated to subsetting the dataset shown in the plots: numeric variables are displayed with sliders and categorical variables as check-box lists. The second subpanel can be used to adjust the point size according to XYZ coordinates or to categories (i.e. variable modalities). The third subpanel is used to color the points according to the modalities of a variable. The fourth subpanel is used to change the point shapes (circle, triangle, square, diamond), either for all points or for specific variable modalities. The last subpanel offers additional display options, such as the size of axis labels and tick marks, the names of the axis legends, the spacing between major tick marks and the breaks of minor tick marks.

Figure 2: The Quick sidebar panel and its five subpanels, which allow the user to subset the dataset and to modify point size, color, shape and other display options.

The Table panel
The Table panel is divided into two subpanels: Raw table and Pivot table. The first offers the opportunity to download the data table for the objects shown on the projections, including any additional data that have been merged and any new column(s) that have been created (cf. 2D plot in the "Spatial exploration of data" section). The second subpanel allows the user to create pivot tables to count objects according to a variable and its modalities.
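A pivot table of this kind is a count of objects per modality. The Python sketch below cross-tabulates two categorical variables; the sample objects, the variable names "Level" and "Type" and the function `pivot_counts` are hypothetical, and crossing two variables (rather than counting a single one) is an assumption made for the example.

```python
from collections import Counter

# Hypothetical objects: one dict per row, keys are variables.
objects = [
    {"Level": "upper", "Type": "flint"},
    {"Level": "upper", "Type": "flint"},
    {"Level": "upper", "Type": "fauna"},
    {"Level": "lower", "Type": "fauna"},
    {"Level": "lower", "Type": "fauna"},
]

def pivot_counts(rows, row_variable, column_variable):
    """Count objects for each combination of modalities of two variables."""
    counts = Counter((r[row_variable], r[column_variable]) for r in rows)
    table = {}
    for (row_key, column_key), n in counts.items():
        table.setdefault(row_key, {})[column_key] = n
    return table

table = pivot_counts(objects, "Level", "Type")
for level in sorted(table):
    print(level, table[level])
```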

Spatial exploration of data
Spatial exploration of data can be carried out using five different types of plots, all directly accessible with one click on the respective tabs at the top: 3D plot, 2D plot, 2D slice, Rotated 2D plot and Density plot. The first allows users to plot the objects in 3D (Figure 3). The second (2D plot) is composed of two sub-tabs allowing users to plot the objects in 2D (Figures 4 and 5) in two modes: an advanced mode offering interactive data visualization, and a simple mode (without interactive data visualization), which is faster and allows better control of the projection's aspect (notably by defining a precise display ratio between the two axes).
The 2D slice panel offers the option of automatically creating sequential 2D plots after virtually "slicing" the site along the axis of a third coordinate (X or Y). Users first define the range of this third axis to be used and the desired thickness of the slices; the script then calculates and displays, for each slice, the corresponding plot in the two other coordinates (e.g. YZ). As in the 2D plot tab, a simple mode is accessible by unchecking the "advanced mode" box, offering faster projections but without interactive data visualization.
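The slicing logic can be sketched as binning points along the third axis. This Python sketch is illustrative only: the half-open range convention, the function name `slice_points` and the sample coordinates are assumptions, not a description of how SEAHORS handles slice boundaries.

```python
def slice_points(points, axis, start, stop, thickness):
    """Group points into consecutive slices of given thickness along one axis.

    Only points with start <= value < stop are kept (assumed convention).
    Returns {slice index: [points]}; slice k covers
    [start + k * thickness, start + (k + 1) * thickness).
    """
    slices = {}
    for point in points:
        value = point[axis]
        if not (start <= value < stop):
            continue
        index = int((value - start) // thickness)
        slices.setdefault(index, []).append(point)
    return slices

# Hypothetical objects sliced along X; each slice would be plotted in Y and Z.
points = [
    {"ID": "A", "X": 100.2, "Y": 50.1, "Z": -1.3},
    {"ID": "B", "X": 100.7, "Y": 50.6, "Z": -1.5},
    {"ID": "C", "X": 100.9, "Y": 51.0, "Z": -1.4},
    {"ID": "D", "X": 101.4, "Y": 50.2, "Z": -1.8},
    {"ID": "E", "X": 103.0, "Y": 50.3, "Z": -1.1},
]
slices = slice_points(points, axis="X", start=100.0, stop=102.0, thickness=0.5)
ids_per_slice = {k: [p["ID"] for p in v] for k, v in slices.items()}
print(ids_per_slice)
```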
The Rotated 2D plot panel is useful when users suspect that the orientation provided by the original XY grid of the excavation is not optimal for visually exploring the spatial organization of archaeological objects. In this panel, users can modify the angle of the projection (from -180° to 180°) to explore the spatial organization of objects without being constrained by the grid pattern. The new X and Y values can be loaded in the sidebar panel for further investigation in the 2D plot or 3D plot tabs.
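Rotating the XY grid corresponds to a standard planar rotation of each point. The Python sketch below uses the usual counterclockwise convention around a chosen centre; the direction of rotation and the centre used by SEAHORS are not stated in the text, so both are assumptions, as is the function name `rotate_xy`.

```python
import math

def rotate_xy(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) counterclockwise by angle_deg around (cx, cy)."""
    angle = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    new_x = cx + dx * math.cos(angle) - dy * math.sin(angle)
    new_y = cy + dx * math.sin(angle) + dy * math.cos(angle)
    return new_x, new_y

# A point one unit east of the origin, rotated by a quarter turn.
x_rot, y_rot = rotate_xy(1.0, 0.0, 90.0)
print(round(x_rot, 6), round(y_rot, 6))
```

Because Z is unaffected, the rotated X and Y values can be combined with the original Z for vertical projections along the new orientation.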
Finally, the Density plot panel offers the possibility to plot objects with a color gradient depending on the density of neighboring points. The density of points and the density lines are calculated by two-dimensional kernel density estimation, using the 'kde2d' function of the MASS package (Ripley et al. 2013) and the 'geom_density_2d' function of the ggplot2 package (Wickham et al. 2016). The density curves are obtained using the 'geom_density' function, also from ggplot2.
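The principle of two-dimensional kernel density estimation can be shown with a minimal Python sketch using a product of Gaussian kernels. It does not reproduce the bandwidth selection or grid evaluation of the R functions cited above: the bandwidths `hx` and `hy` are set by hand here, and the function name `gaussian_kde_2d` and the sample points are hypothetical.

```python
import math

def gaussian_kde_2d(points, x, y, hx, hy):
    """Kernel density estimate at (x, y) with a product Gaussian kernel.

    points: iterable of (x, y) pairs; hx, hy: bandwidths (standard
    deviations of the kernel) along each axis, chosen by the caller.
    """
    points = list(points)
    total = 0.0
    for px, py in points:
        ux, uy = (x - px) / hx, (y - py) / hy
        total += math.exp(-0.5 * (ux * ux + uy * uy))
    return total / (len(points) * 2.0 * math.pi * hx * hy)

# Three clustered objects and one isolated object (hypothetical coordinates).
objects = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
dense = gaussian_kde_2d(objects, 0.0, 0.0, hx=0.5, hy=0.5)
sparse = gaussian_kde_2d(objects, 5.0, 5.0, hx=0.5, hy=0.5)
print(dense > sparse)
```

Evaluating such an estimate at each plotted point gives the value that a color gradient can be mapped to, while evaluating it on a regular grid gives the surface from which density contour lines are drawn.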

Additional settings
In the last panel, a few additional settings are available to modify the way plots are displayed:
Color ramps can be saved (or loaded) if the user frequently uses the same set of colors for the modalities of a variable.
The color of refits can also be changed according to the modalities of a variable (and this parameter can similarly be saved or loaded). In the specific case of a refit between two objects that have two different modalities of the chosen variable, the refit is drawn in black in the 2D plot and with a gradient in the 3D plot.
The "steps" used for sliders can be modified.
The information displayed while hovering over a point in projections can be customized.
New variables can also be created and coded directly in SEAHORS. This is useful, for example, when the user needs to define new groups of objects (e.g. assemblages) and assign objects to each group on the projections. Once loaded in the Load data panel, these new groups can be used in the quick sidebar panel like any other variable. The first step is to go to the Record new group subpanel in the Additional settings panel. The user then chooses a name for the new group variable and creates it. At this step, data from another variable can be copied, a useful option if the new group is a slight modification of an existing variable. The user can create as many new group variables as necessary. Once the new group variable is created, the user can select points in 2D plots with the "box" or "lasso" tools and change their attribution with the "Change Group Assignment" button. The list of selected points is detailed below this button. The new data can be downloaded using the Raw table subpanel.
Finally, the last subpanel of Additional settings exports a report as an .html file (using the rmarkdown package, Allaire et al. 2023). This report details the parameters used and includes the data tables and the figures obtained.

Example of spatial distribution at Cassenade
As interpreted by Discamps et al. (2019), the use of spatial distributions combining field observations, faunal data and lithic data was crucial to understanding the Cassenade sequence and disentangling the different archaeological layers. Two distinct assemblages were identified in a single lithostratigraphic layer on the basis of projections of the coordinated material. These assemblages were impossible to distinguish during the excavation. Lithic artifacts and charcoal are much more abundant in the upper part of the deposits, while faunal remains were found throughout the stratigraphy. The concentration of lithic artifacts visible in the upper assemblage shows a steep slope towards the cave entrance in the north, which is reflected in the strongly oriented distribution pattern of refitted artifacts. Cave bear remains and bones with a brown patina are more abundant in the lower assemblage, whereas anthropic modifications of bones (cut-marks, percussion marks, burnt bones) are concentrated in the upper part of the deposits. A nearly sterile band between the upper and lower assemblages is observable in some parts of the site, but not everywhere. In the case of Cassenade, the exploration of the spatial distribution of the archaeological material made it possible to correctly delimit two distinct assemblages, to characterize them and to discuss the Châtelperronian occupations of the site. All the projections used for this study, originally performed with QGIS (Discamps et al. 2019), can be reproduced using SEAHORS in a much faster and more practical way.

Conclusion
Since the middle of the 20th century, archaeologists have been recording an increasing amount of spatial data during fieldwork, with the intent of preserving as much contextual information as possible. Such tedious work is now indispensable for proposing reliable spatial (i.e. planimetrical and stratigraphical) interpretations. It often leads to the accumulation of several thousand object coordinates, the analysis of which requires both adequate software and extensive training in GIS software. The aim of the SEAHORS application is to offer tools for the spatial exploration of archaeological objects to a broader audience, through a point-and-click interactive interface, in a free and open-source environment. We hope that this software (in its current form and through future additions) will foster the spatial exploration of archaeological sites and, in particular, the practice of post-excavation stratigraphic analysis (Discamps et al. 2023).