Introduction

ArchaeoPhases provides a set of functions for the statistical analysis of archaeological dates and groups of dates. It is based on the post-processing of Markov chains whose stationary distribution is the posterior distribution of a series of dates. Such MCMC output can be produced by different applications, for instance 'ChronoModel' (see http://www.chronomodel.fr), 'Oxcal' (see https://c14.arch.ox.ac.uk/oxcal.html) or 'BCal' (see http://bcal.shef.ac.uk/). The only requirement is a CSV file containing a sample from the posterior distribution.

The Graphical User Interface

For those who already know how to use R, ArchaeoPhases won't be difficult to use. For the others, a Graphical User Interface is available. Click here to launch it.

Installing the 'ArchaeoPhases' package

To install ArchaeoPhases, you first need to install R and, if you wish, RStudio, which provides a convenient desktop environment for R. Once in R (or in RStudio) you can type:

install.packages('ArchaeoPhases')

at the R command prompt to install ArchaeoPhases. If you then type:

library(ArchaeoPhases)

it will load all the ArchaeoPhases functions.

Launching the Graphical User Interface from R on localhost

The Shiny web application can be launched with the following code:

app_ArchaeoPhases()

Here is a live demo.

Importing data into R

Data files can be imported into R with the following code:

data_MCMC = ImportCSV()

In order to use the other functions of the package 'ArchaeoPhases', the MCMC samples have to be expressed in calendar years.

Importing data from 'ChronoModel'

Two different files are generated by ChronoModel: “events.csv”, which contains the MCMC samples of each event created in the modelling, and “phases.csv”, which contains the MCMC samples of the minimum and the maximum of each group of dates, if at least one group is created. Here is an example of the use of the function ImportCSV() for MCMC samples generated by ChronoModel.

ChronoModel_MCMC = ImportCSV("pathToFiles/events.csv", iterationColumn = 1)

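The parameter iterationColumn gives the position of the column that holds the iteration numbers. This column is withdrawn from the returned dataframe. The file of groups is imported in the same way: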

ChronoModel_MCMC_Groups = ImportCSV("pathToFiles/phases.csv", iterationColumn = 1)
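To make the expected file layout concrete, here is a minimal base-R sketch. It assumes a CSV with one row per iteration, the iteration number in the first column and one sampled date per remaining column. The column names and values are invented for illustration, and the sketch only mimics the withdrawal of the iteration column; it is not the code of ImportCSV().

```r
# Toy MCMC output in the layout assumed here: one row per iteration,
# first column = iteration number, other columns = sampled dates.
csv_text <- "iter,Event.1,Event.2
1,-1200.5,-1100.2
2,-1210.1,-1095.7
3,-1195.3,-1102.9"

raw <- read.csv(text = csv_text)

# Drop the iteration column by position, which is what iterationColumn = 1
# is described as doing.
iteration_column <- 1
mcmc_toy <- raw[, -iteration_column, drop = FALSE]

print(names(mcmc_toy))
print(nrow(mcmc_toy))
```

Only the columns of sampled dates remain, which is the shape the other functions of the package are described as working on.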

Importing data from 'Oxcal'

Oxcal generates a CSV file containing the MCMC samples of all parameters (dates, start and end of phases).

Oxcal_MCMC = ImportCSV("pathToFiles/fileName.csv", iterationColumn = 1)

However, the minimum and the maximum of a group of dates cannot be extracted directly from the Oxcal output. In order to create a dataframe containing these values, use the function CreateMinMaxGroup(). Here is an example of its use:

data("KADatesOxcal")
Oxcal_MCMC_Groups = CreateMinMaxGroup(KADatesOxcal, position = 4, name = "IUP")
Oxcal_MCMC_Groups = CreateMinMaxGroup(KADatesOxcal, position = c(7:13,15:18), name = "Ahmarian", add=Oxcal_MCMC_Groups)
Oxcal_MCMC_Groups = CreateMinMaxGroup(KADatesOxcal, position = c(21:23), name = "UP", add=Oxcal_MCMC_Groups)
Oxcal_MCMC_Groups = CreateMinMaxGroup(KADatesOxcal, position = 26, name = "EPI", add=Oxcal_MCMC_Groups, exportFile = "Oxcal_MCMC_Groups.csv")
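The idea behind these calls can be sketched in base R: at each iteration, the minimum of a group is the earliest of its dates and the maximum is the latest. The helper min_max_group(), the toy data and the output column names below are hypothetical and only illustrate the principle; they are not the implementation of CreateMinMaxGroup().

```r
# Hypothetical MCMC sample: 5 iterations of three dates belonging to one group.
toy <- data.frame(
  Date.A = c(-500, -510, -495, -505, -498),
  Date.B = c(-480, -470, -490, -485, -475),
  Date.C = c(-450, -460, -455, -440, -465)
)

# At each iteration (row), the start of the group is the earliest date
# and the end of the group is the latest date.
min_max_group <- function(data, position, name) {
  cols <- data[, position, drop = FALSE]
  out <- data.frame(
    do.call(pmin, cols),  # row-wise minimum over the selected columns
    do.call(pmax, cols)   # row-wise maximum over the selected columns
  )
  names(out) <- paste0(name, c(".Min", ".Max"))
  out
}

toy_groups <- min_max_group(toy, position = 1:3, name = "Toy")
print(toy_groups)
```

When a group contains a single date, as for IUP and EPI above, the two columns are identical at every iteration.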

Importing data from 'BCal'

BCal generates a CSV file containing the MCMC samples of all parameters (dates, start and end of phases). However, all dates are expressed in cal BP, that is, in years before 1950. Hence, the MCMC samples have to be converted from cal BP to calendar years. This can be done with the following line:

BCal_MCMC = ImportCSV("pathToFiles/fileName.csv", iterationColumn = 1, referenceYear = 1950, rowToWithdraw = "last")

Note that the last line of the file may contain “NA”. In that case, this line should be withdrawn from the dataset in order to proceed further. Now, again, a file containing the minimum and the maximum values of each group of dates should be created. Let's use the Fishpond dataframe.

data("Fishpond")
BCal_MCMC_Groups = CreateMinMaxGroup(Fishpond, position = c(3:6), name = "Layer.II")
BCal_MCMC_Groups = CreateMinMaxGroup(Fishpond, position = 9, name = "Layer.III", add=BCal_MCMC_Groups, exportFile = "BCal_MCMC_Groups.csv")

Note that in the Fishpond dataset, the MCMC samples are in cal BP. See example(ImportCSV) for the conversion of this dataset.
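As a quick illustration of the conversion, here is a minimal sketch with invented ages. It assumes that a date in cal BP is turned into a calendar year by subtracting it from the reference year (1950); the variable names are hypothetical.

```r
# Hypothetical ages in cal BP (years before the reference year).
ages_bp <- c(650, 1950, 2450)
reference_year <- 1950

# Assumed conversion: calendar year = reference year - age in cal BP.
calendar_years <- reference_year - ages_bp
print(calendar_years)
```

Ages larger than the reference year give negative calendar years, which is the convention used for the Ksar Akil results in this document.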

Convergence of MCMC chains

Let's use the data of Ksar Akil generated by ChronoModel: “KADatesChronoModel” and “KAPhasesChronoModel”. For more details on the diagnostics of Markov chains, see Robert and Casella (2009).

To assess the agreement between the posterior distributions and the numerical approximations, three Markov chains were run in parallel by ChronoModel. For each chain, 1 000 iterations were used during the Burn-in period, 20 batches of 500 iterations were used in the Adapt period, and 100 000 iterations were drawn in the Acquire period, but only 1 out of 10 was kept in order to break the correlation structure.

From the analysis of the history plot, all Markov chains reach their equilibrium before the Acquire period. The autocorrelations of the three Markov chains are not significant, meaning that the thinning rate (1 out of 10) is sufficient.

Now, using the package 'ArchaeoPhases' and the package 'coda', we can verify whether the MCMC samples are correctly generated by the software. Indeed, the MCMC samples should have no autocorrelation and should have reached their equilibrium (that is the posterior density of the parameter under investigation).

library(coda)  # provides autocorr.plot(), gelman.diag() and geweke.diag()
data("KADatesChronoModel")
mcmcList = coda.mcmc(KADatesChronoModel, numberChains = 3, iterationColumn = 1)
autocorr.plot(mcmcList[,1,])

The autocorrelation plots show that the autocorrelations of each of the three chains are not significant. That means that we actually generated an uncorrelated sample, which was the aim of the thinning.
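The effect of keeping 1 iteration out of 10 can be illustrated on a simulated chain. The sketch below uses an AR(1) process as a stand-in for raw MCMC output; the autocorrelation value phi is invented for illustration and does not come from ChronoModel.

```r
set.seed(42)
n <- 100000
phi <- 0.8  # hypothetical autocorrelation of the raw chain

# Simulate an autocorrelated chain (AR(1)) as a stand-in for raw MCMC output.
raw_chain <- as.numeric(arima.sim(model = list(ar = phi), n = n))

# Keep 1 iteration out of 10.
thinned_chain <- raw_chain[seq(1, n, by = 10)]

# Lag-1 autocorrelation before and after thinning.
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
print(c(raw = lag1(raw_chain), thinned = lag1(thinned_chain)))
```

With 100 000 iterations per chain, thinning leaves 10 000 values per chain, which is consistent with the 30 000 iterations obtained when the three chains are gathered.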

We can also check whether the chains reached equilibrium. For example, let's consider the first date of the dataset.

plot(mcmcList[,1,])

The plot shows that the three chains corresponding to the first date reached the same stationary process.

We can test the Gelman-Rubin criterion. Values of the potential scale reduction factor close to 1 indicate that all of the Markov chains reached equilibrium.

gelman.diag(mcmcList)
Potential scale reduction factors:

             Point est. Upper C.I.
Layer.V               1          1
Layer.VI              1          1
Layer.XI              1          1
Layer.XII             1          1
Layer.XVI.4           1          1
Layer.XVI.3           1          1
Layer.XVI.1           1          1
Layer.XVI.2           1          1
Layer.XVII.2          1          1
Layer.XVII.1          1          1
Layer.XVII.3          1          1
Layer.XVII.4          1          1
Layer.XVIII           1          1
Layer.XIX             1          1
Layer.XX              1          1
Layer.XXII            1          1

Multivariate psrf

1
gelman.plot(mcmcList[,1,])

The Gelman-Rubin criterion confirms that all of the Markov chains reached equilibrium. We can also test the Geweke criterion. Each Geweke statistic is a Z-score that compares the mean of the first part of a chain with the mean of its last part. Absolute values below about 2 (1.96 at the 5% level) are consistent with the chain having reached equilibrium.

geweke.diag(mcmcList[,1,], frac1=0.1, frac2=0.5)
[[1]]

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

   var1 
-0.9288 


[[2]]

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

  var1 
0.2023 


[[3]]

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

  var1 
-1.583 

The Geweke criterion confirms that all of the Markov chains reached equilibrium. In conclusion, ChronoModel generated correct samples of the posterior distribution. Gathering the three chains, a total of 30 000 iterations was collected in order to estimate the posterior distribution of each parameter.
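To see what the Gelman-Rubin criterion measures, here is a minimal sketch of the basic potential scale reduction factor, computed by hand on simulated chains. It compares the within-chain variance W with the between-chain variance B. The helper psrf() and the simulated values are hypothetical, and the sketch omits the degrees-of-freedom correction applied by gelman.diag(), so the numbers may differ slightly from those of 'coda'.

```r
set.seed(123)
n_iter <- 10000
# Three hypothetical chains sampling the same distribution, one per column.
chains <- matrix(rnorm(3 * n_iter, mean = -42000, sd = 300), ncol = 3)

psrf <- function(chains) {
  n <- nrow(chains)
  chain_means <- colMeans(chains)
  W <- mean(apply(chains, 2, var))    # within-chain variance
  B <- n * var(chain_means)           # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

print(psrf(chains))

# A chain stuck around a different value inflates the factor.
shifted <- chains
shifted[, 1] <- shifted[, 1] + 1000
print(psrf(shifted))
```

When all chains explore the same distribution, the two variance estimates agree and the factor is close to 1. When one chain is shifted, the between-chain variance dominates and the factor moves away from 1.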

Tempo Plot for a series of dates

The tempo plot has been introduced by Thomas S. Dye (Dye, T.S. (2016) Long-term rhythms in the development of Hawaiian social stratification. Journal of Archaeological Science, 71, 1–9). See Philippe and Vibet 2017 for more statistical details.

The tempo plot is one way to measure change over time: it estimates the cumulative occurrence of archaeological events in a Bayesian calibration. The tempo plot yields a graphic where the slope of the plot directly reflects the pace of change: a period of rapid change yields a steep slope and a period of slow change yields a gentle slope. When there is no change, the plot is horizontal. When change is instantaneous, the plot is vertical.

The code is the following (warning: be patient, the execution time depends on the number of dates included).

data("KADatesChronoModel")
TempoPlot(KADatesChronoModel, c(2:17), level = 0.95)

From this graph, we can see that most of the sampled activity is dated between -45 000 and -35 000, but two dates are younger, at about -32 000 and -28 000.
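The quantity displayed by the tempo plot can be sketched in base R. For each iteration and each time t, we count the events dated at or before t, then we summarise these counts over the iterations. The simulated dates, the time grid and the use of simple quantiles for the 95% band are assumptions made for illustration; TempoPlot() may compute its intervals differently.

```r
set.seed(7)
n_iter <- 2000
# Hypothetical MCMC sample of four event dates (calendar years), one per column.
events <- cbind(
  rnorm(n_iter, -1000, 30),
  rnorm(n_iter, -950, 30),
  rnorm(n_iter, -900, 30),
  rnorm(n_iter, -600, 30)
)

time_grid <- seq(-1200, -400, by = 10)

# For each iteration (row) and each time t, count the events dated at or before t.
counts <- sapply(time_grid, function(t) rowSums(events <= t))

# Summarise over iterations: posterior mean and a 95% band at each time t.
tempo <- data.frame(
  time  = time_grid,
  mean  = colMeans(counts),
  lower = apply(counts, 2, quantile, probs = 0.025),
  upper = apply(counts, 2, quantile, probs = 0.975)
)

print(head(tempo))
```

Calling plot(tempo$time, tempo$mean, type = "l") on this toy result draws a curve that rises steeply where the three close events occur and stays flat until the last, isolated event.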

Groups of dates

A group of dates is defined by the date of the minimum and the date of the maximum of the group. In this part, we will use the data containing these values for each group of dates.

Time Range Interval

We can estimate the time range of a group of dates as the shortest interval that contains all the dates of the group at a given confidence level (see Philippe and Vibet 2017 for more details). The following code gives the endpoints of the time range of all groups of dates of the Ksar Akil data and recalls the given confidence level.

data("KAPhasesChronoModel")
MultiPhaseTimeRange(KAPhasesChronoModel, c(8,6,4,2), level = 0.95)
                             Level TimeRangeInf TimeRangeSup
IUP.alpha IUP.beta            0.95       -43217       -41106
Ahmarian.alpha Ahmarian.beta  0.95       -42189       -37461
UP.alpha UP.beta              0.95       -38559       -29335
EPI.alpha EPI.beta            0.95       -29071       -27102

The time range interval of the group of dates is a way to summarise the estimation of its minimum, the estimation of its maximum and their uncertainties at the same time.
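One way to approximate such an interval from the MCMC samples is sketched below. For a grid of lower bounds taken among the quantiles of the minimum, we look for the upper bound that gives the requested joint probability, and we keep the shortest candidate. The helper time_range(), the grid step and the simulated samples are hypothetical, and the sketch is not the code of MultiPhaseTimeRange().

```r
set.seed(2017)
n_iter <- 5000
# Hypothetical MCMC samples of the start (alpha) and end (beta) of one group.
alpha <- rnorm(n_iter, -1000, 40)
beta  <- alpha + rgamma(n_iter, shape = 4, rate = 0.02)  # beta >= alpha

time_range <- function(alpha, beta, level = 0.95, step = 0.001) {
  candidate <- function(eps) {
    a <- quantile(alpha, probs = eps, names = FALSE)
    keep <- alpha >= a
    # Upper bound chosen so that P(a <= alpha, beta <= b) is about `level`.
    b <- quantile(beta[keep], probs = min(1, level / mean(keep)), names = FALSE)
    c(a, b)
  }
  eps_grid <- seq(0, 1 - level, by = step)
  bounds <- sapply(eps_grid, candidate)  # 2 x length(eps_grid) matrix
  shortest <- which.min(bounds[2, ] - bounds[1, ])
  c(Level = level,
    TimeRangeInf = bounds[1, shortest],
    TimeRangeSup = bounds[2, shortest])
}

tr <- time_range(alpha, beta, level = 0.95)
print(round(tr, 2))

# Empirical probability that the whole group lies inside the interval.
coverage <- mean(alpha >= tr[["TimeRangeInf"]] & beta <= tr[["TimeRangeSup"]])
print(coverage)
```

The empirical coverage printed at the end should be close to the requested level, which is a simple way to check the interval on your own samples.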

Graphical representation

The function MultiPhasePlot() may be used to draw the characteristics of several groups of dates on the same graph: the marginal posterior density of the minimum and the maximum of each group and its time range at a desired level.

data("KAPhasesChronoModel")
MultiPhasePlot(KAPhasesChronoModel, c(8,6,4,2), level = 0.95)


Succession of groups

We may also be interested in a succession of phases. This is actually the case of the succession of IUP, Ahmarian, UP and EPI that are in stratigraphic order. Hence, we can estimate the transition interval and, if it exists, the gap between these successive phases.

Transitions between successive groups

The transition interval between two successive phases is the shortest interval that covers the end of the oldest group of dates and the start of the youngest group of dates. The start and the end are estimated by the minimum and the maximum of the dates included in the group of dates. It gives an idea of the transition period between two successive groups of dates. From a computational point of view, this is equivalent to the time range calculated between the end of the oldest group of dates and the start of the youngest group of dates. See Philippe and Vibet 2017 for more statistical details.

data("KAPhasesChronoModel")
MultiPhasesTransition(KAPhasesChronoModel, c(8,6,4,2), level = 0.95)
                          0.95 TransitionRangeInf TransitionRangeSup
IUP.beta & Ahmarian.alpha 0.95             -43241             -40728
Ahmarian.beta & UP.alpha  0.95             -39075             -36686
UP.beta & EPI.alpha       0.95             -31504             -26960

For this function, the order of the groups of dates is important. The vector of positions of the minima should start with the minimum of the oldest phase and end with that of the youngest phase. For data extracted from ChronoModel or created with the function CreateMinMaxGroup(), the vector of positions of the phases’ maxima is deduced from the vector of the minima. For other data, this vector should be specified.

Gap between successive groups

Successive phases may also be separated in time. Indeed, there may exist a gap between them. This testing procedure checks whether a gap exists between two successive groups of dates with a fixed probability. If a gap exists, it is an interval that lies between the end of one group of dates and the start of the following one with a fixed posterior probability. See Philippe and Vibet 2017 for more statistical details.

data("KAPhasesChronoModel")
MultiPhasesGap(KAPhasesChronoModel, c(8,6,4,2), level = 0.95)
                          Level  HiatusIntervalInf HiatusIntervalSup
IUP.beta & Ahmarian.alpha "0.95" "NA"              "NA"             
Ahmarian.beta & UP.alpha  "0.95" "NA"              "NA"             
UP.beta & EPI.alpha       "0.95" "-29180"          "-28977"         

At a confidence level of 95%, there is no gap between the succession of phases IUP, Ahmarian and UP, but there exists one of 203 years between phase UP and phase EPI.
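The search for a gap can be sketched in the same spirit as the time range. We look for the longest interval that lies after the end of the older group and before the start of the younger group with the requested posterior probability, and we report NA when no such interval exists. The helper gap_range(), the grid step and the simulated samples are hypothetical; this is an illustration of the principle and not the code of MultiPhasesGap().

```r
set.seed(99)
n_iter <- 5000
# Hypothetical MCMC samples: end of the older group, start of the younger one.
end_old     <- rnorm(n_iter, -1000, 20)
start_young <- rnorm(n_iter, -800, 20)

gap_range <- function(end_old, start_young, level = 0.95, step = 0.001) {
  no_gap <- c(Level = level, HiatusIntervalInf = NA, HiatusIntervalSup = NA)
  # A gap at this level requires the groups to be ordered with probability >= level.
  if (mean(end_old < start_young) < level) return(no_gap)
  candidate <- function(eps) {
    lo <- quantile(end_old, probs = 1 - eps, names = FALSE)
    keep <- end_old <= lo
    # Lower quantile chosen so that P(end_old <= lo, start_young >= hi) is about `level`.
    p <- max(0, 1 - level / mean(keep))
    hi <- quantile(start_young[keep], probs = p, names = FALSE)
    c(lo, hi)
  }
  bounds <- sapply(seq(0, 1 - level, by = step), candidate)
  longest <- which.max(bounds[2, ] - bounds[1, ])
  if (bounds[2, longest] <= bounds[1, longest]) return(no_gap)
  c(Level = level,
    HiatusIntervalInf = bounds[1, longest],
    HiatusIntervalSup = bounds[2, longest])
}

print(gap_range(end_old, start_young))              # well separated groups
print(gap_range(end_old, rnorm(n_iter, -990, 20)))  # overlapping groups
```

The length of a gap is the difference between its two endpoints. For the Ksar Akil result above, this gives -28977 - (-29180) = 203 years between UP and EPI.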

Graphical representation

Now, let's summarise these pieces of information in a plot. The following lines generate the plot of the succession of phases from Ksar Akil.

data("KAPhasesChronoModel")
MultiSuccessionPlot(KAPhasesChronoModel, c(8,6,4,2), level = 0.95)

The characteristics of phase IUP are drawn in red, those of phase Ahmarian are in green, those of phase UP are in light blue and those of phase EPI are in purple. As there is only one event in the phases EPI and IUP, the minimum and the maximum of these phases have the same values at each iteration. Hence, we can only see one curve for each of these phases. Time ranges are displayed by segments above the curves. Two-coloured segments correspond to the transition interval or to the gap range between successive phases, at a confidence level of 95%. As there are no gaps at 95% between phases IUP and Ahmarian, and Ahmarian and UP, a cross is drawn instead.

References

For a description of the statistical aspects of the functions implemented in ArchaeoPhases version 1.0:
Anne Philippe, Marie-Anne Vibet. (2017). Analysis of Archaeological Phases using the CRAN Package 'ArchaeoPhases'. HAL, hal-01347895, version 3.

For a use of the tempo plot defined by Dye: Dye, T.S. (2016). Long-term rhythms in the development of Hawaiian social stratification. Journal of Archaeological Science, 71, 1–9.

For more details on the diagnostics of Markov chains: Robert and Casella (2009). Introducing Monte Carlo Methods with R. Springer Science & Business Media.

For more details on the Ksar Akil site: Bosch, M. et al. (2015). New chronology for Ksar Akil (Lebanon) supports Levantine route of modern human dispersal into Europe. Proceedings of the National Academy of Sciences, 112, 7683–6.