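litteR is a modular tool for analyzing litter data (e.g., beach litter). The current version (0.4.1) contains the following modules:
- descriptive statistics;
- threshold assessment;
- trend analysis;
- baseline analysis;
- power analysis.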
Each module can optionally be switched on or off in the settings file, and the modules run independently of each other.
This user guide consists of two parts: the first part describes the user interface, and the second part gives more details on the modules.
litteR can be loaded with `library(litteR)`.
For applications of litteR, see Schulz et al. (2019).
The easiest way to start working with litteR is to create an empty project directory. This directory can be filled with example and reference files by running:
```r
create_litter_project("d:/work/litter-projects/beach-litter")
```
Here, the argument (the quoted part in parentheses) is an existing work directory on your computer; any valid directory for which you have write privileges can be used. Note for MS Windows users: use forward slashes (or double backslashes) in paths, because a single backslash is an escape character in R strings.
It is also possible to run `create_litter_project()` without an argument. In that case, a simple graphical user interface pops up for interactive directory selection.
litteR can be started by typing `litter()` in the RStudio console (see the figure below).
Functions to start a litteR session.
After entering `litter()`, a simple graphical user interface pops up for file selection. An example of a file selection dialogue, in this case for selecting the input file, is given below.
File open dialogue.
The file dialogues of litteR ask the user to specify:
- the settings file;
- the input file with litter data.
These files are briefly described below.
The settings file contains all settings needed to run litteR. It is written in YAML, a human-readable data-serialization language that is commonly used for settings files. An example of the contents of a settings file is given below.
```yaml
### BASIC SETTINGS ###
# Name of analyst
analyst_name: "RWS"
# Which modules to run (false or true)
module_stats: true
module_assessment: true
module_trend: true
module_baseline: true
module_power: true
# Period to analyse (YYYY-mm-dd)
min_date: 2012-01-01
max_date: 2017-12-31
# Percentage of total abundance to analyse (0 < percentage_total_abundance <= 100)
percentage_total_abundance: 80
# Litter type to analyse
# (e.g., OSPAR codes in square brackets
# and [TA] for total abundance)
litter_type: [[TA], [49]]
# Threshold value analysis
# Assessment type: mean or median
assessment_statistic: median
# Threshold group or type
threshold_type: [TA]
# Threshold value: natural number
threshold_value: 15
# Image quality: high or low
image_quality: high
### ADVANCED SETTINGS ###
# Power-analysis: number of Monte Carlo simulations
# Note that larger values lead to longer run times.
# The default number of simulations is 100 to speed up computation.
# However, 1000 simulations generally give more accurate results.
number_of_simulations: 100
# Power-analysis: significance level
alpha: 0.05
# Power-analysis: resolution of effect size (range: 5% ... 50%)
resolution_effect_size: 10
# Power-analysis: minimum number of surveys to sample from
min_surveys: 16
# Show source code? (true or false)
show_source_code: false
```
The YAML-file contains the following entries:
entry | description | value |
---|---|---|
analyst_name | name of the person who performs the litter analysis | text |
module_stats | Activate the descriptive statistics module? | true or false |
module_assessment | Activate the threshold assessment module? | true or false |
module_trend | Activate the trend analysis module? | true or false |
module_baseline | Activate the baseline analysis module? | true or false |
module_power | Activate the power analysis module? | true or false |
min_date | first date to analyse | YYYY-mm-dd (ISO 8601) |
max_date | last date to analyse | YYYY-mm-dd (ISO 8601) |
percentage_total_abundance | percentage of total abundance to analyse | percentage, default value: 80% |
litter_type | litter type to analyse | litter/group code in square brackets, e.g., [[49], [TA], [SUP]] |
assessment_statistic | statistic to use in the assessment | either median or mean |
threshold_type | threshold group or threshold type | litter type or group code in square brackets, e.g., [TA] |
threshold_value | threshold value | natural number, e.g., 15 |
image_quality | quality of the images | high or low |
number_of_simulations | number of Monte Carlo simulations for power analysis | integer greater than 0. Default value: 100 |
alpha | significance level used for assessment and power-analysis | numeric in 0..1. Default value: 0.05 |
resolution_effect_size | resolution of the effect size (power analysis) | range: 5% .. 50% |
min_surveys | minimum number of surveys to sample from in power analysis | integer greater than 0. Default value: 16 |
show_source_code | Show all R source code? | true or false |
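As an illustration of the admissible values in the table above, the hypothetical base-R function `check_settings()` below checks a settings list against those ranges. It is not litteR's own validation code, and it assumes the settings file has already been read into a named R list (for instance with `yaml::read_yaml()` from the yaml package).

```r
# Hypothetical validator, not part of litteR: checks a named list of settings
# (for example, the result of yaml::read_yaml()) against the ranges in the table.
check_settings <- function(s) {
  problems <- character(0)
  add <- function(msg) problems <<- c(problems, msg)
  num1 <- function(v) is.numeric(v) && length(v) == 1 && !is.na(v)
  count <- function(v) num1(v) && v > 0 && v == round(v)

  # Module switches must be single true/false values.
  for (m in c("module_stats", "module_assessment", "module_trend",
              "module_baseline", "module_power")) {
    v <- s[[m]]
    if (!(is.logical(v) && length(v) == 1 && !is.na(v)))
      add(paste(m, "must be true or false"))
  }

  # Dates must parse as YYYY-mm-dd and define a non-empty period.
  d <- tryCatch(as.Date(c(s$min_date, s$max_date)),
                error = function(e) as.Date(c(NA, NA)))
  if (length(d) != 2 || anyNA(d) || d[1] > d[2])
    add("min_date and max_date must be valid dates with min_date <= max_date")

  p <- s$percentage_total_abundance
  if (!(num1(p) && p > 0 && p <= 100))
    add("percentage_total_abundance must satisfy 0 < p <= 100")
  if (!isTRUE(s$assessment_statistic %in% c("mean", "median")))
    add("assessment_statistic must be mean or median")
  if (!isTRUE(s$image_quality %in% c("high", "low")))
    add("image_quality must be high or low")
  if (!count(s$number_of_simulations))
    add("number_of_simulations must be an integer greater than 0")
  if (!(num1(s$alpha) && s$alpha > 0 && s$alpha < 1))
    add("alpha must lie between 0 and 1")
  r <- s$resolution_effect_size
  if (!(num1(r) && r >= 5 && r <= 50))
    add("resolution_effect_size must lie in the range 5 .. 50")
  if (!count(s$min_surveys))
    add("min_surveys must be an integer greater than 0")
  problems
}
```

An empty character vector means that no problems were found. Entries such as `litter_type` and `threshold_value` are deliberately left unchecked in this sketch.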
The current version of litteR reads litter data in three formats:
- the OSPAR format;
- the wide format;
- the long format.
These formats are briefly described below.
The OSPAR format is a wide format, meaning that all litter types are stored in columns and each row represents a survey. OSPAR beach litter data can be downloaded from the OSPAR website.
The example below shows the first 10 columns and records of litter data in the OSPAR format.
```
# A tibble: 10 x 10
RefNo `Beach name` Country Region `Survey date`
<chr> <chr> <chr> <chr> <chr>
1 NL001 Bergen Netherlands 3. Southern North Sea 27/01/2012
2 NL001 Bergen Netherlands 3. Southern North Sea 20/04/2012
3 NL001 Bergen Netherlands 3. Southern North Sea 22/07/2012
4 NL001 Bergen Netherlands 3. Southern North Sea 19/10/2012
5 NL001 Bergen Netherlands 3. Southern North Sea 19/02/2013
6 NL001 Bergen Netherlands 3. Southern North Sea 11/04/2013
7 NL001 Bergen Netherlands 3. Southern North Sea 20/07/2013
8 NL001 Bergen Netherlands 3. Southern North Sea 16/10/2013
9 NL001 Bergen Netherlands 3. Southern North Sea 08/01/2014
10 NL001 Bergen Netherlands 3. Southern North Sea 23/04/2014
Period `Plastic: Yokes [1]` `Plastic: Bags [2]`
<dbl> <dbl> <dbl>
1 1 0 3
2 2 0 8
3 3 0 1
4 4 0 2
5 -1 0 24
6 2 0 0
7 3 0 10
8 4 0 7
9 1 0 9
10 2 0 10
`Plastic: Small_bags [3]` `Plastic: Bag_ends [112]`
<dbl> <dbl>
1 9 0
2 12 0
3 5 0
4 4 0
5 23 13
6 9 1
7 4 0
8 5 1
9 20 0
10 29 0
```
The columns are separated by commas (CSV file). Five columns are compulsory: “refno”, “beach name”, “country”, “region”, and “survey date”. Note that the date format does not comply with the ISO 8601 standard date format (YYYY-mm-dd); instead, OSPAR uses dd/mm/YYYY (see the example above). The other columns contain litter types. The names of these columns have the following format:
litter group: litter type [litter code]
for instance, ‘Plastic: Bags [2]’.
Optionally, other columns may be added as metadata. However, these columns are ignored by litteR.
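The two OSPAR conventions just described (dd/mm/YYYY dates and column names ending in a bracketed litter code) can be handled with a few lines of base R. The helpers `ospar_to_iso()` and `litter_code()` below are hypothetical illustrations, not functions exported by litteR.

```r
# Hypothetical helpers (not part of litteR) illustrating two OSPAR conventions.

# OSPAR survey dates are written as dd/mm/YYYY; convert them to ISO 8601 dates.
ospar_to_iso <- function(x) {
  as.Date(x, format = "%d/%m/%Y")
}

# Extract the litter code between square brackets from column names such as
# "Plastic: Bags [2]"; names without a bracketed code yield NA.
litter_code <- function(x) {
  has_code <- grepl("\\[.*\\][[:space:]]*$", x)
  ifelse(has_code, sub("^.*\\[(.*)\\][[:space:]]*$", "\\1", x), NA_character_)
}

ospar_to_iso(c("27/01/2012", "20/04/2012"))
litter_code(c("Plastic: Bags [2]", "Plastic: Bag_ends [112]", "Beach name"))
```

The first call yields the dates 2012-01-27 and 2012-04-20; the second yields the codes "2" and "112", and NA for the metadata column.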
The wide format is comparable to the OSPAR format, but less restrictive. The following columns are required: “region_name”, “country_code”, “country_name”, “location_name”, and “date”. The columns are separated by commas (CSV file).
The example below shows data in the wide format.
```
# A tibble: 10 x 8
region_name country_code country_name location_code location_name
<chr> <chr> <chr> <chr> <chr>
1 OSPAR NL Netherlands NL001 Bergen
2 OSPAR NL Netherlands NL001 Bergen
3 OSPAR NL Netherlands NL001 Bergen
4 OSPAR NL Netherlands NL001 Bergen
5 OSPAR NL Netherlands NL001 Bergen
6 OSPAR NL Netherlands NL001 Bergen
7 OSPAR NL Netherlands NL001 Bergen
8 OSPAR NL Netherlands NL001 Bergen
9 OSPAR NL Netherlands NL001 Bergen
10 OSPAR NL Netherlands NL001 Bergen
date `Plastic: Yokes [1]` `Plastic: Bags [2]`
<date> <dbl> <dbl>
1 2012-01-27 0 3
2 2012-04-20 0 8
3 2012-07-22 0 1
4 2012-10-19 0 2
5 2013-02-19 0 24
6 2013-04-11 0 0
7 2013-07-20 0 10
8 2013-10-16 0 7
9 2014-01-08 0 9
10 2014-04-23 0 10
```
It is less restrictive than the OSPAR format in the sense that litter type names are not restricted to the format
litter group: litter type [litter code]
The only requirement is that a [litter code] is present. For instance, all of the following litter specifications are valid:
The first three specifications correspond to the OSPAR code, the TSG-ML general code (Technical Subgroup on Marine Litter), and the UNEP code, respectively.
The long format is convenient for data analysis. The following columns are required: “region_name”, “country_code”, “country_name”, “location_name”, “date”, “type_name”, and “abundance”. The columns are separated by commas (CSV file).
The example below shows data in the long format, which supports the same litter coding as the wide format.
```
# A tibble: 10 x 8
region_name country_code country_name location_code location_name
<chr> <chr> <chr> <chr> <chr>
1 OSPAR NL Netherlands NL001 Bergen
2 OSPAR NL Netherlands NL001 Bergen
3 OSPAR NL Netherlands NL001 Bergen
4 OSPAR NL Netherlands NL001 Bergen
5 OSPAR NL Netherlands NL001 Bergen
6 OSPAR NL Netherlands NL001 Bergen
7 OSPAR NL Netherlands NL001 Bergen
8 OSPAR NL Netherlands NL001 Bergen
9 OSPAR NL Netherlands NL001 Bergen
10 OSPAR NL Netherlands NL001 Bergen
date type_name abundance
<date> <chr> <dbl>
1 2012-01-27 Plastic: Yokes [1] 0
2 2012-04-20 Plastic: Yokes [1] 0
3 2012-07-22 Plastic: Yokes [1] 0
4 2012-10-19 Plastic: Yokes [1] 0
5 2013-02-19 Plastic: Yokes [1] 0
6 2013-04-11 Plastic: Yokes [1] 0
7 2013-07-20 Plastic: Yokes [1] 0
8 2013-10-16 Plastic: Yokes [1] 0
9 2014-01-08 Plastic: Yokes [1] 0
10 2014-04-23 Plastic: Yokes [1] 0
```
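The wide and long formats hold the same information: every litter-type column of the wide format becomes a block of rows with `type_name` and `abundance` in the long format. The hypothetical base-R helper `wide_to_long()` below sketches that conversion; it is not part of litteR, and the toy data frame includes only two of the required identifier columns for brevity.

```r
# Hypothetical helper (not part of litteR): convert a wide-format data frame
# into the long format by stacking all litter-type columns.
wide_to_long <- function(wide, id_cols) {
  type_cols <- setdiff(names(wide), id_cols)
  pieces <- lapply(type_cols, function(tc) {
    # One block of rows per litter type; identifier columns are repeated.
    data.frame(wide[id_cols], type_name = tc, abundance = wide[[tc]],
               check.names = FALSE, stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, pieces)
  rownames(long) <- NULL
  long
}

# Minimal example (only two of the required identifier columns are included).
wide <- data.frame(
  location_name = c("Bergen", "Bergen"),
  date = as.Date(c("2012-01-27", "2012-04-20")),
  "Plastic: Yokes [1]" = c(0, 0),
  "Plastic: Bags [2]" = c(3, 8),
  check.names = FALSE, stringsAsFactors = FALSE
)
long <- wide_to_long(wide, id_cols = c("location_name", "date"))
long
```

As in the long-format example above, all rows of the first litter type come first, followed by the rows of the next type.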
All input files are validated by litteR. The following validation rules apply:
The work directory should also contain a file called ‘litter-groups.csv’. This file assigns each litter type (`type_name`, in rows) to one or more litter groups (columns). It is generated automatically by the `create_litter_project()` function described earlier in this user guide. The first 10 records of this file are given below.
First 10 records of the litter-groups.csv file.
Both individual type codes and litter groups (column names) can be specified as `litter_type` in the settings file (*.yaml). For instance:
```yaml
litter_type: [[TA], [49], [SUP], [FISH]]
```
The user may optionally add new groups to, or remove existing groups from, this table. Only the `type_name` and `TA` (total abundance) columns are compulsory.
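To make the role of the group columns concrete, the sketch below sums the abundances of all member types of one group per location and survey. It is a hypothetical illustration, not litteR's implementation, and it assumes that each group column holds a TRUE/FALSE membership flag per `type_name`; the actual encoding in litter-groups.csv may differ. The group `example_group` is invented for the example.

```r
# Hypothetical sketch, not litteR's implementation. Assumption: each group
# column in litter-groups.csv holds a TRUE/FALSE membership flag per type_name.
group_abundance <- function(long, groups, group) {
  members <- groups$type_name[as.logical(groups[[group]])]
  in_group <- long$type_name %in% members
  # Sum the abundances of all member types per location and survey date.
  aggregate(abundance ~ location_name + date, data = long[in_group, ], FUN = sum)
}

long <- data.frame(
  location_name = "Bergen",
  date = as.Date(rep(c("2012-01-27", "2012-04-20"), each = 2)),
  type_name = rep(c("Plastic: Yokes [1]", "Plastic: Bags [2]"), times = 2),
  abundance = c(1, 3, 2, 8),
  stringsAsFactors = FALSE
)
groups <- data.frame(
  type_name = c("Plastic: Yokes [1]", "Plastic: Bags [2]"),
  TA = c(TRUE, TRUE),
  example_group = c(FALSE, TRUE),   # invented group, for illustration only
  stringsAsFactors = FALSE
)
group_abundance(long, groups, "TA")
```

For total abundance (TA) this gives 4 and 10 for the two surveys; for `example_group`, which contains only bags, it gives 3 and 8.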
litteR produces an HTML report that is best viewed with a modern web browser such as Mozilla Firefox, Google Chrome, or Safari. These browsers are freely available from the internet.
The filename of each report starts with ‘litter-report’, followed by
For example: litter-report-[TA][49]-STABP-20190521-074547.html
In the remainder of this section, each section of the HTML report is briefly described.
This section gives a summary of the settings/parameters in the settings file.
This section reports (potential) problems in the input files.
For each selected litter type and period, this section gives several descriptive statistics. These statistics provide useful information about the data in a concise way. The following statistics are given:
These statistics are estimated for the top x% types, i.e., the most abundant types that together make up x% of the total abundance at each location, where x is set by `percentage_total_abundance` in the settings file.
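The top x% selection can be sketched as a cumulative-share rule: sort the types by abundance, accumulate their share of the total, and stop at the first type that reaches x%. The function `top_types()` below is a hypothetical illustration with made-up numbers; the exact rule used by litteR (for instance, how the type that crosses x% is treated) may differ.

```r
# Hypothetical sketch of the "top x%" selection for one location; the exact
# rule used by litteR (e.g. how the type that crosses x% is treated) may differ.
top_types <- function(totals, percentage = 80) {
  totals <- sort(totals, decreasing = TRUE)
  share <- 100 * cumsum(totals) / sum(totals)   # cumulative % of total abundance
  n <- which(share >= percentage)[1]            # first type reaching the target
  names(totals)[seq_len(n)]
}

# Total abundance per litter type at one location (made-up numbers).
totals <- c(bags = 50, yokes = 5, nets = 30, caps = 15)
top_types(totals, percentage = 80)   # returns c("bags", "nets")
```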
For each location, it is tested whether the distribution of abundances is significantly lower than the supplied threshold value, using the one-sided Wilcoxon signed rank test. In addition, the percentage of locations with significantly low litter abundances is given at several spatial scales.
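In base R, a one-sided Wilcoxon signed rank test against a threshold is available through `stats::wilcox.test()`. The sketch below applies it to the ‘Plastic: Bags [2]’ counts of the first 10 Bergen surveys shown in the OSPAR example, with the threshold 15 from the example settings file used purely for illustration. Whether litteR calls the test with exactly these options is an assumption.

```r
# Sketch of the one-sided test for a single location, using base R's
# wilcox.test(); whether litteR uses exactly these options is an assumption.
abundance <- c(3, 8, 1, 2, 24, 0, 10, 7, 9, 10)  # 'Plastic: Bags [2]' at Bergen
threshold <- 15                                 # illustrative threshold value

res <- wilcox.test(abundance, mu = threshold, alternative = "less",
                   exact = FALSE)  # normal approximation; the data contain ties
res$statistic                      # V: sum of ranks of positive differences
res$p.value < 0.05                 # TRUE: significantly below the threshold
```

Only one survey (24 items) exceeds the threshold, so the signed rank statistic is small (V = 6) and the null hypothesis is rejected at the 5% level.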
This section gives trend analysis results. The figures show time series of litter abundance for each location, together with a monotonic trend line based on the Theil-Sen slope estimator, which is usually more robust to outliers than a slope estimated by ordinary least squares regression. In addition, a loess smoother is shown to reveal potential non-linearities in the trend.
Finally, a table is provided showing the magnitude of the Theil-Sen slope estimator and its corresponding p-value.
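The Theil-Sen slope is the median of the slopes between all pairs of observations. The generic base-R sketch below (the function name `theil_sen()` is hypothetical; litteR's own implementation and its p-value computation are not shown) also illustrates the robustness claim: a single outlier barely moves the Theil-Sen slope, whereas it strongly affects the least-squares slope. For an annual slope with real survey dates, time can be expressed in decimal years, e.g., `as.numeric(date) / 365.25`.

```r
# Minimal Theil-Sen estimator: the slope is the median of all pairwise slopes,
# the intercept the median of y - slope * t. Generic sketch, not litteR's code.
theil_sen <- function(t, y) {
  ij <- combn(seq_along(t), 2)            # all index pairs i < j
  dt <- t[ij[2, ]] - t[ij[1, ]]
  dy <- y[ij[2, ]] - y[ij[1, ]]
  keep <- dt != 0                         # ignore pairs observed at the same time
  slope <- median(dy[keep] / dt[keep])
  intercept <- median(y - slope * t)
  c(intercept = intercept, slope = slope)
}

year <- 1:6                               # e.g. survey year index
y <- c(1, 3, 2, 5, 4, 6)
theil_sen(year, y)                        # intercept 0, slope 1

y_outlier <- replace(y, 6, 60)            # one extreme survey
theil_sen(year, y_outlier)[["slope"]]     # 4/3: only a small change
coef(lm(y_outlier ~ year))[["year"]]      # 8.6: OLS is pulled up strongly
```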
Example of a trend plot for total abundance (TA) at a beach near Bergen (the Netherlands). In this plot, the black dots are the observations, the thin gray line segments connect the dots to guide the eye, the blue line is a loess smoother, and the red line is the Theil-Sen trend line.
The aim of baseline analysis is to identify the minimum number of surveys needed to obtain stable baseline estimates.
This section provides figures showing, for each location, the moving average as a function of window size, i.e., the number of consecutive years.
The following procedure was followed to produce these plots:
Example of a baseline plot. Each dot is the average abundance of a specific litter type or the total abundance (TA) within a moving window of the size given on the x-axis.
In addition, a table is presented that gives, for each location and number of years (# years), the mean, the standard deviation (sd), the coefficient of variation (CV), the median, the median absolute deviation (MAD), and the ratio of MAD to median of the baseline statistics (mean and median) plotted above.
Snapshot of the baseline table in the report. For an explanation, see main text.
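The moving-window idea behind the baseline plot and table can be sketched in a few lines: for each window size, compute the mean of every run of consecutive values and summarise how much these means vary. The functions below are hypothetical illustrations, not litteR's procedure, and assume one value per year (e.g., the yearly mean abundance); a spread that levels off as the window grows suggests the window size at which the baseline becomes stable.

```r
# Hypothetical sketch of the moving-window idea behind the baseline plots and
# table; assumes x holds one value per year (e.g. yearly mean abundance).
moving_means <- function(x, w) {
  starts <- seq_len(length(x) - w + 1)
  vapply(starts, function(s) mean(x[s:(s + w - 1)]), numeric(1))
}

# For every window size, summarise the spread of the moving means.
baseline_table <- function(x) {
  rows <- lapply(seq_along(x), function(w) {
    m <- moving_means(x, w)
    data.frame(n_years = w, mean = mean(m), sd = sd(m), cv = sd(m) / mean(m),
               median = median(m), mad = mad(m))
  })
  do.call(rbind, rows)
}

yearly <- c(10, 20, 30, 40)
moving_means(yearly, 2)   # 15 25 35
baseline_table(yearly)    # sd is NA for the single window spanning all years
```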
In this section, the power of the Wilcoxon signed rank test is estimated. The null hypothesis of this test is
H0: the distribution of litter data is symmetric about the baseline value
and the alternative hypothesis is
H1: the distribution of litter data is shifted below the baseline value
Hence, this is a test for a step trend, i.e., a shift in level relative to the baseline. The power of a hypothesis test is the probability that the test correctly rejects the null hypothesis (H0) when a specific alternative hypothesis (H1) is true.
Power analysis is useful for checking whether the number of surveys is sufficient. If the power is too low, the sampling effort should be increased to detect trends reliably. Conversely, if the power is higher than needed, the sampling effort can be reduced. In both cases, power analysis may lead to a more efficient allocation of financial resources.
In litteR, power analysis is carried out by means of Monte Carlo simulation for different values of the reduction (effect size), the sample size, and the significance level. The procedure is as follows:
For each location, the time series of the selected litter types are extracted. For each of these time series:
The reduction factor \(f\) scales the monitoring data, such that
\[\operatorname{mean}(\text{simulated data}) \approx f \times \operatorname{mean}(\text{monitoring data}) = f \times \text{baseline value}.\]
Note that \(f = 1\) means no reduction (the mean of the simulated data is approximately equal to the baseline value), and \(f = 0\) means absence of litter (for instance, a pristine beach).
Example of a power analysis plot. It gives the power (y-axis) as a function of the number of surveys (x-axis) for different effect sizes (see legend).
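The general Monte Carlo idea can be sketched in base R: draw many synthetic series of n surveys whose mean is reduced by the factor \(f\), test each one, and report the fraction of rejections as the estimated power. The function `power_sketch()` below is a hypothetical illustration, not litteR's procedure; resampling with replacement and using the mean of the monitoring data as the baseline value are assumptions. The data are the ‘Plastic: Bags [2]’ counts from the OSPAR example.

```r
# Hypothetical Monte Carlo sketch, not litteR's exact procedure. Assumptions:
# surveys are resampled with replacement from the monitoring data, scaled by the
# reduction factor f, and tested with a one-sided Wilcoxon signed rank test
# against the baseline value (here: the mean of the monitoring data).
power_sketch <- function(x, n, f, n_sim = 100, alpha = 0.05) {
  baseline <- mean(x)
  rejected <- replicate(n_sim, {
    sim <- f * sample(x, size = n, replace = TRUE)
    p <- wilcox.test(sim, mu = baseline, alternative = "less",
                     exact = FALSE)$p.value
    p < alpha
  })
  mean(rejected)   # fraction of simulations in which H0 is rejected
}

set.seed(1)
x <- c(3, 8, 1, 2, 24, 0, 10, 7, 9, 10)
power_sketch(x, n = 20, f = 0.1)   # strong reduction: power 1

# Power for a grid of sample sizes and reduction factors:
grid <- expand.grid(n = c(8, 16, 32), f = c(0.5, 0.7, 0.9))
grid$power <- mapply(function(n, f) power_sketch(x, n, f), grid$n, grid$f)
grid
```

Plotting `grid$power` against `grid$n` for each value of `f` gives a figure of the same kind as the power plot described above.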
In addition to the report, a CSV file with summary statistics is produced for each location. It is accompanied by a metadata file, whose contents are given below:
column_name | description | unit |
---|---|---|
region_name | administrative unit, e.g., OSPAR or Southern North Sea | 1 |
country_code | two-letter upper case country code according to ISO 3166-1 alpha-2 | 1 |
location_name | name of the survey location | 1 |
type_name | name of the litter type | 1 |
type_code | code of the litter type | 1 |
from | date of the first survey | date |
to | date of the last survey | date |
mean | mean abundance | count |
median | median abundance | count |
cv | coefficient of variation of the abundance | 1 |
rmad | ratio of MAD to median | 1 |
n | number of surveys used to estimate these statistics | 1 |
slope | slope of the Theil-Sen trend (annual change in abundance) | count/a |
p_value_slope | p-value of the Theil-Sen slope | 1 |
min | minimum abundance | count |
p01 | 1st percentile of the abundance | count |
p05 | 5th percentile of the abundance | count |
p10 | 10th percentile of the abundance | count |
p25 | 25th percentile of the abundance (first quartile) | count |
p50 | 50th percentile of the abundance (second quartile or median) | count |
p75 | 75th percentile of the abundance (third quartile) | count |
p90 | 90th percentile of the abundance | count |
p95 | 95th percentile of the abundance | count |
p99 | 99th percentile of the abundance | count |
max | maximum abundance | count |
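Most numeric columns in the table above can be reproduced for a single location and litter type with base R. The function `summarise_abundance()` below is a hypothetical sketch, not litteR's code; it assumes cv = sd / mean, rmad = `mad()` / `median()` with R's default consistency constant (1.4826), and R's default `quantile()` type 7 for the percentiles, any of which may differ from litteR's definitions. The trend columns (`slope`, `p_value_slope`) and the identifier columns are not computed here.

```r
# Hypothetical sketch (not litteR's code) computing most columns of the summary
# CSV for one location and litter type. Assumptions: cv = sd / mean,
# rmad = mad() / median() with R's default constant 1.4826, and quantile()
# type 7 (R's default) for the percentiles.
summarise_abundance <- function(x) {
  probs <- c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)
  q <- quantile(x, probs = probs, names = FALSE)
  names(q) <- sprintf("p%02d", as.integer(round(100 * probs)))
  cbind(
    data.frame(mean = mean(x), median = median(x),
               cv = sd(x) / mean(x), rmad = mad(x) / median(x),
               n = length(x), min = min(x)),
    as.data.frame(as.list(q)),   # columns p01, p05, ..., p99
    data.frame(max = max(x))
  )
}

x <- c(3, 8, 1, 2, 24, 0, 10, 7, 9, 10)   # 'Plastic: Bags [2]' at Bergen
summarise_abundance(x)
```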
Schulz, Marcus, Dennis J.J. Walvoort, Jon Barry, David M. Fleet, Willem M.G.M. van Loon, 2019. Baseline and power analyses for the assessment of beach litter reductions in the European OSPAR region. Environmental Pollution 248:555-564. https://doi.org/10.1016/j.envpol.2019.02.030