The network and connectivity of cropland can be used to analyse the potential spread of plant pathogen. While network plays a crucial role, there are several other factors that affects the spread and thus the connectivity. Although croplands may be geographically connected, the risk cannot be generalized as pathogen may not spread if it’s exclusive to specific crop. geohabnet supports up to 10 parameters that has potential to impact risk and connectivity among croplands. The implementation is expanded upon (Xing et al. 2020), which discusses global cropland connectivity. This framework uses default values from the paper at the same time making them as parameters and eventually turning it into framework for the analysis of crops.
Although this article is focused on usage, it is useful to know for interested developers that package design is inspired from widely used Configuration-based design in software development (Majors 2022), (Nash and DeMore 2009), and (Allaire 2023) provides a text based interface to control the parameters values for risk analysis in this context.
Primary objective of this vignette is to help user in getting started, list capabilities and intuition behind them. It also describes underlying implementation at high level to support the intuition behind functions. Throughout the article, we will citing external sites and resources which is relevant to usage of this package.
Raster - Raster is a digital encoding of 2D digital image with underlying pixels as underlying. It also encapsulates other details like resolution, dimensions etc which is useful in identification and processing. Here, we use raster to represent maps.
TIFF(Tag Image File Format) is a file that stores raster and information.
Download and install R, R Studio - https://rstudio-education.github.io/hopr/starting.html/
Rasters - https://www.adobe.com/creativecloud/file-types/image/raster.html
TIFF - https://www.adobe.com/creativecloud/file-types/image/raster/tiff-file.html/
We use publicly available sources to obtain crop information -
Monfreda (Monfreda, Ramankutty, and Foley 2008)
MapSpam (2005) (International Food Policy Research Institute 2019)
Geodata provides set of APIs to access these datasets. For visualization and plots, we use rnaturalearth.
Installing geohabnet
will also install its dependencies.
Please see the list of dependencies using
desc::desc(package = "geohabnet")
if (!require("devtools")) {
install.packages("devtools")
}
library("devtools")
if (!require("geohabnet")) {
install_github("GarrettLab/CroplandConnectivity", subdir = "geohabnet")
}
library(geohabnet)
At any point, access the help page using following -
?geohabnet# and
::sean # any function ?geohabnet
This is to run the analysis on default set of values for the supported parameters. Initially, the values are set based on the Xing et al(2021) (Xing et al. 2020) and crop is Avocado. This would run the workflow on global geographical extent quickly since the crop presence is relatively low.
sensitivity_analysis()
#>
|---------|---------|---------|---------|
====
|---------|---------|---------|---------|
========
|---------|---------|---------|---------|
====
|---------|---------|---------|---------|
=======
#> [1] TRUE
The results are based on the values set for each of the parameters which we will see in details in the next section.
We provide 2 methods or entry points to run analysis by setting new values for the supported parameters.
The goal of the this function is to provide simple invasive factors
as parameters. The internal implementation and program deals with object
of terra
(Hijmans 2023) and
igraph
(Csardi and Nepusz
2006) . The primary object is of type SpatRaster
from terra
. To get started, we will use quick way to
obtain raster and later we will understand the details.
<- cropharvest_rast("avocado", "monfreda")
avocado
# verify the raster object
avocado#> class : SpatRaster
#> dimensions : 2160, 4320, 1 (nrow, ncol, nlyr)
#> resolution : 0.08333333, 0.08333333 (x, y)
#> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326)
#> source : avocado_HarvestedAreaFraction.tif
#> name : avocado_HarvestedAreaFraction
::plot(avocado) terra
Now that we have a raster object, it can be fed directly to the workflow. We will only set the required parameter - rast. The value represents Avocado plantation in geographical area of North America.
<- sean(avocado)
risk_indexes #>
|---------|---------|---------|---------|
====
|---------|---------|---------|---------|
=======
|---------|---------|---------|---------|
====
|---------|---------|---------|---------|
=======
The results should be interpreted in accordance to the values of
other parameters that have factored as arguments to sean
.
Run ?sean to see all the supported parameters. We will later see the
usage of other parameters as well.
rast - spatRaster. Represents map of crop presence in a geographical area.
geoscale - Vector. Geographical coordinates in the form of c(Xmin, Xmax, Ymin, Ymax)
global - Logical. When set to TRUE, geoscale is ignored.
Get geographical scales used in global analysis -
global_scales()
#> $east
#> [1] -24 180 -58 60
#>
#> $west
#> [1] -140 -34 -58 60
Although recommended not to change the global scales since it has been finalized after several tries. Still, for advance use, set the global geographic scales using -
#set_global_scales(list(east = c(-24, 180, -58, 60), west = c(-140, -34, -58, 60)))
More parameters are available under configuration and thus more
control over the analysis. The configuration file name is
parameters.yaml
, currently supporting up to 10 parameters.
The intuition behind this methodology is to provide a basic interface
for setting new values. The snippet below describes the basic usage of
configuration.
Get the initial configuration file. By default this function will
save the file in temporary directory tempdir()
, however we
recommend saving path where program will have write permissions. Using
iwindow = TRUE will prompt a selection window to save config
file.
<- get_parameters(out_path = tempdir())
config_file #> parameters fetched successfully
config_file#> [1] "/var/folders/r5/zggvft9d3yn5kh51wqp78rd00000gn/T//Rtmpa17YEM/parameters.yaml"
The file should look something like this -
Initial parameters.yaml
The values must be modified without modifying the structure. The order don’t matter for the program. The new values in the configuration must be fed to the workflow using -
set_parameters(new_params = config_file)
#> [1] TRUE
#using iwindow = true will prompt a selction window to choose config file.
get_parameters()
was only to fetch the initial
parameters. While you can, it is not required to re-fetch. Modify the
value and feed it to workflow using set_parameters()
.
We have some helper functions to make the analysis easier. Available sources can be seen using -
::get_supported_sources()
geohabnet#> [1] "monfreda" "mapspam"
More specifically, we support 2 data set sources - Monfreda and Mapspam using the APIs provided by Geodata. These sources have crop presence in the geographical areas in the form of raster(tiff). To start with, we can check if a certain crop is available in the source -
search_crop("avocado") #1
#> [1] "monfreda"
search_crop("banana") #2
#> [1] "monfreda" "spam"
The #1 result suggests that we have crop information available under
monfreda
. Now, #2 suggests that banana is present
in 2 sources. We will use the results from here to obtain its raster
data from the one of the specified sources.
# get avocado data
<- crops_rast(list(monfreda = "avocado"))
rast_avocado
#get data of banana crop
<- crops_rast(list(mapspam = "banana")) rast_ban
The returned is actually if type SpatRaster
. In above,
we will have seperate raster information for each of the crop. Crops can
be combined and be returned as one raster.
<- crops_rast(list(monfreda = c("avocado", "banana"), mapspam = c("banana"))) rast_avo_ban
In this case for banana, information from all the sources
will be first combined using mean and then all the crop information
i.e. banana and avocado both are again combined into one raster. This is
done to produce a single map which contains information of all the
requested crops. We have one more method to obtain raster, using
tiff - tiff_torast()
. Finally, we have
get_rasters()
method, which is an abstraction of . To see
the difference in data that was generated in last 3 methods -
<- lapply(list(rast_avocado, rast_ban, rast_avo_ban), terra::rast) results
The result obtained from one of these methods can be used as an
argument to sean()
. Value of results
is list
of risk indices.
<- lapply(list(rast_avocado, rast_ban, rast_avo_ban), sean) results
So far, we have now run the sensitivity analysis workflow for avocado
and banana on global scale and default set of parameter values.
Alternatively, we can just specify the crops under
parameters.yaml
and run the workflow -
<- sensitivity_analysis()
results #>
|---------|---------|---------|---------|
====
|---------|---------|---------|---------|
=======
|---------|---------|---------|---------|
===
|---------|---------|---------|---------|
======
Thresholds are used to select subset of values from the
SpatRaster
on which the operations are applied. It directly
effects the connectivity and gives a sense of sensitivity in the
network. The intermediate goal is to produce a adjacency graph which
essentially determines the connectivity. Cells which doesn’t meet the
threshold are removed from the consideration by dispersal models.
<- sean(avocado, host_density_threshold = 0.00001, link_threshold = 0.00001)
risk_indexes #>
|---------|---------|---------|---------|
===
|---------|---------|---------|---------|
======
|---------|---------|---------|---------|
===
|---------|---------|---------|---------|
======
host density threshold. The host density threshold is the minimum
cropland proportion in the grid cells (or locations) that will be
included in the analysis. This parameter is called
HostDensityThreshold
and supports a list of values between
0 and 1. Before running the sensitivity_analysis() function, check that
the values for the host density threshold are smaller than the maximum
host density in the map to prevent errors. The values are rounded off to
5 decimal points.
Based on the information on host distribution and dispersal kernels, adjacency matrices are created, where entries are the likelihood of pathogen movement between locations. Then, adjacency matrices are converted into graph objects to perform a network analysis, where the entries in the adjacency matrices now are the weight of the links of the network.
Choosing link weight thresholds helps to focus the analysis on the more likely pathogen dispersal in the landscape.
Like what you did with the host density threshold, you can provide a
list of positive values to LinkThreshold
. Before running
the sensitivity_analysis() function, check that the values for the link
weight threshold are smaller than the maximum link weight in the network
to prevent errors.
Aggregation strategy refers to the function used to create a new map of host density with a lower resolution (larger cells). Reducing the spatial resolution helps to reduce the computational power needed to run the analysis.
If AggregationStrategy: [sum], then the sum of the cropland proportion of all initially small grids within a large grid is divided by the total number of initially small grids within that large grid.
If AggregationStrategy: [mean], then the sum of the cropland proportion of all initially small grids within a large grid is divided by the total number of initially small grids containing only land (where small grids with water are excluded) within that large grid.
By default, analysis is run on both but can be opted out from one. If only one method is used, then the difference map is skipped from the outcome.
For each pair of locations in the host map with values greater than the host density threshold, the sensitivity_analysis() function will calculate the physical distances and use them to calculate the relative likelihood of pathogen movement between locations based on their pairwise geographical proximity.
There are two different options to calculate the distance between locations.
· Vincenty ellipsoid distance
This option is highly accurate but more computationally expensive.
· Geodesic distance
This option is less computationally expensive and less accurate than the option above.
You can set the distance option either as DistanceStrategy:
“vincentyEllipsoid” or DistanceStrategy: “geodesic”. One
of these options should be used as input to run the analysis. Check for
supported methods in analysis by running dist_methods()
in
the console.
dist_methods()
#> [1] "geodesic" "vincentyellipsoid"
The aggregation factor or granularity is the number of small grid
cells that are aggregated into larger grid cells in each direction
(horizontally and vertically). The finest value is 1 which can require
analysis to run up to hours because of large number of cells in
SpatRaster
. The resolution is also used in calculation of
variance while dis-aggregating the risk indices into coarser resolution
for producing maps.
If not provided, the defaulted value is selected from
reso()
See available metrics using
supported_metrics()
#> [1] "betweeness" "node_strength"
#> [3] "sum_of_nearest_neighbors" "eigenvector_centrality"
#> [5] "closeness" "degree"
#> [7] "page_rank"
Metrics corresponding to dispersal models are applied to distance matrix with specified weights. The weights must be specified in % and sum of all the weights should be equal to 100. We use functions from (Csardi and Nepusz 2006) to calculate metrics for each dispersal model. The 2 dispersal models that are applied to parameters inverse power law and negative exponential. More formally, metrics are way to determine connectivity among nodes in a network.
Geographical extent is a subset of world map defined by coordinate
reference system. The corresponding parameter to set the area in
sean()
and sensitivity_analyis()
is
geoscale
and GeoExtent
respectively. Default
setting is global = TRUE
which will ignore the value of
geoscale
. This will consider take the world map into
account using values from global_scales()
.
Using function
<- sean(avocado, global = FALSE, geoscale = c(-115, -75, 5, 32)) results
Using config
Set Global = FALSE
and
CustomExt = [-115, -75, 5, 32]
. The initial
parameters.yaml
already contains this value which would run
in combination with other parameters.
When provided with geoscale
, program will take the
subset of provided raster (dataset of crop). The workflow will apply
graph operations and network connectivity only to the subset.
By default, 3 maps are produced for each analysis.
sean()
also returns list of risk indexes which are then fed
to connectivity()
. This flexibility is supposed to allow
users to use the risk indexes for their purposes or use our function to
produce maps with further different parameters.
In a code below, after obtaining results, the maps are produced. In
order to produce calculate variance, cells of SpatRaster
is
extended to coarser value using res
parameter. Setting
maps = FALSE
will suppress the calculation of outputs.
<- sean(avocado)
risk_indexes #>
|---------|---------|---------|---------|
====
|---------|---------|---------|---------|
=======
|---------|---------|---------|---------|
===
|---------|---------|---------|---------|
=======
connectivity(risk_indexes)
#>
|---------|---------|---------|---------|
===
|---------|---------|---------|---------|
=======
|---------|---------|---------|---------|
===
|---------|---------|---------|---------|
=======
is equivalent to sensitivity_analysis()
.
The final operations are performed risk indices and on the 3 results that are produced -
Mean
Simply a mean of all the SpatRasters
resulting from
combination of parameter values. The values in cells are added across
all the indices and divided by number of indices.
Variance
Uses stats::var
on risk indices, subset is extracted for
provided scale and finally pixels are dis-aggregated using factor =
resolution value in original parameter to from previous
step.
Difference
If both the aggregation methods (sum and mean) is selected, then difference is calculated between the rank of matrices which are essentially numeric cells of risk indices of type SpatRaster. The result is dis-aggregated in the same way as previous step.
Set pmean, pvar, pdiff to FALSE to skip the any of
the calculation in connectivity()
to skip this calculation.
In sean()
or sensitivity_analysis()
, set
map = FALSE
to skip the generation of maps as an outcome.
In case of global analysis, result of eastern and western geographic
extents are merged using terra::merge()
. The outcome of
each of the operation are saved in the new directory plots
under the specified path in OutDir
with name
opt_datetime.tif, where opt is one of the above
suffixed by datetime of the file created. If the outdir
is
empty, the value is defaulted to tempdir()
. This applies
to correponding parameter outdir
in all the functions.