dodgr
is an R package for calculating Distances On Directed Graphs. It does so very efficiently, and is able to process larger graphs than many other comparable R packages. Skip straight to the Intro if you know what directed graphs are (but maybe make a brief stop-in to Dual-Weighted Directed Graphs below.) Directed graphs are ones in which the “distance” (or some equivalent measure) from A to B is not necessarily equal to that from B to A. In Fig. 1, for example, the weights between the graph vertices (A, B, C, and D) differ depending on the direction of travel, and it is only possible to traverse the entire graph in an anti-clockwise direction.
Figure 1: A weighted, directed graph
Graphs in dodgr
are represented by simple flat data.frame
objects, so the graph of Fig. 1, presuming the edge weights to take values of 1, 2, and 3, would be,
## from to d
## 1 A B 1
## 2 B A 2
## 3 B C 1
## 4 B D 3
## 5 C B 2
## 6 C D 1
## 7 D C 2
## 8 D A 1
The primary function of dodgr
is dodgr_dists
, which calculates pair-wise shortest distances between all vertices of a graph.
dodgr_dists (graph)
## A B C D
## A 0 1 2 3
## B 2 0 1 2
## C 2 2 0 1
## D 1 2 2 0
dodgr_dists (graph, from = c ("A", "C"), to = c ("B", "C", "D"))
## B C D
## A 1 2 3
## C 2 0 1
Shortest-path distances on weighted graphs can be calculated using a number of other R packages, such as igraph
or e1071
. dodgr
comes into its own through its ability to trace paths through dual-weighted directed graphs, illustrated in Fig. 2.
Figure 2: A dual-weighted, directed graph in which the weights (represented by thicknesses) of the grey arrows differ from those of the black arrows.
Dual-weighted directed graphs are common in many areas, a foremost example being routing through street networks. Routes through street networks depends on mode of transport: the route a pedestrian might take will generally differ markedly from the route the same person might take if behind the wheel of an automobile. Routing through street networks thus generally requires each edge to be specified with two weights or distances: one quantifying the physical distance, and a second weighted version reflecting the mode of transport (or some other preferential weighting).
dodgr
calculates shortest paths using one set of weights (called “weights” or anything else starting with “w”), but returns the actual lengths of them using a second set of weights (called “distances”, or anything else starting with “d”). If no weights are specified, distances alone are used both for routing and final distance calculations. Consider that the weights and distances of Fig. 2 are the black and grey lines, respectively, with the latter all equal to one. In this case, the graph and associated shortest distances are,
## from to w d
## 1 A B 1 1
## 2 B A 2 1
## 3 B C 1 1
## 4 B D 3 1
## 5 C B 2 1
## 6 C D 1 1
## 7 D C 2 1
## 8 D A 1 1
## A B C D
## A 0 1 2 2
## B 1 0 1 1
## C 2 1 0 1
## D 1 2 1 0
Note that even though the shortest “distance” from A to D is actually A\(\to\)B\(\to\)D with a distance of only 2, that path has a weighted distance of 1 + 3 = 4. The shortest weighted path is A\(\to\)B\(\to\)C\(\to\)D, with a distance both weighted and unweighted of 1 + 1 + 1 = 3. Thus d(A,D) = 3
and not 2.
dodgr
Although the package has been intentionally developed to be adaptable to any kinds of networks, most of the applications illustrated here concern street networks, and also illustrate several helper functions the package offers for working with street networks. The basic graph
object of dodgr
is nevertheless arbitrary, and need only minimally contain three or four columns as demonstrated in the simple examples at the outset.
The package may be used to calculate a matrix of distances between a given set of geographic coordinates. We can start by simply generating some random coordinates, in this case within the bounding box defining the city of York in the U.K.
bb <- osmdata::getbb ("york uk")
npts <- 1000
xy <- apply (bb, 1, function (i) min (i) + runif (npts) * diff (i))
bb; head (xy)
## min max
## x -1.241536 -0.9215361
## y 53.799056 54.1190555
## x y
## [1,] -1.1713502 53.89409
## [2,] -1.2216108 54.01065
## [3,] -1.0457199 53.83613
## [4,] -0.9384666 53.93545
## [5,] -0.9445541 53.89436
## [6,] -1.1207099 54.01262
Those points can simply be passed to dodgr_dists()
:
system.time(
d <- dodgr_dists (from = xy, to = xy,
wt_profile = "foot", quiet = FALSE)
)
## No graph submitted to dodgr_dists; downloading street network ... done
## Converting network to dodgr graph ... done
## Calculating shortest paths ... done
## user system elapsed
## 26.620 0.132 28.567
dim (d); range (d, na.rm = TRUE)
## [1] 1000 1000
## [1] 0.00000 46.60609
The result is a matrix of 1000-by-1000 distances of up to 47km long, measured along routes weighted for optimal pedestrian travel. In this case, the single call to dodgr_dists()
automatically downloaded the entire street network of York and calculated one million shortest-path distances, all in under 30 seconds.
Although the above code is short and fast, most users will probably want more control over their graphs and routing possibilities. Each of the steps indicated above (through the quiet = FALSE
option) can be implemented separately. To illustrate, the remainder of this vignette analyses the much smaller street network of Hampi, Karnataka, India, included in the dodgr
package as the dataset hampi
. This data set may be re-created with the following single line:
hampi <- dodgr_streetnet ("hampi india")
Or with the equivalent version bundled with the package:
class (hampi)
## [1] "sf" "data.frame"
class (hampi$geometry)
## [1] "sfc_LINESTRING" "sfc"
dim (hampi)
## [1] 191 15
The streetnet
is an sf
(Simple Features) object containing 189 LINESTRING
geometries. In other words, it’s got an sf
representation of 189 street segments. The R package osmplotr
can be used to visualise this street network (with the help of magrittr
pipe operator, %>%
):
library (osmplotr)
library (magrittr)
map <- osm_basemap (hampi, bg = "gray95") %>%
add_osm_objects (hampi, col = "gray5") %>%
add_axes () %>%
print_osm_map ()
(Note that this requires current dev
version of osmplotr
, which can be installed with devtools::install_github ("ropensci/osmplotr")
.)
The sf
class data representing the street network of Hampi can then be converted into a flat data.frame
object by
graph <- weight_streetnet (hampi, wt_profile = "foot")
dim (graph)
## [1] 5845 15
head (graph)
## geom_num edge_id from_id from_lon from_lat to_id to_lon
## 1 1 1 339318500 76.47489 15.34169 339318502 76.47612
## 2 1 2 339318502 76.47612 15.34173 339318500 76.47489
## 3 1 3 339318502 76.47612 15.34173 2398958028 76.47621
## 4 1 4 2398958028 76.47621 15.34174 339318502 76.47612
## 5 1 5 2398958028 76.47621 15.34174 1427116077 76.47628
## 6 1 6 1427116077 76.47628 15.34179 2398958028 76.47621
## to_lat d d_weighted highway way_id component time
## 1 15.34173 132.442169 147.15797 service 28565950 1 95.358362
## 2 15.34169 132.442169 147.15797 service 28565950 1 95.358362
## 3 15.34174 8.888670 9.87630 service 28565950 1 6.399842
## 4 15.34173 8.888670 9.87630 service 28565950 1 6.399842
## 5 15.34179 9.326536 10.36282 service 28565950 1 6.715106
## 6 15.34174 9.326536 10.36282 service 28565950 1 6.715106
## time_weighted
## 1 105.953735
## 2 105.953735
## 3 7.110936
## 4 7.110936
## 5 7.461229
## 6 7.461229
Note that the actual graph contains around 30 times as many edges as there are streets, indicating that each street is composed on average of around 30 individual segments. The individual points points or vertices from those segments can be extracted with,
vt <- dodgr_vertices (graph)
head(vt)
## id x y component n
## 1 339318500 76.47489 15.34169 1 0
## 2 339318502 76.47612 15.34173 1 1
## 4 2398958028 76.47621 15.34174 1 2
## 6 1427116077 76.47628 15.34179 1 3
## 8 339318503 76.47641 15.34190 1 4
## 10 2398958034 76.47650 15.34199 1 5
dim (vt)
## [1] 2861 5
From which we see that the OpenStreetMap representation of the streets of Hampi has 189 line segments with 2,987 unique points and 6,096 edges between those points. The number of edges per vertex in the entire network is thus,
nrow (graph) / nrow (vt)
## [1] 2.042992
A simple straight line has two edges between all intermediate nodes, and this thus indicates that the network in it’s entirety is quite simple. The data.frame
resulting from weight_streetnet()
is what dodgr
uses to calculate shortest path routes, as will be described below, following a brief description of weighting street networks.
The foregoing graph
object returned from weight_streetnet()
also includes a $component
column enumerating all of the distinct inter-connected components of the graph.
table (graph$component)
##
## 1 2 3 4 5 6
## 3653 2160 8 8 8 8
Components are numbered in order of decreasing size, with $component = 1
always denoting the largest component. In this case, that component contains 3,934 edges, representing 65% of the graph. There are clearly only three distinct components, but this number may be much larger for larger graphs, and may be obtained from,
length (unique (graph$component))
## [1] 6
Component numbers can be determined for any types of graph with the dodgr_components()
function. For example, the following lines reduce the previous graph to a minimal (non-spatial) structure of four columns, and then (re-)calculate a fifth column of $component
s:
cols <- c ("edge_id", "from_id", "to_id", "d")
graph_min <- graph [, which (names (graph) %in% cols)]
graph_min <- dodgr_components (graph_min)
head (graph_min)
## edge_id from_id to_id d component
## 1 1 339318500 339318502 132.442169 1
## 2 2 339318502 339318500 132.442169 1
## 3 3 339318502 2398958028 8.888670 1
## 4 4 2398958028 339318502 8.888670 1
## 5 5 2398958028 1427116077 9.326536 1
## 6 6 1427116077 2398958028 9.326536 1
The component
column column can be used to select or filter any component in a graph. It is particularly useful to ensure routing calculations consider only connected vertices through simply removing all minor components:
graph_connected <- graph [graph$component == 1, ]
This is explored further below (under Distance Matrices).
Dual-weights for street networks are generally obtained by multiplying the distance of each segment by a weighting factor reflecting the type of highway. As demonstrated above, this can be done easily within dodgr
with the weight_streetnet()
function, which applies the named weighting profiles included with the dodgr
package to OpenStreetMap networks extracted with the osmdata
package.
This function uses the internal data dodgr::weighting_profiles
, which is a list of three items:
weighting_profiles
;surface_speeds
; andpenalties
Most of these data are used to calculate routing times with the dodgr_times
function, as detailed in an additional vignette. The only aspects relevant for distances are the profiles themselves, which assign preferential weights to each distinct type of highway.
wp <- weighting_profiles$weighting_profiles
names (wp)
## [1] "name" "way" "value" "max_speed"
class (wp)
## [1] "data.frame"
unique (wp$name)
## [1] "foot" "horse" "wheelchair" "bicycle" "moped"
## [6] "motorcycle" "motorcar" "goods" "hgv" "psv"
wp [wp$name == "foot", ]
## name way value max_speed
## 1 foot motorway 0.00 NA
## 2 foot trunk 0.40 NA
## 3 foot primary 0.50 5
## 4 foot secondary 0.60 5
## 5 foot tertiary 0.70 5
## 6 foot unclassified 0.80 5
## 7 foot residential 0.90 5
## 8 foot service 0.90 5
## 9 foot track 0.95 5
## 10 foot cycleway 0.95 5
## 11 foot path 1.00 5
## 12 foot steps 0.80 2
## 13 foot ferry 0.20 5
## 14 foot living_street 0.95 5
## 15 foot bridleway 1.00 5
## 16 foot footway 1.00 5
## 17 foot motorway_link 0.00 NA
## 18 foot trunk_link 0.40 NA
## 19 foot primary_link 0.50 5
## 20 foot secondary_link 0.60 5
## 21 foot tertiary_link 0.70 5
Each profile is defined by a series of percentage weights quantifying highway-type preferences for a particular mode of travel. The distinct types of highways within the Hampi graph obtained above can be tabulated with:
table (graph$highway)
##
## living_street path primary residential service
## 116 2843 1194 94 168
## steps tertiary track unclassified
## 128 284 704 314
Hampi is unlike most other human settlements on the planet in being a Unesco World Heritage area in which automobiles are generally prohibited. Accordingly, numbers of "footway"
, "path"
, and "pedestrian"
ways far exceed typical categories denoting automobile traffic ("primary", "residential", "tertiary"
)
It is also possible to use other types of (non-OpenStreetMap) street networks, an example of which is the os_roads_bristol
data provided with the package. “OS” is the U.K. Ordnance Survey, and these data are provided as a Simple Features (sf
) data.frame
with a decidedly different structure to osmdata data.frame
objects:
names (hampi) # many fields manually removed to reduce size of this object
## [1] "osm_id" "bicycle" "covered" "foot"
## [5] "highway" "incline" "motorcar" "motorcycle"
## [9] "motor_vehicle" "oneway" "surface" "tracktype"
## [13] "tunnel" "width" "geometry"
names (os_roads_bristol)
## [1] "fictitious" "identifier" "class" "roadNumber" "name1"
## [6] "name1_lang" "name2" "name2_lang" "formOfWay" "length"
## [11] "primary" "trunkRoad" "loop" "startNode" "endNode"
## [16] "structure" "nameTOID" "numberTOID" "function." "geometry"
The latter may be converted to a dodgr
network by first specifying a weighting profile, here based on the formOfWay
column:
colnm <- "formOfWay"
table (os_roads_bristol [[colnm]])
##
## Collapsed Dual Carriageway Dual Carriageway
## 14 6
## Single Carriageway Slip Road
## 1 8
wts <- c (0.1, 0.2, 0.8, 1)
names (wts) <- unique (os_roads_bristol [[colnm]])
net <- weight_streetnet (os_roads_bristol, wt_profile = wts,
type_col = colnm, id_col = "identifier")
The resultant net
object contains the street network of os_roads_bristol
weighted by the specified profile, and in a format suitable for submission to any dodgr
routine.
The dodgr
packages includes a function to select a random connected portion of graph including a specified number of vertices. This function is used in the compare_heaps()
function described below, but is also useful for general statistical analyses of large graphs which may otherwise take too long to compute.
graph_sub <- dodgr_sample (graph, nverts = 100)
nrow (graph_sub)
## [1] 197
The random sample has around twice as many edges as vertices, in accordance with the statistics calculated above.
nrow (dodgr_vertices (graph_sub))
## [1] 100
dodgr_dists()
As demonstrated at the outset, an entire network can simply be submitted to dodgr_dists()
, in which case a square matrix will be returned containing pair-wise distances between all vertices. Doing that for the graph
of York will return a square matrix of around 90,000-times-90,000 (or 8 billion) distances. It might be possible to do that on some computers, but is possibly neither recommended nor desirable. The dodgr_dists()
function accepts additional arguments of from
and to
defining points from and to which distances are to be calculated. If only from
is provided, a square matrix is returned of pair-wise distances between all listed points.
For spatial graphs—that is, those containing columns of latitudes and longitudes (or “x” and “y”)—routing points can be represented by a simple matrix of arbitrary latitudes and longitudes (or, again, “x” and “y”). dodgr_dists()
will map these points to the closest network points, and return corresponding shortest-path distances. This may be illustrated by generating random points within the bounding box of the above map of Hampi. As demonstrated above, the coordinates of all vertices may be extracted with the dodgr_vertices()
function, enabling random points to be generated with the following lines:
vt <- dodgr_vertices (graph)
n <- 100 # number of points to generate
xy <- data.frame (x = min (vt$x) + runif (n) * diff (range (vt$x)),
y = min (vt$y) + runif (n) * diff (range (vt$y)))
Submitting these to dodgr_dists()
as points from which to route will generate a distance matrix from each of these 100 points to every other point in the graph:
d <- dodgr_dists (graph, from = xy)
dim (d); range (d, na.rm = TRUE)
## [1] 100 2861
## [1] 0.00 28744.25
If the to
argument is also specified, the matrix returned will have rows matching from
and columns matching to
d <- dodgr_dists (graph, from = xy, to = xy [1:10, ])
dim (d)
## [1] 100 10
Some of the resultant distances in the above cases are NA
because the points were sampled from the entire bounding box, and the street network near the boundaries may be cut off from the rest. As demonstrated above, the weight_streetnet()
function returns a component
vector, and such disconnected edges will have graph$component > 1
, because graph$component == 1
always denotes the largest connected component. This means that the graph can always be reduced to the single largest component with the following single line:
graph_connected <- graph [graph$component == 1, ]
A distance matrix obtained from running dodgr_dists
on graph_connected
should generally contain no NA
values, although some points may still be effectively unreachable due to one-way connections (or streets). Thus, routing on the largest connected component of a directed graph ought to be expected to yield the minimal number of NA
values, which may sometimes be more than zero. Note further that spatial routing points (expressed as from
and/or to
arguments) will in this case be mapped to the nearest vertices of graph_connected
, rather than the potentially closer nearest points of the full graph
. This may make the spatial mapping of routing points less accurate than results obtained by repeating extraction of the street network using an expanded bounding box. For automatic extraction of street networks with dodgr_dists()
, the extent by which the bounding box exceeds the range of routing points (from
and to
arguments) is determined by an extra parameter expand
, quantifying the relative extent to which the bounding box should exceed the spatial range of the routing points. This is illustrated in the following code which calculates distances between 100 random points:
bb <- osmdata::getbb ("york uk")
npts <- 100
xy <- apply (bb, 1, function (i) min (i) + runif (npts) * diff (i))
routed_points <- function (expand = 0, pts)
{
gr0 <- dodgr_streetnet (pts = pts, expand = expand) %>%
weight_streetnet ()
d0 <- dodgr_dists (gr0, from = pts)
length (which (is.na (d0))) / length (d0)
}
vapply (c (0, 0.05, 0.1), function (i) routed_points (i, pts = xy),
numeric (1))
## [1] 0.0780 0.0586 0.0000
With a street network that precisely encompasses the submitted routing points (expand = 0
), 7.8% of pairwise distances are unable to be calculated; with a bounding box expanded to 5% larger than the submitted points, this is reduced to 5.9%, and with expansion to 10%, all points can be connected.
For non-spatial graphs, from
and to
must match precisely on to vertices named in the graph itself. In the graph considered above, these vertex names were contained in the columns, from_id
and to_id
. The minimum that a dodgr
graph requires is,
head (graph [, names (graph) %in% c ("from_id", "to_id", "d")])
## from_id to_id d
## 1 339318500 339318502 132.442169
## 2 339318502 339318500 132.442169
## 3 339318502 2398958028 8.888670
## 4 2398958028 339318502 8.888670
## 5 2398958028 1427116077 9.326536
## 6 1427116077 2398958028 9.326536
in which case the from
values submitted to dodgr_dists()
(and to
, if given) must directly name the vertices in the from_id
and to_id
columns of the graph. This is illustrated in the following code:
graph_min <- graph [, names (graph) %in% c ("from_id", "to_id", "d")]
fr <- sample (graph_min$from_id, size = 10) # 10 random points
to <- sample (graph_min$to_id, size = 20)
d <- dodgr_dists (graph_min, from = fr, to = to)
dim (d)
## [1] 10 20
The result is a 10-by-20 matrix of distances between these named graph vertices.
dodgr
uses an internal library (Saunders and Takaoka 2003, @Saunders2004) for the calculation of shortest paths using a variety of priority queues (see Miller 1960 for an overview). In the context of shortest paths, priority queues determine the order in which a graph is traversed (Tarjan 1983), and the choice of priority queue can have a considerable effect on computational efficiency for different kinds of graphs (Johnson 1977). In contrast to dodgr
, most other R packages for shortest path calculations do not use priority queues, and so may often be less efficient. Shortest path distances can be calculated in dodgr
with priority queues that use the following heaps:
Differences in how these heaps operate are often largely extraneous to direct application of routing algorithms, even though heap choice may strongly affect performance. To avoid users needing to know anything about algorithmic details, dodgr
provides a function compare_heaps
to which a particular graph may be submitted in order to determine the optimal kind of heap.
The comparisons are actually made on a randomly selected sub-component of the graph containing a defined number of vertices (with a default of 1,000, or the entire graph if it contains fewer than 1,000 vertices).
compare_heaps (graph, nverts = 100, replications = 1)
## test
## 2 d <- dodgr_dists(graph, from = from_id, to = to_id, heap = "FHeap")
## 3 d <- dodgr_dists(graph, from = from_id, to = to_id, heap = "TriHeap")
## 5 d <- dodgr_dists(graph, from = from_id, to = to_id, heap = "Heap23")
## 6 d <- dodgr_dists(graph_contracted, from = from_id, to = to_id, heap = "BHeap")
## 7 d <- dodgr_dists(graph_contracted, from = from_id, to = to_id, heap = "FHeap")
## 8 d <- dodgr_dists(graph_contracted, from = from_id, to = to_id, heap = "TriHeap")
## 9 d <- dodgr_dists(graph_contracted, from = from_id, to = to_id, heap = "TriHeapExt")
## 10 d <- dodgr_dists(graph_contracted, from = from_id, to = to_id, heap = "Heap23")
## 11 d <- dodgr_dists(graph_contracted, from = from_id, to = to_id, heap = "set")
## 12 d <- igraph::distances(igr, v = from_id, to = to_id, mode = "out")
## 1 d <- dodgr_dists(graph, from = from_id, to = to_id, heap = "BHeap")
## 4 d <- dodgr_dists(graph, from = from_id, to = to_id, heap = "TriHeapExt")
## replications elapsed relative user.self sys.self user.child sys.child
## 2 1 0.002 1.0 0.004 0.000 0 0
## 3 1 0.002 1.0 0.004 0.000 0 0
## 5 1 0.002 1.0 0.004 0.000 0 0
## 6 1 0.002 1.0 0.000 0.003 0 0
## 7 1 0.002 1.0 0.003 0.000 0 0
## 8 1 0.002 1.0 0.003 0.000 0 0
## 9 1 0.002 1.0 0.002 0.000 0 0
## 10 1 0.002 1.0 0.003 0.000 0 0
## 11 1 0.002 1.0 0.003 0.000 0 0
## 12 1 0.002 1.0 0.002 0.000 0 0
## 1 1 0.003 1.5 0.004 0.001 0 0
## 4 1 0.003 1.5 0.003 0.000 0 0
The key column of that data.frame
is relative
, which quantifies the relative performance of each test in relation to the best which is given a score of 1. dodgr
using the default heap = "BHeap"
, which is a binary heap priority queue, performs faster than igraph
(Csardi and Nepusz 2006) for these graphs. Different kind of graphs will perform differently with different priority queue structures, and this function enables users to empirically discern the optimal heap for their kind of graph.
Note, however, that this is not an entirely fair comparison, because dodgr
calculates dual-weighted distances, whereas igraph
—and indeed all other R packages—only directly calculate distances based on a single set of weights. Implementing dual-weighted routing in these cases requires explicitly re-tracing all paths and summing the second set of weights along each path. A time comparison in that case would be very strongly in favour of dodgr
. Moreover, dodgr
can convert graphs to contracted form through removing redundant vertices, as detailed in the following section. Doing so greatly improves performance with respect to igraph
.
For those wishing to do explicit comparisons themselves, the following code generates the igraph
equivalent of dodgr_dists()
, although of course for single-weighted graphs only:
edges <- cbind (graph$from_id, graph$to_id)
pts <- sample (unique (as.vector (edges)), 100) # set of random routing points
edges <- as.vector (t (edges))
igr <- igraph::make_directed_graph (edges)
igraph::E (igr)$weight <- graph$d
d <- igraph::distances (igr, v = pts, to = pts, mode = "out")
A further unique feature of dodgr
is the ability to remove redundant vertices from graphs (see Fig. 3), thereby speeding up routing calculations.
Figure 3: (A) Vertex#2 is redundant for routing calculations from anywhere other than that point itself. (B) Equivalent graph to (A) contracted through removal of vertex#2. (C) Vertex#2 is not redundant.
In Fig. 3(A), the only way to get from vertex 1 to 3, 4 or 5 is through C. The intermediate vertex B is redundant for routing purposes (and than to or from that precise point) and may simply be removed, with directional edges inserted directly between vertices 1 and 3. This yields the equivalent contracted graph of Fig. 3(B), in which, for example, the distance (or weight) between 1 and 3 is the sum of previous distances (or weights) between 1 \(\to\) 2 and 2 \(\to\) 3. Note that if one of the two edges between, say, 3 and 2 were removed, vertex 2 would no longer be redundant (Fig. 3(C)).
Different kinds of graphs have different degrees of redundancy, and even street networks differ, through for example dense inner-urban networks generally being less redundant than less dense extra-urban or rural networks. The contracted version of a graph can be obtained with the function dodgr_compact_graph()
, illustrated here with the York example from above.
grc <- dodgr_contract_graph (graph)$graph
The function dodgr_compact_graph()
returns a list with the compact graph and a table relating the contracted edges to the corresponding original edges in the full graph. Relative sizes are
nrow (graph); nrow (grc); nrow (grc) / nrow (graph)
## [1] 5845
## [1] 692
## [1] 0.1183918
equivalent to the removal of around 90% of all edges. The difference in routing efficiency can then be seen with the following code
from <- sample (grc$from_id, size = 100)
to <- sample (grc$to_id, size = 100)
rbenchmark::benchmark (
d2 <- dodgr_dists (grc, from = from, to = to),
d2 <- dodgr_dists (graph, from = from, to = to),
replications = 2)
## test replications elapsed
## 2 d2 <- dodgr_dists(graph, from = from, to = to) 2 0.025
## 1 d2 <- dodgr_dists(grc, from = from, to = to) 2 0.007
## relative user.self sys.self user.child sys.child
## 2 3.571 0.058 0.000 0 0
## 1 1.000 0.013 0.001 0 0
And contracting the graph has a similar effect of speeding up pairwise routing between these 100 points. All routing algorithms scale non-linearly with size, and relative improvements in efficiency will be even greater for larger graphs.
Routing is often desired between defined points, and these points may inadvertently be removed in graph contraction. The dodgr_contract_graph()
function accepts an additional argument specifying vertices to keep within the contracted graph. This list of vertices must directly match the vertex ID values in the graph.
The following code illustrates how to retain specific vertices within contracted graphs:
grc <- dodgr_contract_graph (graph)$graph
nrow (grc)
## [1] 692
verts <- sample (dodgr_vertices (graph)$id, size = 100)
head (verts) # a character vector
## [1] "1204772726" "2608163883" "3697942891" "3921512081" "1143452410"
## [6] "2627476985"
grc <- dodgr_contract_graph (graph, verts)$graph
nrow (grc)
## [1] 867
Retaining the nominated vertices yields a graph with considerably more edges than the fully contracted graph excluding these vertices. The dodgr_dists()
function can be applied to the latter graph to obtain accurate distances precisely routed between these points, yet using the speed advantages of graph contraction.
Shortest paths can also be extracted with the dodgr_paths()
function. For given vectors of from
and to
points, this returns a nested list so that if,
dp <- dodgr_paths (graph, from = from, to = to)
then dp [[i]] [[j]]
will contain the path from from [i]
to to [j]
. The paths are represented as sequences of vertex names. Consider the following example,
graph <- weight_streetnet (hampi)
head (graph)
## geom_num edge_id from_id from_lon from_lat to_id to_lon
## 1 1 1 339318500 76.47489 15.34169 339318502 76.47612
## 2 1 2 339318502 76.47612 15.34173 339318500 76.47489
## 3 1 3 339318502 76.47612 15.34173 2398958028 76.47621
## 4 1 4 2398958028 76.47621 15.34174 339318502 76.47612
## 5 1 5 2398958028 76.47621 15.34174 1427116077 76.47628
## 6 1 6 1427116077 76.47628 15.34179 2398958028 76.47621
## to_lat d d_weighted highway way_id component time
## 1 15.34173 132.442169 147.15797 service 28565950 1 31.786121
## 2 15.34169 132.442169 147.15797 service 28565950 1 31.786121
## 3 15.34174 8.888670 9.87630 service 28565950 1 2.133281
## 4 15.34173 8.888670 9.87630 service 28565950 1 2.133281
## 5 15.34179 9.326536 10.36282 service 28565950 1 2.238369
## 6 15.34174 9.326536 10.36282 service 28565950 1 2.238369
## time_weighted
## 1 35.317912
## 2 35.317912
## 3 2.370312
## 4 2.370312
## 5 2.487076
## 6 2.487076
The columns of from_id
and to_id
contain the names of the vertices. To extract shortest paths between some of these, first take some small samples of from
and to
points, and submit them to dodgr_paths()
:
from <- sample (graph$from_id, size = 10)
to <- sample (graph$to_id, size = 5)
dp <- dodgr_paths (graph, from = from, to = to)
length (dp)
## [1] 10
The result (dp
) is a list of 10 items, each of which contains 5 vectors. An example is,
dp [[1]] [[1]]
## [1] "1376769039" "1376768690" "1376768324" "1376768985" "1376768626"
## [6] "339574829" "339577274" "1148815093" "339577272" "339574837"
## [11] "339577276" "339577277" "2717241325" "339577279" "339577280"
## [16] "339577283" "1427080647" "1427080646" "339577285" "339577286"
## [21] "339577287" "339577289" "339577292" "338905032" "1376768330"
## [26] "1376768904" "2398957715" "2398957716" "1376768543" "1376769137"
## [31] "1376768847" "1376768484" "1376769045" "4506481428" "1376768697"
## [36] "1376769196" "1376768817" "1376768447" "1376769017" "1376768754"
## [41] "1376768386" "1376768961" "1376768601" "1376769087" "1376768725"
## [46] "1376768356" "1376768931" "1376768660" "1376768299" "1376768873"
## [51] "1376768511" "2398957609" "286632880" "2398957591" "2398957587"
## [56] "676635811" "676635812" "2398957556" "676635813" "2398957540"
## [61] "2398957538" "2398957533" "2398957527" "2398957524" "338512647"
## [66] "2398957514" "676635814" "2398957503" "2398957496" "2398957492"
## [71] "286632879" "676635815" "2398957483"
For spatial graphs, the coordinates of these paths can be obtained by extracting the vertices with dodgr_vertices()
and matching the vertex IDs:
verts <- dodgr_vertices (graph)
path1 <- verts [match (dp [[1]] [[1]], verts$id), ]
head (path1)
## id x y component n
## 2956 1376769039 76.46914 15.32207 1 1484
## 2954 1376768690 76.46928 15.32176 1 1483
## 2952 1376768324 76.46949 15.32110 1 1482
## 2950 1376768985 76.46970 15.32073 1 1481
## 2948 1376768626 76.46992 15.32061 1 1480
## 556 339574829 76.47003 15.32047 1 283
Paths calculated on contracted graphs will of course have fewer vertices than those calculated on full graphs.
The final major functions of dodgr
are dodgr_flows_aggregate()
and dodgr_flows_disperse()
.
The former of the above functions aggregates ‘’flows’’ throughout a network from a set of origin (from
) and destination (to
) points. Flows commonly arise in origin-destination matrices used in transport studies, but may be any kind of generic flows on graphs. A flow matrix specifies the flow between each pair of origin and destination points, and the dodgr_flows_aggregate()
function aggregates all of these flows throughout a network and assigns a resultant aggregate flow to each edge.
For a set of nf
points of origin and nt
points of destination, flows are defined by a simple nf
-by-nt
matrix of values, as in the following code:
graph <- weight_streetnet (hampi)
from <- sample (graph$from_id, size = 10)
to <- sample (graph$to_id, size = 10)
flows <- matrix (10 * runif (length (from) * length (to)),
nrow = length (from))
This flows
matrix is then submitted to dodgr_flows_aggregate()
, which simply appends an additional column of flows
to the submitted graph
:
graph_f <- dodgr_flows_aggregate (graph, from = from, to = to, flows = flows)
head (graph_f)
## geom_num edge_id from_id from_lon from_lat to_id to_lon
## 1 1 1 339318500 76.47489 15.34169 339318502 76.47612
## 2 1 2 339318502 76.47612 15.34173 339318500 76.47489
## 3 1 3 339318502 76.47612 15.34173 2398958028 76.47621
## 4 1 4 2398958028 76.47621 15.34174 339318502 76.47612
## 5 1 5 2398958028 76.47621 15.34174 1427116077 76.47628
## 6 1 6 1427116077 76.47628 15.34179 2398958028 76.47621
## to_lat d d_weighted highway way_id component time
## 1 15.34173 132.442169 147.15797 service 28565950 1 31.786121
## 2 15.34169 132.442169 147.15797 service 28565950 1 31.786121
## 3 15.34174 8.888670 9.87630 service 28565950 1 2.133281
## 4 15.34173 8.888670 9.87630 service 28565950 1 2.133281
## 5 15.34179 9.326536 10.36282 service 28565950 1 2.238369
## 6 15.34174 9.326536 10.36282 service 28565950 1 2.238369
## time_weighted flow
## 1 35.317912 0
## 2 35.317912 0
## 3 2.370312 0
## 4 2.370312 0
## 5 2.487076 0
## 6 2.487076 0
Most flows are zero because they have only been calculated between very few points in the graph.
summary (graph_f$flow)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 5.742 0.000 68.576
The second function, dodgr_flows_disperse()
, uses only a vector a origin (from
) points, and aggregates flows as they disperse throughout the network according to a simple exponential model. In place of the matrix of flows required by dodgr_flows_aggregate()
, dispersal requires an equivalent vector of densities dispersing from all origin (from
) points. This is illustrated in the following code, using the same graph as the previous example.
dens <- rep (1, length (from)) # uniform densities
graph_f <- dodgr_flows_disperse (graph, from = from, dens = dens)
summary (graph_f$flow)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000000 0.0000000 0.0000000 0.0001727 0.0000000 0.4903031
Note that flows from both dodgr_flows_aggregate()
and dodgr_flows_disperse()
are directed, so the flow from ‘A’ to ‘B’ will not necessarily equal the flow from ‘B’ to ‘A’. It is often desirable to aggregate flows in an undirected manner, for example for visualisations where plotting pairs of directed flows between each edge if often not feasible for large graphs. Directed flows can be aggregated to equivalent undirected flows with the merge_directed_flows()
function:
graph_undir <- merge_directed_flows (graph_f)
Resultant graphs produced by merge_directed_flows()
only include those edges having non-zero flows, and so:
nrow (graph_f); nrow (graph_undir) # the latter is much smaller
## [1] 5845
## [1] 1644
The resultant graph can readily be merged with the original graph to regain the original data on vertex coordinates through
graph <- graph [graph_undir$edge_id, ]
graph$flow <- graph_undir$flow
This graph may then be used to visalise flows with the dodgr_flowmap
function:
graph_f <- graph_f [graph_f$flow > 0, ]
dodgr_flowmap (graph_f, linescale = 5)
Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.org.
Fredman, Michael L., and Robert Endre Tarjan. 1987. “Fibonacci Heaps and Their Uses in Improved Network Optimization Algorithms.” J. ACM 34 (3): 596–615. https://doi.org/10.1145/28869.28874.
Johnson, Donald B. 1977. “Efficient Algorithms for Shortest Paths in Sparse Networks.” J. ACM 24 (1): 1–13. https://doi.org/10.1145/321992.321993.
Miller, Rupert G. 1960. “Priority Queues.” The Annals of Mathematical Statistics 31 (1): 86–103. https://www.jstor.org/stable/2237496.
Saunders, Shane. 2004. “Improved Shortest Path Algorithms for Nearly Acyclic Graphs.” PhD thesis, Department of Computer Science; Software Engineering, University of Canterbury, New Zealand.
Saunders, S., and T. Takaoka. 2003. “Improved Shortest Path Algorithms for Nearly Acyclic Graphs.” Theoretical Computer Science 293 (3): 535–56.
Takaoka, Tadao. 1999. “Theory of 2-3 Heaps.” In Computing and Combinatorics: 5th Annual International Conference, Cocoon’99 Tokyo, Japan, July 26–28, 1999 Proceedings, edited by Takano Asano, Hideki Imai, D. T. Lee, Shin-ichi Nakano, and Takeshi Tokuyama, 41–50. Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-48686-0_4.
———. 2000. “Theory of Trinomial Heaps.” In Computing and Combinatorics: 6th Annual International Conference, Cocoon 2000 Sydney, Australia, July 26–28, 2000 Proceedings, edited by Ding-Zhu Du, Peter Eades, Vladimir Estivill-Castro, Xuemin Lin, and Arun Sharma, 362–72. Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-44968-X_36.
Tarjan, R. 1983. Data Structures and Network Algorithms. Society for Industrial; Applied Mathematics. https://doi.org/10.1137/1.9781611970265.