Load the package and the other packages used in this vignette.
suppressPackageStartupMessages(library("dplyr"))
library("geysertimes")
The gt_get_data function downloads the compressed eruptions data from https://geysertimes.org/archive/, reads the compressed data into R, and saves a version of the R object in the location specified by the dest_folder argument. The default location for dest_folder is file.path(tempdir(), "geysertimes"). This default is used to meet the CRAN requirement of not writing files by default to any location other than under tempdir().
default_path <- gt_get_data()
#> Set dest_folder to geysertimes::gt_path() so that data persists between R sessions.
#> Warning in read_eruptions_file(downloaded_file): NAs introduced by coercion
default_path
#> [1] "/tmp/RtmpOjrQnO/geysertimes/2021-01-13"
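To see why the default location does not persist, it can help to build the same kind of path by hand. The sketch below uses only base R. The date-named subfolder mirrors the path printed above, but the exact layout used by the package is an assumption here.

```r
# Per-session temporary directory; by default R removes it when the session ends.
session_tmp <- tempdir()

# Default destination described above: a "geysertimes" folder under tempdir().
default_dest <- file.path(session_tmp, "geysertimes")

# Assumption: each download is stored in a subfolder named by download date,
# as the printed path above suggests.
version_dir <- file.path(default_dest, format(Sys.Date(), "%Y-%m-%d"))

# The path always sits under tempdir(), so it is not available in a new session.
startsWith(version_dir, session_tmp)
```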
Users are encouraged to set dest_folder to the value given by gt_path(), which is a permanent location appropriate for the user on the particular platform.
gt_path()
#> [1] "/home/spk/.local/share/geysertimes"
If a permanent location is used, the user only needs to get the data once. Using the suggested value for dest_folder:
suggested_path <- gt_get_data(dest_folder=gt_path())
suggested_path
#> [1] "/home/spk/.local/share/GeyserTimes/2021-01-04"
The gt_load_eruptions function loads the eruptions data into the current session. The gt_load_geysers function loads the geyser location data into the current session.
eruptions <- gt_load_eruptions()
geysers <- gt_load_geysers()
A quick look at the data:
dim(eruptions)
#> [1] 1284055 25
names(eruptions)
#> [1] "eruption_id" "geyser" "time"
#> [4] "has_seconds" "exact" "near_start"
#> [7] "in_eruption" "electronic" "approximate"
#> [10] "webcam" "initial" "major"
#> [13] "minor" "questionable" "duration"
#> [16] "duration_seconds" "duration_resolution" "duration_modifier"
#> [19] "entrant" "observer" "comment"
#> [22] "time_updated" "time_entered" "primary_id"
#> [25] "other_comments"
The data that is downloaded is versioned. The version id is the date when the data was downloaded.
The gt_version() function lists the latest version of the data that has been downloaded. Setting all=TRUE lists all versions of the data that have been downloaded.
gt_version()
#> [1] "2021-01-08"
gt_version(all=TRUE)
#> [1] "2021-01-08"
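One way to picture the versioning: if each download lives in a folder named by its ISO date, the latest version is simply the last name in sorted order. The base R sketch below imitates that idea on a throwaway directory. Here list_versions is a hypothetical helper, not part of geysertimes, and the folder layout is an assumption based on the paths printed earlier.

```r
# Hypothetical helper: list date-named version folders under `path`.
list_versions <- function(path, all = FALSE) {
  versions <- sort(basename(list.dirs(path, recursive = FALSE)))
  # Keep only names that look like ISO dates (YYYY-MM-DD).
  versions <- versions[grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", versions)]
  # ISO dates sort chronologically, so the last element is the latest.
  if (all || length(versions) == 0) versions else versions[length(versions)]
}

# Build a throwaway folder with two fake versions to try it out.
demo_root <- file.path(tempdir(), "versions_demo")
for (v in c("2021-01-04", "2021-01-08")) {
  dir.create(file.path(demo_root, v), recursive = TRUE, showWarnings = FALSE)
}

latest <- list_versions(demo_root)              # latest version only
every  <- list_versions(demo_root, all = TRUE)  # all versions, oldest first
```

Naming folders by ISO date keeps the lookup simple, since plain string sorting matches chronological order.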
Geysers with the most recorded eruptions:
print(n=20,
  eruptions %>% group_by(geyser) %>% summarise(N=n()) %>% arrange(desc(N)))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 450 x 2
#> geyser N
#> <chr> <int>
#> 1 Old Faithful 182797
#> 2 Plume 143792
#> 3 Daisy 98872
#> 4 Little Cub 96771
#> 5 Lion 63105
#> 6 Grand 41077
#> 7 Aurum 35734
#> 8 Oblong 34075
#> 9 Fountain 27408
#> 10 Spouter 26977
#> 11 Riverside 26535
#> 12 Castle 25575
#> 13 Logbridge 23416
#> 14 Plate 21594
#> 15 Echinus 21330
#> 16 Depression 20646
#> 17 Turban 18839
#> 18 Great Fountain 18767
#> 19 Grotto 17477
#> 20 Jet 16366
#> # … with 430 more rows
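The same ranking can be written more compactly with dplyr's count(), which groups, tallies, and sorts in one call. The sketch below runs on a small made-up data frame rather than the real eruptions data, so the numbers are illustrative only; toy, top, and long are names introduced here.

```r
suppressPackageStartupMessages(library("dplyr"))

# Toy stand-in for the eruptions data; these rows are made up for illustration.
toy <- data.frame(
  geyser = c("Old Faithful", "Plume", "Old Faithful", "Daisy",
             "Plume", "Old Faithful"),
  stringsAsFactors = FALSE
)

# count() groups by geyser, tallies rows, and with sort = TRUE orders by
# descending count; name = "N" sets the name of the count column.
top <- toy %>% count(geyser, sort = TRUE, name = "N")

# The longer pipeline used above gives the same ranking.
long <- toy %>% group_by(geyser) %>% summarise(N = n()) %>% arrange(desc(N))
```

With the real data, the equivalent call would be eruptions %>% count(geyser, sort = TRUE, name = "N").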