The R package rdwd, available at github.com/brry, contains code to select, download and read weather data from measuring stations across Germany. The German Weather Service (Deutscher Wetterdienst, DWD) provides over 25 thousand datasets with weather observations through its FTP server.
To use those datasets, rdwd is designed to do three main things:
selectDWD: facilitate file selection, e.g. by station name (with findID), by geographical location (see mapDWD), by temporal resolution (hourly, daily, monthly), by variable (temperature, rain, wind, sun, clouds, etc.) or by observation period (historical long-term records or the current year)
dataDWD: download a file (or multiple files without getting banned by the FTP server)
readDWD: read that data into R (including useful defaults for metadata)
selectDWD uses the result from indexDWD, which recursively lists all the files on an FTP server (using RCurl::getURL). As this is time-consuming, the result is stored in the package dataset fileIndex. From this, metaIndex, geoIndex, mapDWD and metaInfo are derived.
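The internal layout of fileIndex is not shown here. As a hedged illustration of how a plain list of file paths can become a searchable table, the sketch below takes a few relative paths copied from the selectDWD output further down and splits them into res/var/per columns. These column names mirror those of metaIndex. The sketch also extracts the station ID from each file name. The ID pattern (five digits after the variable code) is an assumption based only on the file names shown in this vignette.

```r
# A few relative paths, copied from the selectDWD output further down
paths <- c(
  "/daily/more_precip/historical/tageswerte_RR_03467_19930601_20151231_hist.zip",
  "/daily/more_precip/recent/tageswerte_RR_03467_akt.zip",
  "/monthly/more_precip/historical/monatswerte_RR_05116_19920701_20061231_hist.zip"
)
# Split "/res/var/per/file" into its components
parts <- strsplit(sub("^/", "", paths), "/", fixed = TRUE)
index <- data.frame(
  res = vapply(parts, `[`, character(1), 1),
  var = vapply(parts, `[`, character(1), 2),
  per = vapply(parts, `[`, character(1), 3),
  # Station ID: assumed to be the 5-digit block after the variable code
  id  = as.integer(sub("^[a-z]+_[A-Z]+_([0-9]{5})_.*$", "\\1", basename(paths))),
  stringsAsFactors = FALSE
)
# Example query: recent files for station 3467
index[index$id == 3467 & index$per == "recent", ]
```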
install.packages("rdwd")
# get the latest development version from github:
berryFunctions::instGit("brry/rdwd")
# For full usage, as needed in indexDWD and metaDWD(..., current=TRUE):
install.packages("RCurl") # suggested only, not a mandatory dependency
library(rdwd)
If direct installation from CRAN doesn’t work, your R version might be too old. In that case, an update is strongly recommended: r-project.org. If you can’t update R, try installing from source (github) via instGit as mentioned above. If that’s not possible either, you might be able to source some functions from the package zip folder:
Vectorize(source)(dir("path/you/unzipped/to/rdwd-master/R", full.names=TRUE))
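Vectorize(source) calls source once per file. The sketch below is an equivalent, more explicit loop. sourceFolder is a hypothetical helper, and sys.source() lets you choose which environment receives the functions. Sourced functions that rely on other packages (e.g. berryFunctions) presumably still need those packages to be installed.

```r
# Hypothetical helper: source every .R file of a folder into one environment
sourceFolder <- function(path, envir = globalenv()) {
  files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) sys.source(f, envir = envir)
  invisible(files)
}
```

For example, sourceFolder("path/you/unzipped/to/rdwd-master/R", envir = new.env()) keeps the sourced functions out of the global workspace.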
tdir <- tempdir()
link <- selectDWD("Potsdam", res="daily", var="kl", per="recent")
file <- dataDWD(link, read=FALSE, dir=tdir, quiet=TRUE)
clim <- readDWD(file, dir=tdir)
str(clim)
## 'data.frame': 550 obs. of 18 variables:
## $ STATIONS_ID : int 3987 3987 3987 3987 3987 3987 3987 3987 3987 3987 ...
## $ MESS_DATUM : POSIXct, format: "2015-08-02" "2015-08-03" ...
## $ QUALITAETS_NIVEAU : int 3 3 3 3 3 3 3 3 3 3 ...
## $ LUFTTEMPERATUR : num 22.4 23.8 25.8 20.6 25.2 27.8 24.5 22.5 25.3 25.8 ...
## $ DAMPFDRUCK : num 11.7 13.3 15.7 15.4 15.8 17.4 18.6 15.3 17.7 19.1 ...
## $ BEDECKUNGSGRAD : num 4.5 2 2.9 4.9 3.6 3.4 4.4 2.3 3.3 4 ...
## $ LUFTDRUCK_STATIONSHOEHE : num 1007 1006 1002 1007 1004 ...
## $ REL_FEUCHTE : num 46.8 49.4 52.4 66.2 54.1 ...
## $ WINDGESCHWINDIGKEIT : num 3 3 5 3.4 3.4 4 4.3 3.5 3.8 3.8 ...
## $ LUFTTEMPERATUR_MAXIMUM : num 30 32.3 35.3 26.3 34.6 37.6 33.3 29.5 33.5 33.1 ...
## $ LUFTTEMPERATUR_MINIMUM : num 15.1 14 18.4 16.4 16.1 21.2 19.2 16.6 17.1 19.4 ...
## $ LUFTTEMP_AM_ERDB_MINIMUM: num 11.6 11.7 16.1 14.9 13.5 18.1 17.8 15.5 16.1 18.7 ...
## $ WINDSPITZE_MAXIMUM : num 8.1 9.2 17.3 9.1 9.6 9.1 12.5 8.2 8.4 11.7 ...
## $ NIEDERSCHLAGSHOEHE : num 0 0 4.1 0 0 0 0.1 0 0 0 ...
## $ NIEDERSCHLAGSHOEHE_IND : int 0 0 6 0 0 0 6 0 0 0 ...
## $ SONNENSCHEINDAUER : num 13.4 14.4 11.6 10.7 13.3 ...
## $ SCHNEEHOEHE : int 0 0 0 0 0 0 0 0 0 0 ...
## $ eor : Factor w/ 1 level "eor": 1 1 1 1 1 1 1 1 1 1 ...
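readDWD converts the raw text into the data.frame shown above. The following is a rough, hypothetical sketch of that kind of step, not readDWD's code. It assumes the raw files are semicolon-separated, use -999 for missing values and store dates as yyyymmdd. The three rows reuse the first values from the str output above, with one temperature replaced by -999 to show the NA handling.

```r
# Made-up raw text in the assumed layout (semicolon-separated, -999 = missing)
raw <- c(
  "STATIONS_ID;MESS_DATUM;LUFTTEMPERATUR;NIEDERSCHLAGSHOEHE;eor",
  "3987;20150802;22.4;0;eor",
  "3987;20150803;-999;0;eor",
  "3987;20150804;25.8;4.1;eor"
)
d <- read.table(text = raw, sep = ";", header = TRUE, na.strings = "-999",
                strip.white = TRUE, stringsAsFactors = FALSE)
# yyyymmdd integers -> POSIXct, like the MESS_DATUM column shown above
d$MESS_DATUM <- as.POSIXct(as.character(d$MESS_DATUM), format = "%Y%m%d", tz = "GMT")
str(d)
```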
Recent temperature time series:
par(mar=c(4,4,2,0.5), mgp=c(2.7, 0.8, 0), cex=0.8)
plot(clim[,c(2,4)], type="l", xaxt="n", las=1, main="Daily temp Potsdam")
berryFunctions::monthAxis(ym=TRUE) ; abline(h=0)
mtext("Source: Deutscher Wetterdienst", adj=-0.1, line=0.5, font=3)
Long term climate graph:
link <- selectDWD("Potsdam", res="monthly", var="kl", per="h")
clim <- dataDWD(link, quiet=TRUE)
clim$month <- substr(clim$MESS_DATUM_BEGINN,5,6)
temp <- tapply(clim$LUFTTEMPERATUR, clim$month, mean)
prec <- tapply(clim$NIEDERSCHLAGSHOEHE, clim$month, mean)
library(berryFunctions)
climateGraph(temp, prec, main="Potsdam 1893:2015")
mtext("Source: Deutscher Wetterdienst", adj=-0.05, line=2.8, font=3)
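The tapply calls average all Januaries, all Februaries and so on. If a month contains missing values, mean() returns NA for it unless na.rm=TRUE is passed through tapply. Here is a small sketch with made-up values in a separate data.frame (toy), so that clim is not overwritten.

```r
# Made-up monthly records; MESS_DATUM_BEGINN is a yyyymmdd integer
toy <- data.frame(
  MESS_DATUM_BEGINN  = c(20000101L, 20000201L, 20010101L, 20010201L),
  LUFTTEMPERATUR     = c(-1.0, 0.5, NA, 2.5),
  NIEDERSCHLAGSHOEHE = c(40, 30, 50, 20)
)
# Characters 5-6 of yyyymmdd are the month ("01", "02", ...)
toy$month <- substr(toy$MESS_DATUM_BEGINN, 5, 6)
# Extra arguments to tapply are passed on to mean()
temp <- tapply(toy$LUFTTEMPERATUR, toy$month, mean, na.rm = TRUE)
prec <- tapply(toy$NIEDERSCHLAGSHOEHE, toy$month, mean)
temp  # January: -1 (the NA is skipped), February: 1.5
```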
Weather stations can be selected geographically with the interactive map mapDWD.
The DWD station IDs can be obtained from station names with
findID("Potsdam")
## Potsdam
## 3987
findID("Koeln", exactmatch=FALSE)
## Warning: in rdwd::findID: ID determined from name 'Koeln' has 4 elements
## (2665, 2666, 2667, 2968).
## Koeln-Bonn Koeln-Botanischer Garten Koeln-Porz-Eil
## 2667 2665 2666
## Koeln-Stammheim
## 2968
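The difference between exact and partial name matching can be sketched with the names and IDs printed above. lookupID is a hypothetical helper, not findID's implementation. It matches names exactly by default, falls back to substring matching with exact = FALSE, and warns when several stations match.

```r
# Station names and IDs as printed by findID above
stations <- c("Potsdam" = 3987, "Koeln-Bonn" = 2667,
              "Koeln-Botanischer Garten" = 2665,
              "Koeln-Porz-Eil" = 2666, "Koeln-Stammheim" = 2968)

# Hypothetical helper (not rdwd's findID)
lookupID <- function(name, exact = TRUE, ids = stations) {
  hit <- if (exact) names(ids) == name else grepl(name, names(ids), fixed = TRUE)
  if (sum(hit) > 1) warning("'", name, "' matches ", sum(hit), " stations")
  ids[hit]
}

lookupID("Potsdam")               # named vector with the ID 3987
lookupID("Koeln", exact = FALSE)  # four stations, with a warning
```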
File selection by station name/ID and folder happens with selectDWD. It needs an index of all the available files on the server. The package contains such an index (fileIndex) that is updated (at least) with each CRAN release of the package. The selectDWD function documentation contains an overview of the FTP folder structure.
If you find rdwd:::fileIndex to be outdated (Error in download.file … : cannot open URL), please let me know and I will update it. Meanwhile, use current=TRUE in selectDWD:
# all files at a given path, with current file index (RCurl required):
links <- selectDWD(res="monthly", var="more_precip", per="hist", current=TRUE)
fileIndex is created with the function indexDWD used in meta.R.
# recursively list files on the FTP-server:
files <- indexDWD("hourly/sun") # use dir="some_path" to save the output elsewhere
berryFunctions::headtail(files, 5, na=TRUE)
# with other FTP servers, this should also work...
funet <- indexDWD(base="ftp.funet.fi/pub/standards/RFC/ien", folder="")
p <- RCurl::getURL("ftp.funet.fi/pub/standards/RFC/ien/",
                   verbose=TRUE, ftp.use.epsv=TRUE, dirlistonly=TRUE)
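The core idea of a recursive listing can be sketched independently of FTP: list one folder, keep the files, and descend into subfolders. This is a hypothetical sketch, not indexDWD's code. lister is any function that returns the entry names of one folder. For an FTP server it could wrap the RCurl::getURL(..., dirlistonly=TRUE) call above and split the result into lines; that wrapper is assumed, not shown. Entries containing a dot are treated as files. This simplifying assumption fits names like hourly, sun and *.zip. The check below runs on a local temporary folder.

```r
# Hypothetical recursive lister (not indexDWD's code)
listRecursive <- function(path, lister,
                          isFile = function(x) grepl(".", x, fixed = TRUE)) {
  out <- character(0)
  for (entry in lister(path)) {
    full <- file.path(path, entry)
    if (isFile(entry)) {
      out <- c(out, full)                                 # file: keep it
    } else {
      out <- c(out, listRecursive(full, lister, isFile))  # folder: descend
    }
  }
  out
}

# Local stand-in for the server: base/hourly/sun/{b.txt, recent/a.zip}
base <- tempfile("ftp")
dir.create(file.path(base, "hourly", "sun", "recent"), recursive = TRUE)
invisible(file.create(file.path(base, "hourly", "sun", c("recent/a.zip", "b.txt"))))
found <- listRecursive(base, lister = function(p) list.files(p))
```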
selectDWD is designed to be very flexible:
# inputs can be vectorized, and period can be abbreviated:
selectDWD(c("Potsdam","Wuerzburg"), res="hourly", var="sun", per="hist")
## [[1]]
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/hourly/sun/historical/stundenwerte_SD_03987_18930101_20151231_hist.zip"
##
## [[2]]
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/hourly/sun/historical/stundenwerte_SD_05705_19510101_20151231_hist.zip"
# Both periods can be requested at once (per="rh") to get both filenames:
selectDWD("Potsdam", res="daily", var="kl", per="rh", outvec=TRUE)
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/daily/kl/recent/tageswerte_KL_03987_akt.zip"
## [2] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/daily/kl/historical/tageswerte_03987_18930101_20151231_hist.zip"
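selectDWD accepts abbreviated periods such as "h" or "rh". This vignette does not show how rdwd resolves them. One hedged way to expand such codes is to match each letter to the first letter of the full folder names. expandPer is a hypothetical helper.

```r
# Hypothetical helper (rdwd's own matching may differ):
# expand each letter of a period code to the full folder name
expandPer <- function(per, choices = c("historical", "recent")) {
  codes <- strsplit(per, "")[[1]]
  full <- choices[match(codes, substr(choices, 1, 1))]
  if (anyNA(full)) stop("unknown period code in '", per, "'")
  full
}

expandPer("rh")  # "recent" "historical"
expandPer("h")   # "historical"
```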
Different stations may have a different number of files across all folders, so the results cannot in general be combined into a single vector. That’s why outvec defaults to FALSE and a list is returned.
lapply(selectDWD(id=c(3467,5116)), substr, 58, 1e4)
## Warning: in rdwd::selectDWD: in file index 'fileIndex', there are 4 files
## with ID 3467.
## Warning: in rdwd::selectDWD: in file index 'fileIndex', there are 2 files
## with ID 5116.
## [[1]]
## [1] "/daily/more_precip/historical/tageswerte_RR_03467_19930601_20151231_hist.zip"
## [2] "/daily/more_precip/recent/tageswerte_RR_03467_akt.zip"
## [3] "/monthly/more_precip/historical/monatswerte_RR_03467_19930601_20151231_hist.zip"
## [4] "/monthly/more_precip/recent/monatswerte_RR_03467_akt.zip"
##
## [[2]]
## [1] "/daily/more_precip/historical/tageswerte_RR_05116_19930101_20061231_hist.zip"
## [2] "/monthly/more_precip/historical/monatswerte_RR_05116_19920701_20061231_hist.zip"
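Station 3467 has four files and station 5116 only two, so the results do not fit a rectangular structure. If a flat table is needed anyway, stack() turns a named list into a long data.frame that keeps the station ID for every file. The paths below are shortened from the output above.

```r
# Shortened paths per station ID, abbreviated from the output above
byStation <- list(
  "3467" = c("daily/historical", "daily/recent", "monthly/historical", "monthly/recent"),
  "5116" = c("daily/historical", "monthly/historical")
)
lengths(byStation)        # unequal lengths (4 and 2)
long <- stack(byStation)  # one row per file; station ID in column `ind`
long[long$ind == "5116", ]
```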
selectDWD also uses a complete data.frame with meta information, metaIndex (derived from the “Beschreibung” files in fileIndex).
# All metadata at all folders:
data(metaIndex)
str(metaIndex, vec.len=2)
## 'data.frame': 36124 obs. of 12 variables:
## $ Stations_id : int 1 1 1 1 1 ...
## $ von_datum : int 18910101 19120101 19120101 19120101 19310101 ...
## $ bis_datum : int 19860630 19860630 19860630 19860630 19860630 ...
## $ Stationshoehe: num 478 478 478 478 478 ...
## $ geoBreite : num 47.8 47.8 ...
## $ geoLaenge : num 8.85 8.85 ...
## $ Stationsname : chr "Aach" "Aach" ...
## $ Bundesland : chr "Baden-Wuerttemberg" "Baden-Wuerttemberg" ...
## $ res : chr "monthly" "daily" ...
## $ var : chr "more_precip" "more_precip" ...
## $ per : chr "recent" "historical" ...
## $ hasfile : logi FALSE TRUE FALSE ...
View(data.frame(sort(unique(rdwd:::metaIndex$Stationsname)))) # 5831 entries
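metaIndex can be filtered like any data.frame. The miniature below is made up but uses the columns from the str output above. The Aach rows follow the first entries shown there, and the Tucheim rows follow the example at the end of this section. The sketch assumes that hasfile marks whether a data file actually exists for that row.

```r
# Made-up miniature of metaIndex (columns as in the str output above)
meta <- data.frame(
  Stations_id  = c(1, 1, 5116, 5116),
  Stationsname = c("Aach", "Aach", "Tucheim", "Tucheim"),
  res     = c("monthly", "daily", "monthly", "monthly"),
  var     = "more_precip",
  per     = c("recent", "historical", "recent", "historical"),
  hasfile = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
# Monthly entries that (assuming hasfile means so) have a data file
sel <- meta[meta$res == "monthly" & meta$hasfile,
            c("Stations_id", "Stationsname", "per")]
sel
```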
dataDWD can download (and readDWD can correctly read) such a data.frame from any folder on the FTP server:
# file with station metadata for a given path:
m_link <- selectDWD(res="monthly", var="more_precip", per="hist", meta=TRUE)
substr(m_link, 50, 1e4) # (Monatswerte = monthly values, Beschreibung = description)
## [1] "/climate/monthly/more_precip/historical/RR_Monatswerte_Beschreibung_Stationen.txt"
meta_monthly_rain <- dataDWD(m_link, dir=tdir) # not executed in vignette creation
str(meta_monthly_rain)
Meta files may list stations for which no data files exist. For example, Tucheim (5116) is listed in the metadata at …/monthly/more_precip/recent/RR_Monatwerte_Beschreibung_Stationen.txt, but has no file in that folder (only in …/monthly/more_precip/historical).
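Such mismatches can be found by comparing the station IDs in a meta file with the IDs extracted from the file names in the same folder. In the sketch below, the list of meta IDs is an assumed subset, and the file name comes from the selectDWD output above. The ID pattern (five digits after the variable code) is an assumption based on the file names in this vignette.

```r
# Station IDs listed in a meta file (assumed subset, for illustration)
meta_ids <- c(3467, 5116)
# File names present in .../monthly/more_precip/recent (subset, from the output above)
present <- c("monatswerte_RR_03467_akt.zip")
# Extract the 5-digit station ID from each file name
file_ids <- as.integer(sub("^[a-z]+_[A-Z]+_([0-9]{5})_.*$", "\\1", present))
setdiff(meta_ids, file_ids)  # IDs in the metadata without a file in this folder
```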
Any feedback on this package (or this vignette) is very welcome via github or berry-b@gmx.de!