rdwd: climate data from the German Weather Service

Berry Boessenkool, berry-b@gmx.de

2019-03-17

Vignette Rmd source code (not on CRAN, to reduce the load on the DWD server from daily new builds and checks of the vignette)

Interactive map vignette

Intro

The R package rdwd contains code to select, download and read weather data from measuring stations across Germany. The German Weather Service (Deutscher Wetterdienst, DWD) provides over 228 thousand datasets with weather observations through the FTP server online at

ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate.

For data interpolated onto a 1 km raster, including radar data up to the last hour, see ftp://ftp-cdc.dwd.de/pub/CDC/grids_germany/. Raster and binary reading functions have been included since rdwd 1.1.0 (March 2019), although the latter do not yet read the radolan binary files completely correctly, see Development notes.

For further details, please consult the DWD FTP server documentation.

Package structure

For the observation datasets, rdwd is designed to do three main things: select files (findID, selectDWD), download them (dataDWD) and read them into R (readDWD).

selectDWD uses the result from indexFTP, which recursively lists all the files on an FTP server (using RCurl::getURL). As this is time-consuming, the result is stored in the package dataset fileIndex. From this, metaIndex and geoIndex are derived.

Package installation

install.packages("rdwd")
# get the latest development version from github:
berryFunctions::instGit("brry/rdwd") 
# For full usage, as needed in indexFTP and selectDWD(..., current=TRUE):
install.packages("RCurl") # is only suggested, not mandatory dependency

On Debian-based Linux systems (e.g. Ubuntu), replace the last line above with the following terminal command (note the lowercase rcurl):

sudo apt-get install r-cran-rcurl

If direct installation from CRAN doesn’t work, your R version might be too old. In that case, updating R is strongly recommended. If you can’t update R, try installing from source (github) via instGit as mentioned above. If that’s not possible either, you might be able to source some functions from the package zip folder:

# source all R files of the unzipped package (adjust the path):
invisible(lapply(dir("path_you_unzipped_to/rdwd-master/R", full.names=TRUE), source))

Basic usage

library(rdwd)
link <- selectDWD("Potsdam", res="daily", var="kl", per="recent")
file <- dataDWD(link, read=FALSE, dir="DWDdata", quiet=TRUE)
clim <- readDWD(file)

str(clim)
## 'data.frame':    550 obs. of  19 variables:
##  $ STATIONS_ID: int  3987 3987 3987 3987 3987 3987 3987 3987 3987 3987 ...
##  $ MESS_DATUM : POSIXct, format: "2017-09-13" "2017-09-14" ...
##  $ QN_3       : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ FX         : num  21.1 21.2 11.9 6.4 6.6 5.5 6.4 8.7 6.5 5.1 ...
##  $ FM         : num  7.1 5.8 5 3 2 2.7 2.9 3.5 2.5 1.5 ...
##  $ QN_4       : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ RSK        : num  3.7 1.8 0.2 0 1.1 0 0 1.9 0 0 ...
##  $ RSKF       : int  6 6 6 0 6 0 0 6 0 0 ...
##  $ SDK        : num  6.38 2.05 4.93 8.1 8.62 ...
##  $ SHK_TAG    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ NM         : num  4.1 5.8 5.2 3.1 2.7 3.1 3.7 5 6.8 4.2 ...
##  $ VPM        : num  12.3 12.3 12.2 11 11.1 11.1 11.1 11.8 12.7 12.3 ...
##  $ PM         : num  989 989 998 1000 1001 ...
##  $ TMK        : num  13.3 12.3 12.5 11.6 11.8 11.5 11.5 11.8 12.9 13.6 ...
##  $ UPM        : num  81.1 86 84.2 81.8 82 ...
##  $ TXK        : num  18.9 15 17 17.6 18.6 17.2 18 16.9 17.1 19 ...
##  $ TNK        : num  9.1 10 8.9 6.3 6.8 7.2 5.7 7.7 10 10.6 ...
##  $ TGK        : num  6.7 8.3 5.8 3.9 3 4.2 3.3 4.5 8 6.4 ...
##  $ eor        : Factor w/ 1 level "eor": 1 1 1 1 1 1 1 1 1 1 ...

Plotting examples

Recent temperature time series:

par(mar=c(4,4,2,0.5), mgp=c(2.7, 0.8, 0), cex=0.8)
plot(clim[,c(2,14)], type="l", xaxt="n", las=1, main="Daily temp Potsdam")
berryFunctions::monthAxis(ym=TRUE)   ;   abline(h=0)
mtext("Source: Deutscher Wetterdienst", adj=-0.1, line=0.5, font=3)

Long term climate graph:

link <- selectDWD("Goettingen", res="monthly", var="kl", per="h")
clim <- dataDWD(link, quiet=TRUE)
clim$month <- substr(clim$MESS_DATUM_BEGINN,5,6)
temp <- tapply(clim$MO_TT, clim$month, mean, na.rm=TRUE)
prec <- tapply(clim$MO_RR, clim$month, mean, na.rm=TRUE)
berryFunctions::climateGraph(temp, prec, main="Goettingen")
mtext("Source: Deutscher Wetterdienst", adj=-0.05, line=2.8, font=3)
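The month extraction and averaging used for the climate graph can be tried offline on a tiny stand-in table. This is a sketch assuming that MESS_DATUM_BEGINN is available as a yyyymmdd string, so that characters 5 and 6 hold the month. The table name toy and all temperature values are invented purely for illustration.

```r
# Invented example values, NOT real DWD observations.
toy <- data.frame(
  MESS_DATUM_BEGINN = c("19900101", "19900701", "19910101", "19910701"),
  MO_TT             = c(-1.0, 18.0, 1.0, NA),
  stringsAsFactors  = FALSE)

# Characters 5-6 of a yyyymmdd string are the month:
toy$month <- substr(toy$MESS_DATUM_BEGINN, 5, 6)

# Average per calendar month over all years, ignoring missing values:
temp <- tapply(toy$MO_TT, toy$month, mean, na.rm=TRUE)
temp
```

The result has one value per month label, which is the form that the temp and prec inputs take in the climateGraph call above.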

Station selection

Weather stations can be selected geographically with the interactive map. All stations within a certain radius around a given lat-long position can be obtained with nearbyStations.
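What "within a certain radius" means can be illustrated with a great-circle distance computation in base R. This is only a sketch of the idea, not necessarily how nearbyStations computes distances: the helper name haversine_km, the station labels A to C and all coordinates are hypothetical.

```r
# Great-circle distance in km (haversine formula, Earth radius ~6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2, r=6371)
  {
  rad  <- pi/180
  dlat <- (lat2-lat1)*rad
  dlon <- (lon2-lon1)*rad
  a <- sin(dlat/2)^2 + cos(lat1*rad)*cos(lat2*rad)*sin(dlon/2)^2
  2*r*asin(pmin(1, sqrt(a)))
  }

# Hypothetical stations, NOT taken from the DWD metadata:
st <- data.frame(name=c("A","B","C"), lat=c(52.40,52.50,48.10),
                 lon=c(13.10,13.40,11.60), stringsAsFactors=FALSE)

# Distance of each station to a target position, then keep those within 20 km:
st$dist <- haversine_km(52.40, 13.05, st$lat, st$lon)
near <- st[st$dist <= 20, ]
near
```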

The DWD station IDs can be obtained from station names with findID:

findID("Potsdam")
## Potsdam 
##    3987
findID("Koeln", exactmatch=FALSE)
## Warning: source -> withVisible -> eval -> eval -> createBerrysVignettes -
## > rmarkdown::render -> knitr::knit -> call_block -> block_exec -> in_dir -
## > evaluate -> evaluate::evaluate -> evaluate_call -> timing_fn -> handle -
## > findID: ID determined from name 'Koeln' has 4 elements (2665, 2666, 2667,
## 2968).
##               Koeln-Bonn Koeln-Botanischer Garten           Koeln-Porz-Eil 
##                     2667                     2665                     2666 
##          Koeln-Stammheim 
##                     2968
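Why partial matching can return several IDs is sketched below in base R. This is a minimal illustration, not the actual findID implementation: the lookup vector stations is a hypothetical stand-in for the package metadata, filled only with the names and IDs shown above, and find_id_sketch is a made-up helper name.

```r
# Hypothetical lookup table: station names and IDs copied from the output above.
stations <- c("Potsdam"=3987, "Koeln-Bonn"=2667, "Koeln-Botanischer Garten"=2665,
              "Koeln-Porz-Eil"=2666, "Koeln-Stammheim"=2968)

# Illustrative helper (not part of rdwd): exact or partial name lookup.
find_id_sketch <- function(name, exactmatch=TRUE)
  {
  if(exactmatch) sel <- names(stations) == name
  else           sel <- grepl(name, names(stations), fixed=TRUE)
  stations[sel]
  }

find_id_sketch("Potsdam")                  # one ID
find_id_sketch("Koeln")                    # no exact match: empty result
find_id_sketch("Koeln", exactmatch=FALSE)  # four IDs, one per matching station
```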

Available files

File selection by station name/id and folder happens with selectDWD. It needs an index of all the available files on the server. The package contains such an index (fileIndex) that is updated (at least) with each CRAN release of the package. The selectDWD documentation contains an overview of the FTP folder structure.
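As a rough illustration of that folder structure, a file URL below the climate directory can be split into its resolution, variable and period parts. The sketch uses base R only and assumes the three-level res/var/per layout seen in the URLs elsewhere in this vignette; base_url, rel and parts are just illustrative names.

```r
base_url <- "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate"
url      <- paste0(base_url, "/daily/kl/recent/tageswerte_KL_03987_akt.zip")

# Remove the base URL, then split the remainder at the slashes:
rel   <- sub(base_url, "", url, fixed=TRUE)
parts <- strsplit(rel, "/", fixed=TRUE)[[1]]
parts <- parts[parts != ""]                 # drop the empty leading element
names(parts) <- c("res", "var", "per", "file")
parts
```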

If you find the file index to be outdated (Error in download.file … : cannot open URL), please let me know and I will update it. Meanwhile, use current=TRUE in selectDWD:

# all files at a given path, with current file index (RCurl required):
links <- selectDWD(res="monthly", var="more_precip", per="hist", current=TRUE)

fileIndex is created with the function indexFTP, which is used in the last section of rdwd-package.R.

### This chunk is not evaluated ###
# recursively list files on the FTP-server:
files <- indexFTP("hourly/sun") # use dir="some_path" to save the output elsewhere
berryFunctions::headtail(files, 5, na=TRUE)

# indexFTP uses a folder to resume indexing after getting banned:
gridindex <- indexFTP("radolan","ftp://ftp-cdc.dwd.de/pub/CDC/grids_germany/hourly")
gridindex <- indexFTP(gridindex,"ftp://ftp-cdc.dwd.de/pub/CDC/grids_germany/hourly", sleep=1)

# with other FTP servers, this should also work...
funet <- indexFTP(base="ftp.funet.fi/pub/standards/w3/TR/xhtml11/", folder="")
p <- RCurl::getURL(    "ftp.funet.fi/pub/standards/w3/TR/xhtml11/",
                       verbose=TRUE, ftp.use.epsv=TRUE, dirlistonly=TRUE)

File selection

selectDWD is designed to be very flexible:

# inputs can be vectorized, and period can be abbreviated:
selectDWD(c("Potsdam","Wuerzburg"), res="hourly", var="sun", per="hist")
## [[1]]
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/hourly/sun/historical/stundenwerte_SD_03987_18930101_20181231_hist.zip"
## 
## [[2]]
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/hourly/sun/historical/stundenwerte_SD_05705_19510101_20181231_hist.zip"

If res/var/per are left NA, an interactive selection is opened with the available options for the given station.

# Time period can be doubled to get both filenames:
selectDWD("Potsdam", res="daily", var="kl", per="rh")
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/daily/kl/recent/tageswerte_KL_03987_akt.zip"                       
## [2] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/daily/kl/historical/tageswerte_KL_03987_18930101_20181231_hist.zip"

Across all folders, the number of available files can differ between stations. That is why outvec defaults to FALSE (unless per="hr"), giving one list element per station:

lapply(selectDWD(id=c(3467,5116), res="",var="",per=""), substr, 58, 1e4)
## [[1]]
## [1] "/annual/more_precip/historical/jahreswerte_RR_03467_19940101_20181231_hist.zip" 
## [2] "/annual/more_precip/recent/jahreswerte_RR_03467_akt.zip"                        
## [3] "/daily/more_precip/historical/tageswerte_RR_03467_19930601_20181231_hist.zip"   
## [4] "/daily/more_precip/recent/tageswerte_RR_03467_akt.zip"                          
## [5] "/monthly/more_precip/historical/monatswerte_RR_03467_19930601_20181231_hist.zip"
## [6] "/monthly/more_precip/recent/monatswerte_RR_03467_akt.zip"                       
## 
## [[2]]
## [1] "/annual/more_precip/historical/jahreswerte_RR_05116_19930101_20181231_hist.zip" 
## [2] "/monthly/more_precip/historical/monatswerte_RR_05116_19920701_20061231_hist.zip"
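When the result is a list with one element per station, the number of files per station can be checked before downloading. The sketch below rebuilds a shortened version of the result above by hand (in practice it would come from selectDWD). The names base_url and links are illustrative, and the station IDs are added as list names for readability only.

```r
base_url <- "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate"
# Hand-copied subset of the output above, one list element per station ID:
links <- list(
  "3467" = paste0(base_url, c(
    "/monthly/more_precip/historical/monatswerte_RR_03467_19930601_20181231_hist.zip",
    "/monthly/more_precip/recent/monatswerte_RR_03467_akt.zip")),
  "5116" = paste0(base_url,
    "/monthly/more_precip/historical/monatswerte_RR_05116_19920701_20061231_hist.zip"))

lengths(links)                       # number of files per station
recent <- grepl("/recent/", unlist(links), fixed=TRUE)
unlist(links)[recent]                # flatten the list, keep only recent files
```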

Metadata

selectDWD also uses a complete data.frame with meta information, metaIndex (derived from the “Beschreibung” files in fileIndex).

# All metadata at all folders:
data(metaIndex)
str(metaIndex, vec.len=2)
## 'data.frame':    80381 obs. of  12 variables:
##  $ Stations_id  : int  1 1 1 1 1 ...
##  $ von_datum    : int  18910101 18910101 18910101 18910101 19120101 ...
##  $ bis_datum    : int  19860630 19860630 19860630 19860630 19860630 ...
##  $ Stationshoehe: num  478 478 478 478 478 ...
##  $ geoBreite    : num  47.8 47.8 ...
##  $ geoLaenge    : num  8.85 8.85 ...
##  $ Stationsname : chr  "Aach" "Aach" ...
##  $ Bundesland   : chr  "Baden-Wuerttemberg" "Baden-Wuerttemberg" ...
##  $ res          : chr  "annual" "annual" ...
##  $ var          : chr  "more_precip" "more_precip" ...
##  $ per          : chr  "historical" "recent" ...
##  $ hasfile      : logi  TRUE FALSE TRUE ...
View(data.frame(sort(unique(rdwd:::metaIndex$Stationsname)))) # ca 6k entries

dataDWD can download (and readDWD can correctly read) such a data.frame from any folder on the FTP server:

# file with station metadata for a given path:
m_link <- selectDWD(res="monthly", var="more_precip", per="hist", meta=TRUE)
substr(m_link, 50, 1e4) # (Monatswerte = monthly values, Beschreibung = description)
## [1] "/climate/monthly/more_precip/historical/RR_Monatswerte_Beschreibung_Stationen.txt"
meta_monthly_rain <- dataDWD(m_link) # not executed in vignette creation
str(meta_monthly_rain)

Meta files may list stations for which there are actually no files. For example, Tucheim (5116) is listed in the metadata at …/monthly/more_precip/recent/RR_Monatswerte_Beschreibung_Stationen.txt, but has no file in that folder (only in …/monthly/more_precip/historical). Such entries refer to non-public datasets (the DWD cannot publish all datasets because of copyright restrictions). To request those, please contact klima.vertrieb@dwd.de.
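Such entries can presumably be spotted through the hasfile column of metaIndex. The following sketch uses a two-row stand-in with only a few of the columns shown above, filled in from the Tucheim example (the real metaIndex ships with the package). The names meta_toy and nofile are hypothetical.

```r
# Stand-in for two rows of metaIndex, based on the Tucheim example in the text.
meta_toy <- data.frame(
  Stations_id  = c(5116L, 5116L),
  Stationsname = c("Tucheim", "Tucheim"),
  res          = c("monthly", "monthly"),
  var          = c("more_precip", "more_precip"),
  per          = c("historical", "recent"),
  hasfile      = c(TRUE, FALSE),
  stringsAsFactors = FALSE)

# Entries that are listed in the metadata but have no file on the server:
nofile <- meta_toy[!meta_toy$hasfile, ]
nofile
# Assumption: with rdwd loaded, the same filter applies to metaIndex itself.
```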

Any feedback on this package (or this vignette) is very welcome via github or berry-b@gmx.de!