cbsodataR, all data of Statistics Netherlands (CBS)

Edwin de Jonge

2019-02-21

Statistics Netherlands (CBS) is the office that produces all official statistics of the Netherlands.

CBS has long published its data on the web through its online database StatLine. Since 2014 this database has offered an open data web API based on the OData protocol. The cbsodataR package retrieves this data directly into R.

Table of Contents

A list of tables can be retrieved using the cbs_get_toc function.

library(dplyr) # not needed, but used in examples below
library(cbsodataR)

toc <- cbs_get_toc(Language="en") # retrieve only English tables

toc %>% 
  select(Identifier, ShortTitle) 
## # A tibble: 732 x 2
##    Identifier ShortTitle                             
##    <chr>      <chr>                                  
##  1 80783eng   Agriculture; general farm type, region 
##  2 80784eng   Agriculture; labour force, region      
##  3 7100eng    Arable crops; production               
##  4 70671ENG   Fruit culture; area fruit orchards     
##  5 37738ENG   Vegetables; yield per kind of vegetable
##  6 71509ENG   Yield apples and pears                 
##  7 83981ENG   Livestock manure; key figures          
##  8 80274eng   Livestock cattle                       
##  9 7373eng    Livestock pigs                         
## 10 7425eng    Milk supply and dairy production       
## # … with 722 more rows
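
With several hundred tables in the list, a keyword search on ShortTitle is a natural next step. The sketch below is one possible approach in base R. The helper find_tables is hypothetical and not part of cbsodataR, and toc_sample is a small stand-in built from three rows of the output above so that the example runs without a network connection. In practice the result of cbs_get_toc would be passed in instead.

```r
# Stand-in for the result of cbs_get_toc(Language = "en"):
# three rows copied from the printed output above.
toc_sample <- data.frame(
  Identifier = c("80783eng", "71509ENG", "7373eng"),
  ShortTitle = c("Agriculture; general farm type, region",
                 "Yield apples and pears",
                 "Livestock pigs"),
  stringsAsFactors = FALSE
)

# find_tables() is a hypothetical helper, not part of cbsodataR.
# It keeps the rows whose ShortTitle matches a regular expression,
# ignoring case.
find_tables <- function(toc, pattern) {
  hits <- grepl(pattern, toc$ShortTitle, ignore.case = TRUE)
  toc[hits, c("Identifier", "ShortTitle")]
}

fruit <- find_tables(toc_sample, "apples|pears")
print(fruit)
```

The Identifier values returned this way can be passed on to the other functions of the package.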

Using an “Identifier” from cbs_get_toc, information on the table can be retrieved with cbs_get_meta.

apples <- cbs_get_meta('71509ENG')
apples
## 71509ENG: 'Yield apples and pears', 2017
##   FruitFarmingRegions: 'Fruit farming regions'
##   Periods: 'Periods' 
## 
## Retrieve a default data selection with:
##  cbs_get_data(id = "71509ENG", FruitFarmingRegions = c("1", "2", 
## "4", "3", "5"), Periods = c("1997JJ00", "2012JJ00", "2013JJ00", 
## "2016JJ00"), select = c("FruitFarmingRegions", "Periods", "TotalAppleVarieties_1", 
## "CoxSOrangePippin_2", "DelbarestivaleDelcorf_3", "Elstar_4", 
## "GoldenDelicious_5", "Jonagold_6", "Jonagored_7", "RodeBoskoopRennetApple_10", 
## "OtherAppleVarieties_12", "TotalPearVarieties_13", "Conference_15", 
## "DoyenneDuComice_16", "CookingPears_17", "TriompheDeVienne_18", 
## "OtherPearVarieties_19", "TotalAppleVarieties_20", "CoxSOrangePippin_21", 
## "DelbarestivaleDelcorf_22", "Elstar_23", "GoldenDelicious_24", 
## "Jonagold_25", "Jonagored_26", "RodeBoskoopRennetApple_29", "OtherAppleVarieties_31", 
## "TotalPearVarieties_32", "Conference_34", "DoyenneDuComice_35", 
## "CookingPears_36", "TriompheDeVienne_37", "OtherPearVarieties_38"
## ))

The meta object contains all metadata properties of cbsodata (see the original documentation) in the form of data.frames. Each data.frame describes properties of the CBS table.

names(apples)
## [1] "TableInfos"          "DataProperties"      "CategoryGroups"     
## [4] "FruitFarmingRegions" "Periods"

Data download

Data can be retrieved with cbs_get_data. By default, all data for the table is downloaded to a temporary directory and loaded into R.

cbs_get_data('71509ENG') %>% 
  select(1:4) %>%  # keep the first four columns for display
  head()
## # A tibble: 6 x 4
##   FruitFarmingRegions Periods  TotalAppleVarieties_1 CoxSOrangePippin_2
##   <chr>               <chr>                    <int>              <int>
## 1 1                   1997JJ00                   420                 43
## 2 1                   1998JJ00                   518                 40
## 3 1                   1999JJ00                   568                 39
## 4 1                   2000JJ00                   461                 27
## 5 1                   2001JJ00                   408                 30
## 6 1                   2002JJ00                   354                 17
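
The Periods column holds codes such as 1997JJ00 rather than plain years. Assuming the CBS convention that a four-digit year followed by JJ denotes a yearly period, the year can be extracted as sketched below. The helper period_year is hypothetical and not part of cbsodataR, and the input vector is copied from the output above.

```r
# Period codes copied from the output above.
periods <- c("1997JJ00", "1998JJ00", "1999JJ00")

# period_year() is a hypothetical helper, not part of cbsodataR.
# Assumption: a yearly code starts with a four-digit year followed by "JJ".
# Codes that do not match this pattern yield NA.
period_year <- function(code) {
  is_yearly <- grepl("^[0-9]{4}JJ", code)
  year <- rep(NA_integer_, length(code))
  year[is_yearly] <- as.integer(substr(code[is_yearly], 1, 4))
  year
}

years <- period_year(periods)
print(years)
```

Recent versions of cbsodataR also provide cbs_add_date_column, which may be the more convenient route when working with a downloaded table.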

Select and filter

It is possible to restrict the download using filter statements. This may shorten the download time considerably.

cbs_get_data('71509ENG', Periods=c('2000JJ00','2001JJ00')) %>% 
  select(1:4) %>% 
  head()
## # A tibble: 6 x 4
##   FruitFarmingRegions Periods  TotalAppleVarieties_1 CoxSOrangePippin_2
##   <chr>               <chr>                    <int>              <int>
## 1 1                   2000JJ00                   461                 27
## 2 1                   2001JJ00                   408                 30
## 3 2                   2000JJ00                    87                  5
## 4 2                   2001JJ00                    75                  5
## 5 4                   2000JJ00                   105                 10
## 6 4                   2001JJ00                    87                  9
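
The default selection printed by cbs_get_meta earlier suggests that cbs_get_data also takes a select argument naming the columns to download. The sketch below combines it with the Periods filter. The column names and period codes are taken from that output. The requireNamespace and tryCatch guards are additions for illustration, so that the snippet leaves apples_subset as NULL when the package or the network is unavailable.

```r
# Columns and period codes taken from the cbs_get_meta output shown earlier.
wanted_columns <- c("FruitFarmingRegions", "Periods", "TotalAppleVarieties_1")
wanted_periods <- c("2000JJ00", "2001JJ00")

# The download needs the cbsodataR package and an internet connection.
# If either is missing, apples_subset stays NULL.
apples_subset <- NULL
if (requireNamespace("cbsodataR", quietly = TRUE)) {
  apples_subset <- tryCatch(
    cbsodataR::cbs_get_data("71509ENG",
                            Periods = wanted_periods,
                            select  = wanted_columns),
    error = function(e) NULL  # for example, no network access
  )
}

if (!is.null(apples_subset)) {
  print(head(apples_subset))
}
```

Restricting both rows and columns in the request keeps the transfer small, which matters most for the larger StatLine tables.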

Download data

Data can also be downloaded explicitly by using cbs_download_table.