Pooling

Alexander Bauer

2019-07-15

library(tidyr)
library(purrr)
library(dplyr)
library(coalitions)

This vignette demonstrates the functions that coalitions provides for pooling multiple surveys.

Overview

Two convenience functions implement the pooling approach:

  1. get_surveys: Wrapper that uses scrape_wahlrecht and collapse_parties to download the most recent survey results from https://www.wahlrecht.de/ and stores the prepared data in a nested tibble (see tidyr::nest).

  2. pool_surveys: Pools the newest surveys (obtained with get_surveys) within a specified time window, which defaults to the last 14 days. The pooling assumes a certain correlation between the party-specific vote counts of any two polling agencies, which defaults to 0.5. For each polling agency, only the newest survey in the time window is considered.
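The rule "only the newest survey per polling agency within the time window" can be illustrated with a small dplyr sketch. This is not the internal code of pool_surveys: the toy tibble, its pollster names and its dates are made up for illustration, and treating both window boundaries as inclusive is an assumption.

```r
library(dplyr)

# Toy data (hypothetical pollsters and dates, not real polls)
toy <- tibble::tibble(
  pollster = c("a", "a", "b", "b", "c"),
  date     = as.Date(c("2017-09-01", "2017-08-25",
                       "2017-08-30", "2017-08-10",
                       "2017-08-01"))
)

last_date <- as.Date("2017-09-02")
period    <- 14  # window length in days

newest <- toy %>%
  # keep surveys inside the window (inclusive boundaries are an assumption)
  filter(date <= last_date, date >= last_date - period) %>%
  # keep only the most recent survey of each pollster
  group_by(pollster) %>%
  filter(date == max(date)) %>%
  ungroup()

newest
```

Here pollster "c" drops out entirely because its only survey is older than 14 days, while "a" and "b" each contribute their most recent survey.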

Setting the time window

The three arguments last_date, period and period_extended define the time window used in pool_surveys. Using these arguments one can choose between two types of pooling:

  1. If period_extended equals NA: Surveys in the time window from last_date to last_date - period will be considered for each polling agency.

  2. If period_extended does not equal NA: As in 1., but surveys in the time window from last_date - period to last_date - period_extended are additionally considered for each polling agency, after downweighting them by halving their true sample size.

The latter option can be especially useful if opinion polls for a specific election are published only rarely. By default, pool_surveys uses a time window that starts at the current date and goes back 14 days, without making use of period_extended.
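The two window types can be made concrete with a minimal base R sketch. The helper window_weight is hypothetical and not part of coalitions; it only mirrors the description above, and whether the window boundaries are inclusive is an assumption.

```r
# Hypothetical helper (not part of coalitions): returns the factor applied
# to a survey's sample size, depending on the survey date.
# Assumption: window boundaries are inclusive.
window_weight <- function(date, last_date, period = 14, period_extended = NA) {
  # age of each survey in days, relative to last_date
  age <- as.numeric(difftime(last_date, date, units = "days"))
  weight <- rep(0, length(age))            # outside any window: not used
  weight[age >= 0 & age <= period] <- 1    # main window: full sample size
  if (!is.na(period_extended)) {
    # extended window: sample size is halved
    weight[age > period & age <= period_extended] <- 0.5
  }
  weight
}

last_date <- as.Date("2017-09-02")
dates     <- as.Date(c("2017-09-01", "2017-08-15", "2017-07-01"))

w_default  <- window_weight(dates, last_date)                        # 1 0 0
w_extended <- window_weight(dates, last_date, period_extended = 28)  # 1 0.5 0
```

With the default settings only the survey from 2017-09-01 is used. With period_extended = 28, the 18-day-old survey also enters, at half its sample size.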

Read data

# Scrape current surveys from the major polling agencies in Germany
# surveys <- get_surveys()
# Because the web connection can be unstable, we use the sample data set of pre-scraped surveys here
surveys <- coalitions::surveys_sample
surveys
## # A tibble: 7 x 2
##   pollster   surveys         
##   <chr>      <list>          
## 1 allensbach <tibble [3 × 5]>
## 2 emnid      <tibble [3 × 5]>
## 3 fgw        <tibble [3 × 5]>
## 4 forsa      <tibble [3 × 5]>
## 5 gms        <tibble [3 × 5]>
## 6 infratest  <tibble [3 × 5]>
## 7 insa       <tibble [3 × 5]>

Perform pooling

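# Obtain the pooled sample as of the most recent survey date, based on the preceding 14 days
# (the list column is named explicitly because unnest() without columns is deprecated in tidyr >= 1.0)
last_date <- surveys %>% tidyr::unnest(surveys) %>% pull(date) %>% max()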
pool <- pool_surveys(surveys, last_date = last_date)
pool %>% select(-start, -end)
## # A tibble: 7 x 6
##   pollster date       respondents party  percent votes
##   <chr>    <date>           <dbl> <chr>    <dbl> <dbl>
## 1 pooled   2017-09-02       3055. afd       8.89  272.
## 2 pooled   2017-09-02       3055. cdu      38.0  1161.
## 3 pooled   2017-09-02       3055. fdp       8.52  260.
## 4 pooled   2017-09-02       3055. greens    7.41  226.
## 5 pooled   2017-09-02       3055. left      9.06  277.
## 6 pooled   2017-09-02       3055. others    4.51  138.
## 7 pooled   2017-09-02       3055. spd      23.6   722.
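As a quick plausibility check, the printed columns can be compared with each other. The sketch below copies the rounded values from the output above into plain vectors. That votes equals percent / 100 * respondents is an assumption that merely appears consistent with the printed numbers, so the tolerances allow for rounding.

```r
# Rounded values copied from the printed pooled output above
respondents <- 3055
percent <- c(afd = 8.89, cdu = 38.0, fdp = 8.52, greens = 7.41,
             left = 9.06, others = 4.51, spd = 23.6)
votes   <- c(afd = 272, cdu = 1161, fdp = 260, greens = 226,
             left = 277, others = 138, spd = 722)

# The party shares should add up to roughly 100 percent
total_percent <- sum(percent)

# Assumed relation: votes = percent / 100 * respondents (up to rounding)
implied_votes <- percent / 100 * respondents
max_deviation <- max(abs(implied_votes - votes))

total_percent
max_deviation
```

Small deviations are expected because the printed values are rounded to three significant digits.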