Retrieving multiple resources using purrr

It may be useful to retrieve multiple or all of the resources for a package. For example, the TTC Subway Delay Data is released monthly, with a new file each month.

library(opendatatoronto)

ttc_subway_delays_resources <- list_package_resources("https://open.toronto.ca/dataset/ttc-subway-delay-data/")

ttc_subway_delays_resources
#> # A tibble: 33 x 4
#>    name                      id                        format last_modified
#>    <chr>                     <chr>                     <chr>  <date>       
#>  1 ttc-subway-delay-codes    fece136b-224a-412a-b191-… XLSX   2019-08-15   
#>  2 ttc-subway-delay-jan-201… 6664420f-316f-4f94-9ba4-… XLSX   2019-08-15   
#>  3 ttc-subway-delay-may-2017 d1159888-0035-45a0-b238-… XLSX   2019-08-15   
#>  4 ttc-subway-delay-june-20… 240d8e8c-d300-4f91-b94f-… XLSX   2019-08-15   
#>  5 ttc-subway-delay-july-20… 98d4ac77-aa9f-40a3-97ee-… XLSX   2019-08-15   
#>  6 ttc-subway-delay-august-… 92e7649a-cf2f-4ac7-9802-… XLSX   2019-08-15   
#>  7 ttc-subway-delay-septemb… 61412f10-656b-4992-9a1a-… XLSX   2019-08-15   
#>  8 ttc-subway-delay-october… 69a6db37-7982-49c7-8dbc-… XLSX   2019-08-15   
#>  9 ttc-subway-delay-novembe… 10080217-8022-41c0-a8ba-… XLSX   2019-08-15   
#> 10 ttc-subway-delay-decembe… a731c4bb-630a-4530-b590-… XLSX   2019-08-15   
#> # … with 23 more rows

It would be cumbersome to retrieve them one at a time, e.g. via

library(dplyr)

delays_jan_2019 <- ttc_subway_delays_resources %>%
  filter(name == "ttc-subway-delay-january-2019") %>%
  get_resource()

delays_feb_2019 <- ttc_subway_delays_resources %>%
  filter(name == "ttc-subway-delay-february-2019") %>%
  get_resource()

and so on.

Instead, you can iterate through the different resources using the purrr package and get them in a nested tibble:

library(purrr)

ttc_subway_delays_2019 <- ttc_subway_delays_resources %>%
  filter(grepl("2019", name)) %>%
  mutate(data = map(id, get_resource))

ttc_subway_delays_2019
#> # A tibble: 10 x 5
#>    name               id                  format last_modified data        
#>    <chr>              <chr>               <chr>  <date>        <list>      
#>  1 ttc-subway-delay-… 9a824dba-20cc-40b1… XLSX   2019-08-15    <tibble [1,…
#>  2 ttc-subway-delay-… e6bac74e-2da2-4429… XLSX   2019-08-15    <tibble [1,…
#>  3 ttc-subway-delay-… 0511879f-3233-4a42… XLSX   2019-08-15    <tibble [1,…
#>  4 ttc-subway-delay-… 447b4a5a-f696-4f05… XLSX   2019-08-15    <tibble [1,…
#>  5 ttc-subway-delay-… a302fcab-81a1-4142… XLSX   2019-08-15    <tibble [1,…
#>  6 ttc-subway-delay-… 655a138c-d381-4fe7… XLSX   2019-08-15    <tibble [1,…
#>  7 ttc-subway-delay-… 34d9619f-0239-4dad… XLSX   2019-10-07    <tibble [1,…
#>  8 ttc-subway-delay-… b6557580-a0f4-4c96… XLSX   2019-10-07    <tibble [1,…
#>  9 ttc-subway-delay-… e2a5e386-ddf7-4416… XLSX   2019-10-07    <tibble [1,…
#> 10 "ttc-subway-delay… fd837bd2-85ed-485e… XLSX   2019-11-08    <tibble [1,…

All of these resources contain equivalent data, with the same column names

map(ttc_subway_delays_2019[["data"]], colnames)
#> [[1]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[2]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[3]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[4]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[5]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[6]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[7]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[8]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[9]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"  
#> 
#> [[10]]
#>  [1] "Date"      "Time"      "Day"       "Station"   "Code"     
#>  [6] "Min Delay" "Min Gap"   "Bound"     "Line"      "Vehicle"

so the data can easily be unnested and combined:

bind_rows(ttc_subway_delays_2019[["data"]])
#> # A tibble: 16,185 x 10
#>    Date                Time  Day   Station Code  `Min Delay` `Min Gap`
#>    <dttm>              <chr> <chr> <chr>   <chr>       <dbl>     <dbl>
#>  1 2019-01-01 00:00:00 01:08 Tues… YORK M… PUSI            0         0
#>  2 2019-01-01 00:00:00 02:14 Tues… ST AND… PUMST           0         0
#>  3 2019-01-01 00:00:00 02:16 Tues… JANE S… TUSC            0         0
#>  4 2019-01-01 00:00:00 02:27 Tues… BLOOR … SUO             0         0
#>  5 2019-01-01 00:00:00 03:03 Tues… DUPONT… MUATC          11        16
#>  6 2019-01-01 00:00:00 03:08 Tues… EGLINT… EUATC          11        16
#>  7 2019-01-01 00:00:00 03:09 Tues… DUPONT… EUATC           6        11
#>  8 2019-01-01 00:00:00 03:26 Tues… ST CLA… EUATC           4         9
#>  9 2019-01-01 00:00:00 03:37 Tues… KENNED… TUMVS           0         0
#> 10 2019-01-01 00:00:00 08:04 Tues… DAVISV… MUNOA           5        10
#> # … with 16,175 more rows, and 3 more variables: Bound <chr>, Line <chr>,
#> #   Vehicle <dbl>