Strings

Rory Nolan

2018-06-29

First let’s load the library:

library(filesstrings)

The nth number in a string

I often want to get the first, last or nth number in a string.

pop <- "A population of 1000 comprised of 488 dogs and 512 cats."
first_number(pop)
#> [1] 1000
nth_number(pop, 2)
#> [1] 488
last_number(pop)
#> [1] 512
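
To make the idea concrete, here is a minimal base-R sketch of the same kind of lookup. It is not how filesstrings implements these functions; `nth_int()` is a hypothetical helper written for illustration. It assumes a single input string and non-negative integers only, and treating a negative `n` as "count from the end" is this sketch's own convention.

```r
# Hypothetical helper (not part of filesstrings): find every run of digits
# in one string and return the n-th run as a number.
nth_int <- function(string, n) {
  digits <- regmatches(string, gregexpr("[0-9]+", string))[[1]]
  if (n < 0) n <- length(digits) + n + 1  # assumption: -1 means "last"
  as.numeric(digits[n])
}

pop <- "A population of 1000 comprised of 488 dogs and 512 cats."
nth_int(pop, 1)   # 1000
nth_int(pop, 2)   # 488
nth_int(pop, -1)  # 512
```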

All the numbers in a string

extract_numbers(pop)
#> [[1]]
#> [1] 1000  488  512

All the non-numbers in a string

extract_non_numerics(pop)
#> [[1]]
#> [1] "A population of " " comprised of "   " dogs and "      
#> [4] " cats."

Split strings by numbers

str_split_by_nums(pop)
#> [[1]]
#> [1] "A population of " "1000"             " comprised of "  
#> [4] "488"              " dogs and "       "512"             
#> [7] " cats."

Could that be interpreted as numeric?

Sometimes we don’t want to know whether something is numeric; we want to know whether it could be coerced to numeric. For this, there’s can_be_numeric().

is.numeric(23)
#> [1] TRUE
is.numeric("23")
#> [1] FALSE
can_be_numeric(23)
#> [1] TRUE
can_be_numeric("23")
#> [1] TRUE
can_be_numeric("23a")
#> [1] FALSE
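
The distinction between "is numeric" and "could be coerced to numeric" can be sketched in base R by attempting the coercion and checking for `NA`. The helper name `coercible()` is hypothetical, and its behaviour on edge cases (for example `NA` input) may differ from can_be_numeric().

```r
# Hypothetical helper (not part of filesstrings): TRUE where as.numeric()
# succeeds, FALSE where it would produce NA.
coercible <- function(x) {
  !is.na(suppressWarnings(as.numeric(x)))
}

coercible(23)     # TRUE
coercible("23")   # TRUE
coercible("23a")  # FALSE
```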

Get the nth element of a string

str_elem("abc", 2)
#> [1] "b"
str_elem("abcdefz", -1)  # last element
#> [1] "z"
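
For comparison, the same lookup can be sketched with base R's `substr()`. The helper `elem_at()` is hypothetical and assumes a single string and a single non-zero index; it converts a negative index to a position counted from the end.

```r
# Hypothetical helper (not part of filesstrings): the character at a given
# position, where a negative index counts back from the end of the string.
elem_at <- function(string, index) {
  if (index < 0) index <- nchar(string) + index + 1
  substr(string, index, index)
}

elem_at("abc", 2)       # "b"
elem_at("abcdefz", -1)  # "z"
```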

Trim anything (not just whitespace)

stringr’s str_trim() only trims whitespace. What if you want to trim something else from the ends of a string? For that, there’s trim_anything().

trim_anything("__rmarkdown_", "_")
#> [1] "rmarkdown"
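
Trimming of this kind amounts to removing repeats of a character from the start and end of a string only. A rough base-R sketch follows; `trim_char()` is a hypothetical helper, and it assumes `char` is a single ordinary character (not `]`, `^`, `-` or a backslash, which have special meaning inside a regex character class).

```r
# Hypothetical helper (not part of filesstrings): strip leading and trailing
# repeats of one character, leaving interior occurrences untouched.
trim_char <- function(string, char) {
  pattern <- paste0("^[", char, "]+|[", char, "]+$")
  gsub(pattern, "", string)
}

trim_char("__rmarkdown_", "_")  # "rmarkdown"
trim_char("_r_markdown_", "_")  # "r_markdown"
```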

Count the number of matches of a pattern in a string

count_matches(pop, " ")  # count the spaces in pop
#> [1] 10
count_matches("Bob and Joe went to see Bob's mother.", "Bob")
#> [1] 2
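
The counts above can be cross-checked in base R. This sketch uses a hypothetical helper, `count_fixed()`, which treats the pattern as a literal string (`fixed = TRUE`) and counts non-overlapping matches; whether count_matches() interprets its pattern literally or as a regular expression is not assumed here.

```r
# Hypothetical helper (not part of filesstrings): count non-overlapping
# literal occurrences of `pattern` in each element of `string`.
count_fixed <- function(string, pattern) {
  lengths(regmatches(string, gregexpr(pattern, string, fixed = TRUE)))
}

pop <- "A population of 1000 comprised of 488 dogs and 512 cats."
count_fixed(pop, " ")                                         # 10
count_fixed("Bob and Joe went to see Bob's mother.", "Bob")   # 2
```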

Turn duplicates of a pattern into singles

Suppose we want to remove double spacing:

double__spaced <- "Hello  world,  pretend  it's  Saturday  :-)"
count_matches(double__spaced, " ")  # count the spaces
#> [1] 10
single_spaced <- singleize(double__spaced, pattern = " ")
single_spaced
#> [1] "Hello world, pretend it's Saturday :-)"
count_matches(single_spaced, " ")  # half the spaces are gone
#> [1] 5
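
Collapsing repeats like this is a one-line regular-expression substitution in base R. The sketch below uses a hypothetical helper, `squeeze()`, and assumes `char` is a single ordinary character with no special meaning inside a regex character class.

```r
# Hypothetical helper (not part of filesstrings): replace every run of two
# or more copies of `char` with a single copy.
squeeze <- function(string, char = " ") {
  gsub(paste0("[", char, "]{2,}"), char, string)
}

squeeze("Hello  world,  pretend  it's  Saturday  :-)")
# "Hello world, pretend it's Saturday :-)"
```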

The bit of a string after the nth appearance of a pattern

Suppose we have sentences telling us about a couple of boxes:

box_infos <- c("Box 1 has weight 23kg and volume 0.3 cubic metres.",
               "Box 2 has weight 20kg and volume 0.33 cubic metres.")

We can get (for example) the weights of the boxes by taking the first number that appears after the word “weight”.

library(magrittr)
str_after_nth(box_infos, "weight", 1)  # the bit of the string after 1st "weight"
#> [1] " 23kg and volume 0.3 cubic metres." 
#> [2] " 20kg and volume 0.33 cubic metres."
str_after_nth(box_infos, "weight", 1) %>% nth_number(1)  # 1st number after 1st "weight"
#> [1] 23 20
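
To show what "the bit after the nth appearance" means mechanically, here is a base-R sketch that locates the nth literal match and keeps everything after it. `after_nth()` is a hypothetical helper, not the filesstrings implementation; it assumes a literal (non-regex) pattern and returns `NA` when a string has fewer than `n` matches.

```r
# Hypothetical helper (not part of filesstrings): for each string, return the
# text following the n-th literal occurrence of `pattern`.
after_nth <- function(strings, pattern, n) {
  hits <- gregexpr(pattern, strings, fixed = TRUE)
  vapply(seq_along(strings), function(i) {
    starts <- hits[[i]]
    if (starts[1] == -1 || length(starts) < n) return(NA_character_)
    # First character after the end of the n-th match
    end <- starts[n] + attr(starts, "match.length")[n]
    substr(strings[i], end, nchar(strings[i]))
  }, character(1))
}

box_infos <- c("Box 1 has weight 23kg and volume 0.3 cubic metres.",
               "Box 2 has weight 20kg and volume 0.33 cubic metres.")
after_nth(box_infos, "weight", 1)
# " 23kg and volume 0.3 cubic metres."  " 20kg and volume 0.33 cubic metres."
```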

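We’d like to put all of the box information into a nice data frame. Here’s how. Note the decimals = TRUE argument, which lets nth_number() read 0.3 and 0.33 as decimal numbers instead of taking only the digits before the decimal point.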

tibble::tibble(
  box = nth_number(box_infos, 1),
  weight = str_after_nth(box_infos, "weight", 1) %>%
    nth_number(1, decimals = TRUE),
  volume = str_after_nth(box_infos, "volume", 1) %>%
    nth_number(1, decimals = TRUE)
)
#> # A tibble: 2 x 3
#>     box weight volume
#>   <int>  <dbl>  <dbl>
#> 1     1     23   0.3 
#> 2     2     20   0.33

Split camel case

Sometimes people use camel case (CamelCase) to avoid using spaces. What if we want to put the spaces back in?

camel_names <- c("JoeBloggs", "JaneyMac")
str_split_camel_case(camel_names)
#> [[1]]
#> [1] "Joe"    "Bloggs"
#> 
#> [[2]]
#> [1] "Janey" "Mac"
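
If the goal is literally to put the spaces back in, one possible base-R approach is to insert a space wherever a lower-case letter is directly followed by an upper-case one, and then split on that space if the pieces are wanted. `split_camel()` is a hypothetical helper; it assumes ASCII letters and does not split runs of consecutive capitals (for example "HTTPServer" stays whole).

```r
# Hypothetical helper (not part of filesstrings): insert a space at each
# lower-to-upper case boundary, then split on the inserted spaces.
split_camel <- function(strings) {
  spaced <- gsub("([a-z])([A-Z])", "\\1 \\2", strings)
  strsplit(spaced, " ", fixed = TRUE)
}

camel_names <- c("JoeBloggs", "JaneyMac")
gsub("([a-z])([A-Z])", "\\1 \\2", camel_names)  # "Joe Bloggs" "Janey Mac"
split_camel(camel_names)  # list of c("Joe", "Bloggs") and c("Janey", "Mac")
```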