First let’s load the library:
library(filesstrings)
I often want to get the first, last or nth number in a string.
pop <- "A population of 1000 comprised of 488 dogs and 512 cats."
NthNumber(pop, 1)
#> [1] 1000
NthNumber(pop, -1) # last number
#> [1] 512
ExtractNumbers(pop)
#> [[1]]
#> [1] 1000 488 512
ExtractNonNumerics(pop)
#> [[1]]
#> [1] "A population of " " comprised of " " dogs and "
#> [4] " cats."
StrSplitByNums(pop)
#> [[1]]
#> [1] "A population of " "1000" " comprised of "
#> [4] "488" " dogs and " "512"
#> [7] " cats."
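For comparison, the same first/last/all-numbers idea can be sketched in base R. This is only an illustration and not how the package implements NthNumber() or ExtractNumbers(). The names `digit_runs` and `nums` are made up for the example, and the pattern assumes plain non-negative integers (no decimals or signs).

```r
pop <- "A population of 1000 comprised of 488 dogs and 512 cats."

# Find every run of digits in the string, then convert the matches to numbers.
digit_runs <- regmatches(pop, gregexpr("[0-9]+", pop))[[1]]
nums <- as.numeric(digit_runs)

nums                 # all numbers, in order of appearance
nums[1]              # first number
nums[length(nums)]   # last number
```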
Sometimes we don’t want to know whether something is numeric; we want to know whether it could be considered numeric (that is, whether it could be coerced to numeric). For this, there’s CanBeNumeric().
is.numeric(23)
#> [1] TRUE
is.numeric("23")
#> [1] FALSE
CanBeNumeric(23)
#> [1] TRUE
CanBeNumeric("23")
#> [1] TRUE
CanBeNumeric("23a")
#> [1] FALSE
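One way to think about "could be coerced to numeric" is to attempt the coercion and see whether it yields NA. The helper below, `could_be_numeric()`, is a hypothetical base R sketch of that idea; it is not the package's CanBeNumeric() and may differ from it on edge cases such as NA input or surrounding whitespace.

```r
# Hypothetical helper: TRUE where as.numeric() succeeds, FALSE where it gives NA.
could_be_numeric <- function(x) {
  # suppressWarnings() hides the "NAs introduced by coercion" warning.
  !is.na(suppressWarnings(as.numeric(x)))
}

could_be_numeric(23)     # already numeric
could_be_numeric("23")   # a string that parses as a number
could_be_numeric("23a")  # a string that does not parse
```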
To get the nth character of a string, there’s StrElem().
StrElem("abc", 2)
#> [1] "b"
StrElem("abcdefz", -1) # last element
#> [1] "z"
stringr’s str_trim just trims whitespace. What if you want to trim something else? Now you can TrimAnything().
TrimAnything("__rmarkdown_", "_")
#> [1] "rmarkdown"
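Trimming a character other than whitespace can also be sketched with a regular expression in base R. The helper `trim_char()` below is hypothetical and is not the package's TrimAnything(); it assumes `ch` is a single literal character that has no special meaning inside a bracket expression (so not `]`, `^`, `-` or a backslash).

```r
# Hypothetical helper: strip leading and trailing runs of one character.
trim_char <- function(x, ch) {
  # Build a pattern such as "^[_]+|[_]+$": a run at the start or at the end.
  pattern <- paste0("^[", ch, "]+|[", ch, "]+$")
  gsub(pattern, "", x)
}

trim_char("__rmarkdown_", "_")
```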
CountMatches(pop, " ") # count the spaces in pop
#> [1] 10
CountMatches("Bob and Joe went to see Bob's mother.", "Bob")
#> [1] 2
Suppose we want to remove double spacing:
double__spaced <- "Hello  world,  pretend  it's  Saturday  :-)"
CountMatches(double__spaced, " ") # count the spaces
#> [1] 10
single_spaced <- DuplicatesToSingles(double__spaced, pattern = " ")
single_spaced
#> [1] "Hello world, pretend it's Saturday :-)"
CountMatches(single_spaced, " ") # half the spaces are gone
#> [1] 5
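The same collapse-and-count check can be sketched in base R. This is an illustration only, not the package's implementation of DuplicatesToSingles() or CountMatches(); the helper `count_spaces()` and the variable names are made up for the example. The double-spaced string is built with strrep() so that the two-space separator is explicit.

```r
words <- c("Hello", "world,", "pretend", "it's", "Saturday", ":-)")
double_spaced <- paste(words, collapse = strrep(" ", 2))  # two spaces per gap

# Hypothetical helper: count literal space characters in each string.
count_spaces <- function(x) {
  lengths(regmatches(x, gregexpr(" ", x, fixed = TRUE)))
}

# Replace every run of one or more spaces with a single space.
single_spaced <- gsub(" +", " ", double_spaced)

count_spaces(double_spaced)  # 10: five gaps, two spaces each
count_spaces(single_spaced)  # 5: five gaps, one space each
```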
Suppose we have sentences telling us about a couple of boxes:
box.infos <- c("Box 1 has weight 23kg and volume 0.3 cubic metres.",
               "Box 2 has weight 20kg and volume 0.33 cubic metres.")
We can get (for example) the weights of the boxes by taking the first number that appears after the word “weight”.
library(magrittr)
StrAfterNth(box.infos, "weight", 1) # the bit of the string after 1st "weight"
#> Box 1 has weight 23kg and volume 0.3 cubic metres.
#> " 23kg and volume 0.3 cubic metres."
#> Box 2 has weight 20kg and volume 0.33 cubic metres.
#> " 20kg and volume 0.33 cubic metres."
StrAfterNth(box.infos, "weight", 1) %>% NthNumber(1) # 1st number after 1st "weight"
#> [1] 23 20
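To make the two steps explicit (keep the text after a keyword, then take the first number in what remains), here is a base R sketch. It does not use or reproduce the package's StrAfterNth() and NthNumber(); the helper `first_number_after()` and the variable `box_infos` are hypothetical, and the keyword is assumed to be plain text with no regex metacharacters.

```r
box_infos <- c("Box 1 has weight 23kg and volume 0.3 cubic metres.",
               "Box 2 has weight 20kg and volume 0.33 cubic metres.")

# Hypothetical helper: first number appearing after the first `keyword`.
first_number_after <- function(x, keyword) {
  # Step 1: drop everything up to and including the first occurrence of keyword.
  rest <- sub(paste0("^.*?", keyword), "", x, perl = TRUE)
  # Step 2: take the first run of digits, with an optional decimal part.
  as.numeric(regmatches(rest, regexpr("[0-9]+\\.?[0-9]*", rest)))
}

weights <- first_number_after(box_infos, "weight")
volumes <- first_number_after(box_infos, "volume")
```

This sketch assumes every string contains the keyword followed by a number; regmatches() silently drops non-matching elements, so a real implementation would need to handle that case.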
We’d like to put all of the box information into a nice data frame. Here’s how.
tibble::tibble(box = NthNumber(box.infos, 1),
               weight = StrAfterNth(box.infos, "weight", 1) %>%
                 NthNumber(1, decimals = TRUE),
               volume = StrAfterNth(box.infos, "volume", 1) %>%
                 NthNumber(1, decimals = TRUE)
)
#> # A tibble: 2 x 3
#>     box weight volume
#>   <int>  <dbl>  <dbl>
#> 1     1     23   0.30
#> 2     2     20   0.33
Sometimes people use camel case (CamelCase) to avoid using spaces. What if we want to put the spaces back in?
camel.names <- c("JoeBloggs", "JaneyMac")
SplitCamelCase(camel.names)
#> [[1]]
#> [1] "Joe" "Bloggs"
#>
#> [[2]]
#> [1] "Janey" "Mac"
Now we have a list where each list element is a vector of a person’s names. To put them together with spaces in between, we can use PasteCollapseListElems. This is an efficient way to perform paste(..., collapse = ...) on each element of a list.
SplitCamelCase(camel.names) %>% PasteCollapseListElems(" ")
#> [1] "Joe Bloggs" "Janey Mac"
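The split-then-paste round trip can also be sketched in base R. This is an illustration, not the package's implementation of SplitCamelCase() or PasteCollapseListElems(). The variable names are made up, and the pattern assumes ASCII names where a new word starts wherever a lowercase letter is directly followed by an uppercase one.

```r
camel_names <- c("JoeBloggs", "JaneyMac")

# Mark each lowercase-to-uppercase boundary with a space, then split on it.
spaced <- gsub("([a-z])([A-Z])", "\\1 \\2", camel_names, perl = TRUE)
parts <- strsplit(spaced, " ", fixed = TRUE)

# Apply paste(..., collapse = " ") to each list element, returning one string each.
full_names <- vapply(parts, paste, character(1), collapse = " ")
full_names
```

Using vapply() with `character(1)` guarantees a character vector with exactly one string per list element, which is the shape the paste-collapse step is meant to produce.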