Helper functions to create regular expression patterns to match different metadata in file paths.
Usage
create_pattern_date(
order = "ymd",
sep = c("_", "-", ""),
yr_digits = 4,
look_ahead = "",
look_behind = ""
)
create_pattern_time(
sep = c("_", "-", ":", ""),
seconds = "yes",
look_ahead = "",
look_behind = ""
)
create_pattern_dt_sep(
sep = "T",
optional = FALSE,
look_ahead = "",
look_behind = ""
)
create_pattern_aru_id(
arus = c("BARLT", "S\\d(A|U)", "SM\\d", "SMM", "SMA"),
n_digits = c(4, 8),
sep = c("_", "-", ""),
prefix = "",
suffix = "",
look_ahead = "",
look_behind = ""
)
create_pattern_site_id(
prefix = c("P", "Q"),
p_digits = 2,
sep = c("_", "-"),
suffix = "",
s_digits = 1,
look_ahead = "",
look_behind = ""
)
create_pattern_tz_offset(
direction_from_UTC = "West",
n_digits_hrs = 2,
n_digits_min = 2
)
test_pattern(test, pattern)
Arguments
- order
Character vector. Expected order(s) of (y)ear, (m)onth and (d)ay. Default is "ymd" for Year-Month-Day order. Can have more than one possible order.
- sep
Character vector. Expected separator(s) between the pattern parts. Can be "" for no separator.
- yr_digits
Numeric vector. Number of digits in Year, either 2 or 4.
- look_ahead
Pattern to look for ahead of (after) the string. Can be a regular expression or text.
- look_behind
Pattern to look for behind (before) the string. Can be a regular expression or text.
- seconds
Character. Whether seconds are included. Options are "yes", "no", "maybe".
- optional
Logical. Whether the separator should be optional or not. Allows matching on different date/time patterns.
- arus
Character vector. Pattern(s) identifying the ARU prefix (usually model specific).
- n_digits
Numeric vector. Number of digits expected to follow the arus pattern. Can be one or two (a range).
- prefix
Character vector. Prefix(es) for site ids.
- suffix
Character vector. Suffix(es) for site ids.
- p_digits
Numeric vector. Number(s) of digits following the prefix.
- s_digits
Numeric vector. Number(s) of digits following the suffix.
- direction_from_UTC
Character. Must be one of "West", "East" or "Both".
- n_digits_hrs
Numeric vector. Number(s) of digits for hours in offset.
- n_digits_min
Numeric vector. Number(s) of digits for minutes in offset.
- test
Character vector. Examples of text to test.
- pattern
Character. Regular expression pattern to test.
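The look_ahead and look_behind arguments are easiest to understand through a small example of regular expression lookarounds. The sketch below does not call the package: it takes the pattern string shown in the Examples for create_pattern_date(sep = "") and wraps it by hand in a look-behind and a look-ahead. That these hand-built lookarounds behave like the look_behind and look_ahead arguments is an assumption, and the file name is hypothetical.

```r
library(stringr)

# Pattern copied from the documented output of create_pattern_date(sep = "")
date_pattern <- "((((([12]{1}\\d{3})))(\\d{2})(\\d{2})))"

# Hypothetical file name in which the ARU serial number also looks like a date
file <- "SM412345678_20200101.wav"

# Without context, the first eight-digit run starting with 1 or 2 is matched,
# which here is part of the serial number rather than the date
unanchored <- str_extract(file, date_pattern)
unanchored

# Hand-built lookarounds (assumed to mirror look_behind = "_" and
# look_ahead = "\\.wav"): the date must follow "_" and precede ".wav"
anchored_pattern <- paste0("(?<=_)", date_pattern, "(?=\\.wav)")
anchored <- str_extract(file, anchored_pattern)
anchored
```

The lookarounds constrain where the match may occur without becoming part of the extracted text, so only the date itself is returned.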
Details
By default create_pattern_aru_id() matches many common ARU patterns like BARLT0000, S4A0000, SM40000, SMM0000, SMA0000.
test_pattern() is a helper function to see what a regular expression pattern will pick out of some example text. Use it to check whether your pattern grabs what you want. It is a simple wrapper around stringr::str_extract().
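Because test_pattern() is described as a wrapper around stringr::str_extract(), the same kind of check can be sketched with stringr directly. The pattern string below is copied from the documented output of create_pattern_aru_id(); the file names are hypothetical and only illustrate the idea.

```r
library(stringr)

# Pattern copied from the documented output of create_pattern_aru_id()
aru_pattern <- "((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{4,8}"

# Hypothetical file paths, for illustration only
files <- c(
  "P01-1/BARLT1012_20200101T235959.wav",
  "P02-1/S4A01234_20200102T000000.wav",
  "notes.txt"
)

# str_extract() returns the first match per element, or NA when nothing matches
aru_ids <- str_extract(files, aru_pattern)
aru_ids
```

An NA in the result flags an input the pattern does not cover, which is the signal to adjust the arguments and try again.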
Functions
- create_pattern_date(): Create a pattern to match a date
- create_pattern_time(): Create a pattern to match a time
- create_pattern_dt_sep(): Create a pattern to match a date/time separator
- create_pattern_aru_id(): Create a pattern to match an ARU id
- create_pattern_site_id(): Create a pattern to match a site id
- create_pattern_tz_offset(): Create a pattern to match a time zone offset
- test_pattern(): Test patterns
Examples
create_pattern_date() # Default matches 2020-01-01 or 2020_01_01 or 20200101
#> [1] "((((([12]{1}\\d{3})))(_|-|)(\\d{2})(_|-|)(\\d{2})))"
# ("-", "_" or "" as separators)
create_pattern_date(sep = "") # Matches only 20200101 (no separator allowed)
#> [1] "((((([12]{1}\\d{3})))(\\d{2})(\\d{2})))"
create_pattern_time() # Default matches 23_59_59 (_, -, :, as optional separators)
#> [1] "([0-2]{1}[0-9]{1})(_|-|:|)([0-5]{1}[0-9]{1})((_|-|:|)([0-5]{1}[0-9]{1}))"
create_pattern_time(sep = "", seconds = "no") # Matches 2359 (no seconds no separators)
#> [1] "([0-2]{1}[0-9]{1})([0-5]{1}[0-9]{1})"
create_pattern_dt_sep() # Default matches 'T' as a required separator
#> [1] "(T)"
create_pattern_dt_sep(optional = TRUE) # 'T' as an optional separator
#> [1] "(T)?"
create_pattern_dt_sep(c("T", "_", "-")) # 'T', '_', or '-' as separators
#> [1] "(T|_|-)"
create_pattern_aru_id()
#> [1] "((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{4,8}"
create_pattern_aru_id(prefix = "CWS")
#> [1] "((CWS))((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{4,8}"
create_pattern_aru_id(n_digits = 12)
#> [1] "((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{12}"
create_pattern_site_id() # Default matches P00-0
#> [1] "((Q)|(P))((\\d{2}))(_|-)((\\d{1}))"
create_pattern_site_id(
prefix = "site", p_digits = 3, sep = "",
suffix = c("a", "b", "c"), s_digits = 0
) # Matches site000a
#> [1] "((site))((\\d{3}))((c)|(b)|(a))"
pat <- create_pattern_aru_id(prefix = "CWS")
test_pattern("CWS_BARLT1012", pat) # No luck
#> [1] NA
pat <- create_pattern_aru_id(prefix = "CWS_")
test_pattern("CWS_BARLT1012", pat) # Ah ha!
#> [1] "CWS_BARLT1012"
pat <- create_pattern_site_id()
test_pattern("P03", pat) # Nope
#> [1] NA
test_pattern("P03-1", pat) # Success!
#> [1] "P03-1"
pat <- create_pattern_site_id(prefix = "site", p_digits = 3, sep = "", s_digits = 0)
test_pattern("site111", pat)
#> [1] "site111"
pat <- create_pattern_site_id(
prefix = "site", p_digits = 3, sep = "",
suffix = c("a", "b", "c"), s_digits = 0
)
test_pattern(c("site9", "site100a"), pat)
#> [1] NA "site100a"
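The individual patterns are plain character strings, so they can be pasted together to match a full date/time stamp. The sketch below is one way to do that under stated assumptions: it reuses the pattern strings printed above for create_pattern_date(), create_pattern_dt_sep() and create_pattern_time() rather than calling the functions, and the file names are hypothetical.

```r
library(stringr)

# Pattern strings copied from the example output above
date_pattern <- "((((([12]{1}\\d{3})))(_|-|)(\\d{2})(_|-|)(\\d{2})))" # create_pattern_date()
sep_pattern <- "(T)" # create_pattern_dt_sep()
time_pattern <- "([0-2]{1}[0-9]{1})(_|-|:|)([0-5]{1}[0-9]{1})((_|-|:|)([0-5]{1}[0-9]{1}))" # create_pattern_time()

# Concatenate date, separator and time into one date/time pattern
datetime_pattern <- paste0(date_pattern, sep_pattern, time_pattern)

# Hypothetical file paths, for illustration only
files <- c(
  "P01-1/BARLT1012_20200101T235959.wav",
  "P01-1/BARLT1012_2020-01-01T23:59:59.wav",
  "P01-1/BARLT1012_20200101.wav"
)

# The last file has a date but no 'T' separator or time, so it gives NA
date_times <- str_extract(files, datetime_pattern)
date_times
```

Requiring the 'T' separator also keeps the serial number digits in "BARLT1012" from being mistaken for the start of a date.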