
Helper functions to create regular expression patterns to match different metadata in file paths.

Usage

create_pattern_date(
  order = "ymd",
  sep = c("_", "-", ""),
  yr_digits = 4,
  look_ahead = "",
  look_behind = ""
)

create_pattern_time(
  sep = c("_", "-", ":", ""),
  seconds = "yes",
  look_ahead = "",
  look_behind = ""
)

create_pattern_dt_sep(
  sep = "T",
  optional = FALSE,
  look_ahead = "",
  look_behind = ""
)

create_pattern_aru_id(
  arus = c("BARLT", "S\\d(A|U)", "SM\\d", "SMM", "SMA"),
  n_digits = c(4, 8),
  sep = c("_", "-", ""),
  prefix = "",
  suffix = "",
  look_ahead = "",
  look_behind = ""
)

create_pattern_site_id(
  prefix = c("P", "Q"),
  p_digits = 2,
  sep = c("_", "-"),
  suffix = "",
  s_digits = 1,
  look_ahead = "",
  look_behind = ""
)

create_pattern_tz_offset(
  direction_from_UTC = "West",
  n_digits_hrs = 2,
  n_digits_min = 2
)

test_pattern(test, pattern)

Arguments

order

Character vector. Expected order(s) of (y)ear, (m)onth and (d)ay. Default is "ymd" for Year-Month-Day order. More than one possible order can be supplied.

sep

Character vector. Expected separator(s) between the pattern parts. Can be "" for no separator.

yr_digits

Numeric vector. Number of digits in Year, either 2 or 4.

look_ahead

Pattern to look for ahead of (after) the string. Can be a regular expression or text.

look_behind

Pattern to look for behind (before) the string. Can be a regular expression or text.

seconds

Character. Whether seconds are included. Options are "yes", "no", "maybe".

optional

Logical. Whether the separator should be optional or not. Allows matching on different date/time patterns.

arus

Character vector. Pattern(s) identifying the ARU prefix (usually model specific).

n_digits

Numeric vector. Number of digits expected to follow the arus pattern. Either one number (an exact count) or two numbers (a range).

prefix

Character vector. Prefix(es) for site ids.

suffix

Character vector. Suffix(es) for site ids.

p_digits

Numeric vector. Number(s) of digits following the prefix.

s_digits

Numeric vector. Number(s) of digits following the suffix.

direction_from_UTC

Character. Must be one of "West", "East" or "Both".

n_digits_hrs

Numeric vector. Number(s) of digits for hours in offset.

n_digits_min

Numeric vector. Number(s) of digits for minutes in offset.

test

Character vector. Examples of text to test.

pattern

Character. Regular expression pattern to test.

Value

Either a pattern (create_pattern_xxx()) or the text extracted by a pattern (test_pattern()).

Details

By default create_pattern_aru_id() matches many common ARU patterns like BARLT0000, S4A0000, SM40000, SMM0000, SMA0000.
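
As a quick check of these defaults, the sketch below tests the example IDs against the default pattern string, copied from the create_pattern_aru_id() output shown in Examples. It uses base R's grepl() with perl = TRUE instead of the package's own helpers; that choice is an assumption made only to keep the snippet self-contained.

```r
# Default ARU pattern, copied verbatim from the create_pattern_aru_id() output
aru_pat <- "((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{4,8}"

# Example IDs listed in the paragraph above
ids <- c("BARLT0000", "S4A0000", "SM40000", "SMM0000", "SMA0000")

# One logical per ID: TRUE where the pattern is found
aru_ok <- grepl(aru_pat, ids, perl = TRUE)
aru_ok
```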

test_pattern() is a helper function to see what a regular expression pattern will pick out of some example text, so you can check that your pattern grabs what you want. It is a simple wrapper around stringr::str_extract().
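
To illustrate what extracting the first match per element looks like, here is a rough base R analogue. The helper name extract_first() is hypothetical and not part of the package; test_pattern() itself relies on stringr::str_extract(), whose regex engine (ICU) can differ from base R's PCRE in edge cases.

```r
# Hypothetical helper: first match per element, NA where nothing matches
extract_first <- function(test, pattern) {
  m <- regexpr(pattern, test, perl = TRUE)   # position of first match, -1 if none
  out <- rep(NA_character_, length(test))
  hit <- !is.na(m) & m > 0
  out[hit] <- regmatches(test, m)            # regmatches() returns matches only
  out
}

# Site id pattern copied from the create_pattern_site_id() output in Examples
site_pat <- "((site))((\\d{3}))((c)|(b)|(a))"
site_res <- extract_first(c("site9", "site100a"), site_pat)
site_res
```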

Functions

  • create_pattern_date(): Create a pattern to match a date

  • create_pattern_time(): Create a pattern to match a time

  • create_pattern_dt_sep(): Create a pattern to match a date/time separator

  • create_pattern_aru_id(): Create a pattern to match an ARU id

  • create_pattern_site_id(): Create a pattern to match a site id

  • create_pattern_tz_offset(): Create a pattern to match a time zone offset

  • test_pattern(): Test patterns

Examples

create_pattern_date() # Default matches 2020-01-01 or 2020_01_01 or 20200101
#> [1] "((((([12]{1}\\d{3})))(_|-|)(\\d{2})(_|-|)(\\d{2})))"
# ("-", "_" or "" as separators)
create_pattern_date(sep = "") # Matches only 20200101 (no separator allowed)
#> [1] "((((([12]{1}\\d{3})))(\\d{2})(\\d{2})))"

create_pattern_time() # Default matches 23_59_59 (_, -, :, as optional separators)
#> [1] "([0-2]{1}[0-9]{1})(_|-|:|)([0-5]{1}[0-9]{1})((_|-|:|)([0-5]{1}[0-9]{1}))"
create_pattern_time(sep = "", seconds = "no") # Matches 2359 (no seconds no separators)
#> [1] "([0-2]{1}[0-9]{1})([0-5]{1}[0-9]{1})"

create_pattern_dt_sep() # Default matches 'T' as a required separator
#> [1] "(T)"
create_pattern_dt_sep(optional = TRUE) # 'T' as an optional separator
#> [1] "(T)?"
create_pattern_dt_sep(c("T", "_", "-")) # 'T', '_', or '-' as separators
#> [1] "(T|_|-)"

create_pattern_aru_id()
#> [1] "((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{4,8}"
create_pattern_aru_id(prefix = "CWS")
#> [1] "((CWS))((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{4,8}"
create_pattern_aru_id(n_digits = 12)
#> [1] "((BARLT)|(S\\d(A|U))|(SM\\d)|(SMM)|(SMA))(_|-|)\\d{12}"


create_pattern_site_id() # Default matches P00-0
#> [1] "((Q)|(P))((\\d{2}))(_|-)((\\d{1}))"
create_pattern_site_id(
  prefix = "site", p_digits = 3, sep = "",
  suffix = c("a", "b", "c"), s_digits = 0
) # Matches site000a
#> [1] "((site))((\\d{3}))((c)|(b)|(a))"


pat <- create_pattern_aru_id(prefix = "CWS")
test_pattern("CWS_BARLT1012", pat) # No luck
#> [1] NA
pat <- create_pattern_aru_id(prefix = "CWS_")
test_pattern("CWS_BARLT1012", pat) # Ah ha!
#> [1] "CWS_BARLT1012"

pat <- create_pattern_site_id()
test_pattern("P03", pat) # Nope
#> [1] NA
test_pattern("P03-1", pat) # Success!
#> [1] "P03-1"

pat <- create_pattern_site_id(prefix = "site", p_digits = 3, sep = "", s_digits = 0)
test_pattern("site111", pat)
#> [1] "site111"
pat <- create_pattern_site_id(
  prefix = "site", p_digits = 3, sep = "",
  suffix = c("a", "b", "c"), s_digits = 0
)
test_pattern(c("site9", "site100a"), pat)
#> [1] NA         "site100a"
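
Because each pattern is a plain character string, the pieces can be pasted together to match a full timestamp. The sketch below does this with the default pattern strings printed above and base R's regexpr() and regmatches(). The file name is a made-up example, and this is not necessarily how the package combines patterns internally.

```r
# Default patterns, copied verbatim from the outputs above
date_pat <- "((((([12]{1}\\d{3})))(_|-|)(\\d{2})(_|-|)(\\d{2})))"
sep_pat  <- "(T)"
time_pat <- "([0-2]{1}[0-9]{1})(_|-|:|)([0-5]{1}[0-9]{1})((_|-|:|)([0-5]{1}[0-9]{1}))"

# Date, then the date/time separator, then time
dt_pat <- paste0(date_pat, sep_pat, time_pat)

# Hypothetical file name, for illustration only
f <- "P03-1_BARLT1012_20200101T235959.wav"

# Extract the first date-time stamp found in the file name
stamp <- regmatches(f, regexpr(dt_pat, f, perl = TRUE))
stamp
```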