Extracts metadata from file names and directory structure using regular expressions, then checks and cleans it.
Usage
clean_metadata(
  project_dir = NULL,
  project_files = NULL,
  file_type = "wav",
  subset = NULL,
  subset_type = "keep",
  pattern_site_id = create_pattern_site_id(),
  pattern_aru_id = create_pattern_aru_id(),
  pattern_date = create_pattern_date(),
  pattern_time = create_pattern_time(),
  pattern_dt_sep = create_pattern_dt_sep(),
  pattern_tz_offset = create_pattern_tz_offset(),
  order_date = "ymd",
  quiet = FALSE
)
Arguments
- project_dir
Character. Directory where project files are stored. File paths will be used to extract information and must actually exist.
- project_files
Character. Vector of project file paths. These paths can be absolute or relative to the working directory, and don't actually need to point to existing files unless you plan to use clean_gps() or other sampling steps down the line. Must be provided if project_dir is NULL.
- file_type
Character. Type of file (extension) to summarize. Default wav.
- subset
Character. Text pattern to mark a subset of files/directories to either "keep" or "omit" (see subset_type).
- subset_type
Character. Either "keep" (default) or "omit" files/directories which match the pattern in subset.
- pattern_site_id
Character. Regular expression to extract site ids. See create_pattern_site_id(). Can be a vector of multiple patterns to match.
- pattern_aru_id
Character. Regular expression to extract ARU ids. See create_pattern_aru_id(). Can be a vector of multiple patterns to match.
- pattern_date
Character. Regular expression to extract dates. See create_pattern_date(). Can be a vector of multiple patterns to match.
- pattern_time
Character. Regular expression to extract times. See create_pattern_time(). Can be a vector of multiple patterns to match.
- pattern_dt_sep
Character. Regular expression to mark separators between dates and times. See create_pattern_dt_sep().
- pattern_tz_offset
Character. Regular expression to extract time zone offsets from file names. See create_pattern_tz_offset().
- order_date
Character. Order that the date appears in. "ymd" (default), "mdy", or "dmy". Can be a vector of multiple patterns to match.
- quiet
Logical. Whether to suppress progress messages and other non-essential updates.
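The interaction between subset and subset_type can be pictured with a small base R sketch. This is only a conceptual illustration, not the package's implementation: the helper subset_files() and the file paths are hypothetical, and it assumes the pattern is matched against the full path.

```r
# Hypothetical file paths (not taken from the package's example data)
files <- c(
  "siteA/P01_20200502.wav",
  "siteA/P02_20200502.wav",
  "siteB/P02_20200503.wav"
)

# Hypothetical helper mimicking the documented keep/omit behaviour
subset_files <- function(files, subset = NULL, subset_type = "keep") {
  if (is.null(subset)) return(files)   # no pattern: nothing to filter
  hit <- grepl(subset, files)          # which paths match the pattern
  if (subset_type == "keep") files[hit] else files[!hit]
}

kept    <- subset_files(files, subset = "P02", subset_type = "keep")
omitted <- subset_files(files, subset = "P02", subset_type = "omit")
```

With the package itself, the corresponding call would be along the lines of clean_metadata(project_files = example_files, subset = "P02", subset_type = "omit").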
Details
Note that times are extracted by first combining the date, date/time separator and the time patterns. This means that if there is a problem with this combination, dates might be extracted but date/times will not. This mismatch can be used to determine which part of a pattern needs to be tweaked.
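The mismatch described above can be reproduced with a minimal base R sketch. The patterns below are simplified, hypothetical stand-ins (not the defaults returned by the create_pattern_*() functions), and extract_first() is an illustrative helper rather than anything the package provides. The second file name uses "_" instead of "T" as the separator, so the date alone matches but the combined date/time pattern does not.

```r
# Hypothetical, simplified patterns (not the package defaults)
pattern_date   <- "[0-9]{8}"
pattern_dt_sep <- "T"
pattern_time   <- "[0-9]{6}"

# Date/times are matched with the three patterns pasted together
pattern_date_time <- paste0(pattern_date, pattern_dt_sep, pattern_time)

# Hypothetical file names; the second uses an unexpected separator
files <- c("P01_1_20200502T050000.wav", "P01_1_20200502_050000.wav")

# Illustrative helper: first match per element, NA where nothing matches
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

dates      <- extract_first(pattern_date, files)
date_times <- extract_first(pattern_date_time, files)

# A row with a date but no date_time points at the separator or time pattern
data.frame(file = files, date = dates, date_time = date_times)
```

In clean_metadata() output, the analogous check would be to look for rows where date is present but date_time is NA, and then adjust pattern_dt_sep or pattern_time accordingly.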
See vignette("customizing", package = "ARUtools") for details on customizing clean_metadata() for your project.
Examples
clean_metadata(project_files = example_files)
#> Extracting ARU info...
#> Extracting Dates and Times...
#> # A tibble: 42 × 11
#> file_name type path aru_id manufacturer model aru_type site_id tz_offset
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 P01_1_20200… wav a_BA… BARLT… Frontier La… BAR-… BARLT P01_1 -0400
#> 2 P01_1_20200… wav a_BA… BARLT… Frontier La… BAR-… BARLT P01_1 -0400
#> 3 P02_1_20200… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> 4 P02_1_20200… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> 5 P03_1_20200… wav a_BA… BARLT… Frontier La… BAR-… BARLT P03_1 -0400
#> 6 P04_1_20200… wav a_BA… BARLT… Frontier La… BAR-… BARLT P04_1 -0400
#> 7 P04_1_20200… wav a_BA… BARLT… Frontier La… BAR-… BARLT P04_1 -0400
#> 8 P05_1_20200… wav a_BA… BARLT… Frontier La… BAR-… BARLT P05_1 -0400
#> 9 P06_1_20200… wav a_BA… BARLT… Frontier La… BAR-… BARLT P06_1 -0400
#> 10 P07_1_20200… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P07_1 NA
#> # ℹ 32 more rows
#> # ℹ 2 more variables: date_time <dttm>, date <date>
clean_metadata(project_files = example_files, subset = "P02")
#> Extracting ARU info...
#> Extracting Dates and Times...
#> # A tibble: 6 × 11
#> file_name type path aru_id manufacturer model aru_type site_id tz_offset
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 P02_1_202005… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> 2 P02_1_202005… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> 3 P02_1_202005… wav j_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> 4 P02_1_202005… wav j_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> 5 P02_1_202005… wav o_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> 6 P02_1_202005… wav o_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 NA
#> # ℹ 2 more variables: date_time <dttm>, date <date>