Extracts metadata from file names and directory structure using regular expressions, then checks and cleans it.

Usage

clean_metadata(
  project_dir = NULL,
  project_files = NULL,
  file_type = "wav",
  subset = NULL,
  subset_type = "keep",
  pattern_site_id = create_pattern_site_id(),
  pattern_aru_id = create_pattern_aru_id(),
  pattern_date = create_pattern_date(),
  pattern_time = create_pattern_time(),
  pattern_dt_sep = create_pattern_dt_sep(),
  pattern_tz_offset = create_pattern_tz_offset(),
  order_date = "ymd",
  quiet = FALSE
)

Arguments

project_dir

Character. Directory where project files are stored. File paths will be used to extract information and must actually exist.

project_files

Character. Vector of project file paths. These paths can be absolute or relative to the working directory, and don't actually need to point to existing files unless you plan to use clean_gps() or other sampling steps down the line. Must be provided if project_dir is NULL.

file_type

Character. Type of file (extension) to summarize. Default "wav".

subset

Character. Text pattern to mark a subset of files/directories to either "keep" or "omit" (see subset_type).

subset_type

Character. Whether to "keep" (default) or "omit" the files/directories that match the pattern in subset.

pattern_site_id

Character. Regular expression to extract site ids. See create_pattern_site_id(). Can be a vector of multiple patterns to match.

pattern_aru_id

Character. Regular expression to extract ARU ids. See create_pattern_aru_id(). Can be a vector of multiple patterns to match.

pattern_date

Character. Regular expression to extract dates. See create_pattern_date(). Can be a vector of multiple patterns to match.

pattern_time

Character. Regular expression to extract times. See create_pattern_time(). Can be a vector of multiple patterns to match.

pattern_dt_sep

Character. Regular expression to mark separators between dates and times. See create_pattern_dt_sep().

pattern_tz_offset

Character. Regular expression to extract time zone offsets from file names. See create_pattern_tz_offset().

order_date

Character. Order that the date appears in. "ymd" (default), "mdy", or "dmy". Can be a vector of multiple patterns to match.

quiet

Logical. Whether to suppress progress messages and other non-essential updates.

Value

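Data frame with the extracted metadata. In the examples on this page, the columns are file_name, type, path, aru_id, manufacturer, model, aru_type, site_id, tz_offset, date_time and date.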

Details

Note that times are extracted by first combining the date, date/time separator, and time patterns into a single pattern. If there is a problem with this combination, dates might be extracted but date/times will not. This mismatch helps locate the problem: when dates are found but date/times are not, the date pattern is working and the separator or time pattern is the part that needs to be tweaked.
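The mismatch described above can be reproduced with a minimal base R sketch. The patterns and file names here are hypothetical stand-ins chosen for illustration. They are not the defaults returned by create_pattern_date(), create_pattern_dt_sep() or create_pattern_time(), and the code only mimics the idea of combining the three patterns, not the internals of clean_metadata().

```r
# Hypothetical patterns (NOT the ARUtools defaults)
pattern_date   <- "[0-9]{8}"  # e.g. 20200502
pattern_dt_sep <- "T"         # date/time separator
pattern_time   <- "[0-9]{6}"  # e.g. 050000

# Date/times are matched with the three patterns pasted together
pattern_date_time <- paste0(pattern_date, pattern_dt_sep, pattern_time)

# Hypothetical file names: the second uses "_" instead of "T"
files <- c("P01_1_20200502T050000.wav",
           "P01_1_20200502_050000.wav")

date_found      <- grepl(pattern_date, files)
date_time_found <- grepl(pattern_date_time, files)

# A date without a date/time points at the separator or time pattern
needs_tweak <- date_found & !date_time_found
print(data.frame(files, date_found, date_time_found, needs_tweak))
```

In this sketch the second file has a date but no date/time, so the separator pattern is the one to widen, for example to "(T|_)".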

See vignette("customizing", package = "ARUtools") for details on customizing clean_metadata() for your project.

Examples

clean_metadata(project_files = example_files)
#> Extracting ARU info...
#> Extracting Dates and Times...
#> # A tibble: 42 × 11
#>    file_name    type  path  aru_id manufacturer model aru_type site_id tz_offset
#>    <chr>        <chr> <chr> <chr>  <chr>        <chr> <chr>    <chr>   <chr>    
#>  1 P01_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P01_1   -0400    
#>  2 P01_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P01_1   -0400    
#>  3 P02_1_20200… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#>  4 P02_1_20200… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#>  5 P03_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P03_1   -0400    
#>  6 P04_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P04_1   -0400    
#>  7 P04_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P04_1   -0400    
#>  8 P05_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P05_1   -0400    
#>  9 P06_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P06_1   -0400    
#> 10 P07_1_20200… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P07_1   NA       
#> # ℹ 32 more rows
#> # ℹ 2 more variables: date_time <dttm>, date <date>
clean_metadata(project_files = example_files, subset = "P02")
#> Extracting ARU info...
#> Extracting Dates and Times...
#> # A tibble: 6 × 11
#>   file_name     type  path  aru_id manufacturer model aru_type site_id tz_offset
#>   <chr>         <chr> <chr> <chr>  <chr>        <chr> <chr>    <chr>   <chr>    
#> 1 P02_1_202005… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 2 P02_1_202005… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 3 P02_1_202005… wav   j_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 4 P02_1_202005… wav   j_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 5 P02_1_202005… wav   o_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 6 P02_1_202005… wav   o_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> # ℹ 2 more variables: date_time <dttm>, date <date>
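
As a complement to the subset example above, the following sketch drops the matching files instead of keeping them. It uses only arguments from the Usage section. It assumes ARUtools is installed and that, with subset_type = "omit", files matching the subset pattern are removed before extraction. Output is not shown because it has not been run here.

```r
# Sketch: omit the "P02" files rather than keeping them
if (requireNamespace("ARUtools", quietly = TRUE)) {
  library(ARUtools)

  m_omit <- clean_metadata(
    project_files = example_files,
    subset = "P02",
    subset_type = "omit",  # drop files/directories matching `subset`
    quiet = TRUE           # suppress progress messages
  )

  # Inspect which sites remain after omitting
  print(unique(m_omit$site_id))
}
```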