
Using regular expressions, metadata is extracted from file names and directory structure, checked and cleaned.


clean_metadata(
  project_dir = NULL,
  project_files = NULL,
  file_type = "wav",
  subset = NULL,
  subset_type = "keep",
  pattern_site_id = create_pattern_site_id(),
  pattern_aru_id = create_pattern_aru_id(),
  pattern_date = create_pattern_date(),
  pattern_time = create_pattern_time(),
  pattern_dt_sep = create_pattern_dt_sep(),
  pattern_tz_offset = create_pattern_tz_offset(),
  order_date = "ymd",
  quiet = FALSE
)



project_dir
Character. Directory where project files are stored. File paths will be used to extract information and must actually exist.


project_files
Character. Vector of project file paths. These paths can be absolute or relative to the working directory, and don't actually need to point to existing files unless you plan to use clean_gps() or other sampling steps down the line. Must be provided if project_dir is NULL.


file_type
Character. Type of file (extension) to summarize. Default "wav".


subset
Character. Text pattern to mark a subset of files/directories to either "keep" or "omit" (see subset_type).


subset_type
Character. Either "keep" (default) or "omit" files/directories which match the pattern in subset.
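
The keep/omit behaviour can be pictured with a small base R sketch. It only illustrates the semantics described above and is not the package's implementation; the file paths and the helper filter_subset() are hypothetical.

```r
# Hypothetical file paths, for illustration only
files <- c(
  "site_P01/P01_20200502T050000.wav",
  "site_P02/P02_20200504T052500.wav",
  "site_P02/P02_20200505T073000.wav"
)

# Hypothetical helper mimicking the documented keep/omit semantics:
# paths matching `subset` are either retained ("keep") or dropped ("omit")
filter_subset <- function(files, subset = NULL, subset_type = "keep") {
  if (is.null(subset)) return(files)
  matched <- grepl(subset, files)
  if (subset_type == "keep") files[matched] else files[!matched]
}

kept    <- filter_subset(files, subset = "P02", subset_type = "keep")
omitted <- filter_subset(files, subset = "P02", subset_type = "omit")
```

Because the pattern is matched against the whole path in this sketch, a pattern that also occurs in a directory name selects every file under that directory.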


pattern_site_id
Character. Regular expression to extract site ids. See create_pattern_site_id(). Can be a vector of multiple patterns to match.


pattern_aru_id
Character. Regular expression to extract ARU ids. See create_pattern_aru_id(). Can be a vector of multiple patterns to match.


pattern_date
Character. Regular expression to extract dates. See create_pattern_date(). Can be a vector of multiple patterns to match.


pattern_time
Character. Regular expression to extract times. See create_pattern_time(). Can be a vector of multiple patterns to match.


pattern_dt_sep
Character. Regular expression to mark separators between dates and times. See create_pattern_dt_sep().


pattern_tz_offset
Character. Regular expression to extract time zone offsets from file names. See create_pattern_tz_offset().


order_date
Character. Order in which the date components appear: "ymd" (default), "mdy", or "dmy". Can be a vector of multiple orders to match.
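
Why the date order matters can be seen with a short base R sketch. It uses as.Date() purely for illustration and makes no claim about how the package parses dates internally; the date string is made up.

```r
# The same string is ambiguous until the order of its components is known
x <- "05-02-2020"

as_mdy <- as.Date(x, format = "%m-%d-%Y")  # read as month-day-year: 2 May 2020
as_dmy <- as.Date(x, format = "%d-%m-%Y")  # read as day-month-year: 5 February 2020
```

Both readings are valid dates, so a wrong order would not raise an error; it would silently produce the wrong date.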


quiet
Logical. Whether to suppress progress messages and other non-essential updates.


Data frame with extracted metadata


Note that times are extracted by first combining the date, date/time separator and the time patterns. This means that if there is a problem with this combination, dates might be extracted but date/times will not. This mismatch can be used to determine which part of a pattern needs to be tweaked.
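
The mismatch described above can be reproduced with a base R sketch. The patterns, file names, and the helper extract_first() are hypothetical and are not the package defaults; the point is only that a combined date/separator/time pattern can fail while the date pattern alone still matches.

```r
# Hypothetical patterns, for illustration only (not the package defaults)
pattern_date   <- "[0-9]{8}"  # e.g. 20200502
pattern_time   <- "[0-9]{6}"  # e.g. 050000
pattern_dt_sep <- "T"         # expects "T" between date and time

# Date/times are matched with the three patterns pasted together
pattern_date_time <- paste0(pattern_date, pattern_dt_sep, pattern_time)

file_names <- c(
  "P01_20200502T050000.wav",  # separator is "T"
  "P02_20200502_050000.wav"   # separator is "_"
)

# Hypothetical helper: first match per element, or NA when there is none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)
  found <- as.integer(m) > 0
  out <- rep(NA_character_, length(x))
  out[found] <- regmatches(x, m)
  out
}

dates      <- extract_first(file_names, pattern_date)
date_times <- extract_first(file_names, pattern_date_time)
```

Here the second file yields a date but no date/time, which points at the separator pattern (rather than the date or time pattern) as the part to adjust.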

See vignette("customizing", package = "ARUtools") for details on customizing clean_metadata() for your project.


clean_metadata(project_files = example_files)
#> Extracting ARU info...
#> Extracting Dates and Times...
#> # A tibble: 42 × 11
#>    file_name    type  path  aru_id manufacturer model aru_type site_id tz_offset
#>    <chr>        <chr> <chr> <chr>  <chr>        <chr> <chr>    <chr>   <chr>    
#>  1 P01_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P01_1   -0400    
#>  2 P01_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P01_1   -0400    
#>  3 P02_1_20200… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#>  4 P02_1_20200… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#>  5 P03_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P03_1   -0400    
#>  6 P04_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P04_1   -0400    
#>  7 P04_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P04_1   -0400    
#>  8 P05_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P05_1   -0400    
#>  9 P06_1_20200… wav   a_BA… BARLT… Frontier La… BAR-… BARLT    P06_1   -0400    
#> 10 P07_1_20200… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P07_1   NA       
#> # ℹ 32 more rows
#> # ℹ 2 more variables: date_time <dttm>, date <date>
clean_metadata(project_files = example_files, subset = "P02")
#> Extracting ARU info...
#> Extracting Dates and Times...
#> # A tibble: 6 × 11
#>   file_name     type  path  aru_id manufacturer model aru_type site_id tz_offset
#>   <chr>         <chr> <chr> <chr>  <chr>        <chr> <chr>    <chr>   <chr>    
#> 1 P02_1_202005… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 2 P02_1_202005… wav   a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 3 P02_1_202005… wav   j_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 4 P02_1_202005… wav   j_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 5 P02_1_202005… wav   o_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> 6 P02_1_202005… wav   o_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1   NA       
#> # ℹ 2 more variables: date_time <dttm>, date <date>