Prepare and clean site index file — clean_site_index

A site index file records when and where specific ARUs were deployed. This function cleans a file (csv, xlsx) or data frame in preparation for adding these details to the output of clean_metadata(). It can be used to fill in missing information by date, such as GPS coordinates (longitude/latitude) and site ids.

Usage

clean_site_index(
  site_index,
  name_aru_id = "aru_id",
  name_site_id = "site_id",
  name_date_time = "date",
  name_coords = c("longitude", "latitude"),
  name_extra = NULL,
  resolve_overlaps = TRUE,
  quiet = FALSE
)

Arguments

site_index: (Spatial) Data frame or file path. Site index data to clean. If file path, must be to a local csv or xlsx file.
name_aru_id: Character. Name of the column that contains ARU ids. Default "aru_id".
name_site_id: Character. Name of the column that contains site ids. Default "site_id".
name_date_time: Character. Name of the column that contains dates or date/times. Can be a vector of two names if there are both 'start' and 'end' columns. Can be NULL to ignore dates. Default "date".
name_coords: Character. Names of the columns that contain longitude and latitude (in that order). Ignored if site_index is spatial. Default c("longitude", "latitude").
name_extra: Character. Column names for extra data to include. If a named vector, will rename the columns (see examples). Default NULL.
resolve_overlaps: Logical. Whether or not to resolve date overlaps by shifting the start/end dates to noon (default TRUE). This assumes that ARUs are generally not deployed/removed at midnight (the official start/end of a day) and so noon is used as an approximation for when an ARU was deployed or removed. If possible, use specific deployment times to avoid this issue.
quiet: Logical. Whether to suppress progress messages and other non-essential updates. Default FALSE.
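
The idea behind resolve_overlaps can be illustrated with a small base R sketch. This is not the package's implementation; the data frame, its column names, and the helpers to_noon() and site_at() are hypothetical and exist only for this illustration. An ARU is moved from site S1 to site S2 on 2020-05-03, so the date-only ranges share that day. Treating each date as noon splits the shared day between the two deployments.

```r
# Hypothetical deployment records with date-only start/end columns
deploy <- data.frame(
  aru_id = c("A1", "A1"),
  site_id = c("S1", "S2"),
  date_start = as.Date(c("2020-05-01", "2020-05-03")),
  date_end = as.Date(c("2020-05-03", "2020-05-05"))
)

# Hypothetical helper: turn a Date into a date-time at 12:00
# (UTC is used here only as a neutral container for local clock times)
to_noon <- function(d) {
  as.POSIXct(paste(format(d), "12:00:00"), tz = "UTC")
}

deploy$date_time_start <- to_noon(deploy$date_start)
deploy$date_time_end <- to_noon(deploy$date_end)

# Hypothetical helper: which site does a recording time belong to?
# Uses half-open intervals [start, end) so noon itself is unambiguous.
site_at <- function(t) {
  hit <- deploy$date_time_start <= t & t < deploy$date_time_end
  as.character(deploy$site_id[hit])
}

morning <- site_at(as.POSIXct("2020-05-03 08:00:00", tz = "UTC"))
afternoon <- site_at(as.POSIXct("2020-05-03 15:00:00", tz = "UTC"))
```

Under these assumptions, a recording made on the morning of the shared day matches only the first deployment and one made in the afternoon matches only the second. If actual deployment times are known, supplying them avoids relying on the noon approximation.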

Value

Standardized site index data frame

Details

Note that times are assumed to be in 'local' time, so no timezone is used. If a timezone is present, it is removed and replaced with UTC. This allows sites from different timezones to be processed at the same time.
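
As a rough illustration of what "removing" a timezone means, the base R sketch below keeps the clock time of a date-time and relabels it as UTC. It is an assumption about the general approach, not the package's own code, and the helper name drop_tz() is hypothetical.

```r
# Hypothetical helper: keep the local clock time, discard the timezone,
# and store the result as UTC
drop_tz <- function(x) {
  as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = "UTC")
}

# The same local clock time recorded at sites in two different timezones
east <- as.POSIXct("2020-05-01 06:30:00", tz = "America/Toronto")
west <- as.POSIXct("2020-05-01 06:30:00", tz = "America/Vancouver")

east_local <- drop_tz(east)
west_local <- drop_tz(west)

# The original instants differ, but the cleaned local times are equal
same_before <- as.numeric(east) == as.numeric(west)
same_after <- as.numeric(east_local) == as.numeric(west_local)
```

Here same_before is FALSE and same_after is TRUE: once the timezone is dropped, 06:30 at either site is treated as the same local time, which is what allows sites from different timezones to share one site index.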

Examples


# With separate start and end date columns
s <- clean_site_index(example_sites,
  name_aru_id = "ARU",
  name_site_id = "Sites",
  name_date_time = c("Date_set_out", "Date_removed"),
  name_coords = c("lon", "lat")
)
#> There are overlapping date ranges
#> • Shifting start/end times to 'noon'
#> • Skip this with `resolve_overlaps = FALSE`

# Include an extra column, renamed via a named vector
s <- clean_site_index(example_sites,
  name_aru_id = "ARU",
  name_site_id = "Sites",
  name_date_time = c("Date_set_out", "Date_removed"),
  name_coords = c("lon", "lat"),
  name_extra = c("plot" = "Plots")
)
#> There are overlapping date ranges
#> • Shifting start/end times to 'noon'
#> • Skip this with `resolve_overlaps = FALSE`

# Without dates
eg <- dplyr::select(example_sites, -Date_set_out, -Date_removed)
s <- clean_site_index(eg,
  name_aru_id = "ARU",
  name_site_id = "Sites",
  name_date_time = NULL,
  name_coords = c("lon", "lat"),
  name_extra = c("plot" = "Plots")
)