Aggregates repeated filesystem snapshot observations into a lightweight longitudinal table of filesystem aggregation units (an "observational universe").

Usage

observe_universe(
  snapshot_dir,
  max_aggregation_depth = 2,
  by_storage = TRUE,
  by_person = FALSE,
  exclude_patterns = c("\\.gitignore$", "\\.Rbuildignore$", "\\.github$",
    "\\.quarto$", "\\.Rcheck$", "\\.RDataTmp", "\\.Trash-1000",
    "\\.cryptomator$", "\\.editorconfig$", "\\.gitattributes$",
    "\\.webmanifest$")
)

Arguments

snapshot_dir

Directory containing snapshot .rds files.

max_aggregation_depth

Integer giving the maximum filesystem path depth used to derive observational aggregation units. Observed paths deeper than this are truncated to this depth.

by_storage

Logical. If TRUE, aggregation preserves storage_id, so units are summarised separately for each storage.

by_person

Logical. If TRUE, aggregation preserves person_id, so units are summarised separately for each person.

exclude_patterns

Character vector of regular expressions used to exclude operational artefacts from observational aggregation units.

Defaults exclude common:

  • hidden metadata folders;

  • temporary artefacts;

  • generated build artefacts;

  • repository management files.

Exclusions are applied after aggregation-unit derivation and before longitudinal summarisation.
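As a rough illustration of how such patterns behave, the sketch below applies two of the documented default expressions to a few hypothetical unit paths using base R. Combining the patterns with `|` and matching them against the unit path is an assumption made for illustration; the function's internal matching may differ.

```r
# Two of the documented default patterns
patterns <- c("\\.github$", "\\.Rcheck$")

# Hypothetical aggregation units (illustrative paths only)
units <- c(
  "projects/alpha",
  "projects/.github",
  "projects/alpha.Rcheck"
)

# Assumption: patterns are combined by alternation and matched per unit
combined <- paste(patterns, collapse = "|")
excluded <- grepl(combined, units)

# Units that survive the exclusion step
kept_units <- units[!excluded]
print(kept_units)
```

Because both patterns end in `$`, they only match units whose path ends with the given name; a unit that merely contains `.github` in the middle of its path would be kept.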

Value

A tibble containing longitudinal observational summaries of filesystem aggregation units.

Variables include:

observed_unit

Operational filesystem aggregation unit derived from path truncation.

aggregation_depth

Actual observed filesystem depth of the aggregation unit, which may be lower than max_aggregation_depth for shallow paths.

max_aggregation_depth

Maximum filesystem path depth used during aggregation.

n_observations

Number of snapshot observations in which the aggregation unit appeared.

avg_files_unit

Average number of files observed per snapshot for the aggregation unit.

avg_size_unit

Average observed size in bytes per snapshot for the aggregation unit.

avg_size_mb_unit

Average observed size in megabytes per snapshot for the aggregation unit.

avg_size_gb_unit

Average observed size in gigabytes per snapshot for the aggregation unit.

total_files_unit

Total files observed for the aggregation unit across all snapshots.

total_size_unit

Total bytes observed for the aggregation unit across all snapshots.
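The relationship between the per-snapshot averages and the totals can be sketched in base R. The input table, its column names (`snapshot_id`, `n_files`, `size_bytes`) and the computation itself are assumptions for illustration only; they are not taken from the package internals.

```r
# Hypothetical per-snapshot counts for two units (not package data)
per_snapshot <- data.frame(
  snapshot_id   = c("s1", "s2", "s1"),
  observed_unit = c("projects/alpha", "projects/alpha", "projects/beta"),
  n_files       = c(10L, 14L, 3L),
  size_bytes    = c(2048, 4096, 512),
  stringsAsFactors = FALSE
)

# Summarise each unit across the snapshots in which it appeared
units <- split(per_snapshot, per_snapshot$observed_unit)
summary_tbl <- do.call(rbind, lapply(units, function(u) {
  data.frame(
    observed_unit    = u$observed_unit[1],
    n_observations   = length(unique(u$snapshot_id)),
    avg_files_unit   = mean(u$n_files),
    avg_size_unit    = mean(u$size_bytes),
    total_files_unit = sum(u$n_files),
    total_size_unit  = sum(u$size_bytes),
    stringsAsFactors = FALSE
  )
}))
rownames(summary_tbl) <- NULL
print(summary_tbl)
```

In this sketch a unit seen in only one snapshot has averages equal to its totals, while a unit seen in several snapshots accumulates totals across all of them.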

Details

The function operates on snapshot .rds files created by scan_storage() and summarises repeated observations of operational filesystem aggregation units across time.
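A minimal sketch of collecting snapshot files from a directory is shown below. The stand-in data frames and their columns (`path`, `size`) are hypothetical; real snapshots produced by scan_storage() may have a different structure, and the function may discover files differently.

```r
# Hypothetical snapshot directory with two tiny stand-in snapshots
snap_dir <- tempfile("snapshots_")
dir.create(snap_dir)

saveRDS(
  data.frame(path = "projects/alpha/a.csv", size = 100),
  file.path(snap_dir, "snapshot_01.rds")
)
saveRDS(
  data.frame(
    path = c("projects/alpha/a.csv", "projects/alpha/b.csv"),
    size = c(100, 250)
  ),
  file.path(snap_dir, "snapshot_02.rds")
)

# Find every .rds file and read each one as a separate observation
rds_files <- sort(list.files(snap_dir, pattern = "\\.rds$", full.names = TRUE))
snapshots <- lapply(rds_files, readRDS)
names(snapshots) <- basename(rds_files)

# Number of rows recorded in each snapshot
rows_per_snapshot <- vapply(snapshots, nrow, integer(1))
print(rows_per_snapshot)
```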

The resulting table is intentionally observational and pre-interpretive:

  • no intellectual Record Sets are inferred;

  • no semantic reconciliation is performed;

  • no provenance assertions beyond observation are made.

Instead, the function provides a lightweight observational universe suitable for:

  • reconstruction workflows;

  • audit preparation;

  • preservation planning;

  • storage coverage analysis;

  • identifying candidate contextual Record Sets;

  • longitudinal filesystem observation.

Observational aggregation units are operationally approximated from observed file paths using configurable path truncation rules.
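One way to picture path truncation is the base R sketch below. `truncate_to_unit()` is a hypothetical helper, not part of the package, and it assumes relative, forward-slash paths; how the function itself handles drive letters or root components is not shown here.

```r
# Hypothetical helper, not part of the package API
truncate_to_unit <- function(path, max_depth = 2) {
  # Drop the file name first, so only directories can become units
  parts <- strsplit(dirname(path), "/", fixed = TRUE)[[1]]
  # Keep at most max_depth leading directory components
  kept <- utils::head(parts, max_depth)
  list(
    observed_unit = paste(kept, collapse = "/"),
    aggregation_depth = length(kept)
  )
}

# A deep path is cut back to the maximum depth
deep <- truncate_to_unit("projects/alpha/data/raw/a.csv", max_depth = 2)

# A shallow path keeps its own, smaller depth
shallow <- truncate_to_unit("projects/readme.txt", max_depth = 2)

str(deep)
str(shallow)
```

Under these assumptions the deep path maps to the unit `projects/alpha` at depth 2, while the shallow path maps to `projects` at depth 1, which is why units at different depths should not be compared directly.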

Aggregation units derived at different aggregation depths are not directly comparable.

Single files are never treated as aggregation units.

Aggregation may optionally preserve:

  • storage boundaries (storage_id);

  • person boundaries (person_id).
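The effect of preserving these boundaries can be sketched as a choice of grouping columns. The helper `grouping_keys()` and the example observations are hypothetical and only illustrate the idea; they do not reproduce the package's implementation.

```r
# Hypothetical helper: choose grouping columns from the two flags
grouping_keys <- function(by_storage = TRUE, by_person = FALSE) {
  c(
    if (by_storage) "storage_id",
    if (by_person) "person_id",
    "observed_unit"
  )
}

# Hypothetical observations of one unit on two storages by two people
obs <- data.frame(
  storage_id = c("disk_a", "disk_b", "disk_a"),
  person_id = c("p1", "p1", "p2"),
  observed_unit = "projects/alpha",
  stringsAsFactors = FALSE
)

# Count the distinct groups produced by a given set of keys
n_groups <- function(keys) nrow(unique(obs[, keys, drop = FALSE]))

groups_storage <- n_groups(grouping_keys(TRUE, FALSE))  # storage boundary kept
groups_pooled  <- n_groups(grouping_keys(FALSE, FALSE)) # pooled across storages
groups_both    <- n_groups(grouping_keys(TRUE, TRUE))   # both boundaries kept
print(c(groups_storage, groups_pooled, groups_both))
```

The more boundaries are preserved, the more rows the same filesystem unit contributes to the resulting table.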

Examples

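# Load the two demo snapshot datasets
data("fscontextdemo_snapshot_01")
data("fscontextdemo_snapshot_02")

# Create a temporary directory to act as the snapshot directory
tmp <- tempfile()
dir.create(tmp)

# Write each snapshot as an .rds file, as observe_universe() expects
saveRDS(
  fscontextdemo_snapshot_01,
  file.path(tmp, "snapshot_01.rds")
)

saveRDS(
  fscontextdemo_snapshot_02,
  file.path(tmp, "snapshot_02.rds")
)

# Aggregate both snapshots, truncating paths to at most two levels
observation_universe <- observe_universe(
  snapshot_dir = tmp,
  max_aggregation_depth = 2
)

head(observation_universe)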
#> # A tibble: 1 × 11
#>   storage_id    aggregation_depth max_aggregation_depth observed_unit           
#>   <chr>                     <dbl>                 <dbl> <chr>                   
#> 1 fscontextdemo                 2                     2 D:/_packages/fscontextd…
#> # ℹ 7 more variables: n_observations <int>, avg_files_unit <dbl>,
#> #   avg_size_unit <dbl>, avg_size_mb_unit <dbl>, avg_size_gb_unit <dbl>,
#> #   total_files_unit <int>, total_size_unit <dbl>