
Construct a longitudinal observational universe
Aggregates repeated filesystem observations into a lightweight longitudinal observational universe.
Usage
observe_universe(
  snapshot_dir,
  max_aggregation_depth = 2,
  by_storage = TRUE,
  by_person = FALSE,
  exclude_patterns = c("\\.gitignore$", "\\.Rbuildignore$", "\\.github$",
    "\\.quarto$", "\\.Rcheck$", "\\.RDataTmp", "\\.Trash-1000",
    "\\.cryptomator$", "\\.editorconfig$", "\\.gitattributes$",
    "\\.webmanifest$")
)
Arguments
- snapshot_dir
Directory containing snapshot .rds files.
- max_aggregation_depth
Integer giving the maximum filesystem path depth used to derive observational aggregation units.
- by_storage
Logical. If TRUE, aggregation preserves storage_id.
- by_person
Logical. If TRUE, aggregation preserves person_id.
- exclude_patterns
Character vector of regular expressions used to exclude operational artefacts from observational aggregation units.
The defaults exclude common hidden metadata folders, temporary artefacts, generated build artefacts, and repository management files.
Exclusions are applied after aggregation-unit derivation and before longitudinal summarisation.
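As an illustration of how such patterns behave, the following minimal sketch filters a handful of hypothetical aggregation units with base R's grepl(). The unit paths are invented, and combining the patterns with "|" is an assumption made for the example; the package may apply them differently.

```r
# A subset of the default patterns, copied from the Usage section
exclude_patterns <- c("\\.gitignore$", "\\.github$", "\\.Rcheck$", "\\.RDataTmp")

# Hypothetical aggregation units (not taken from the demo data)
units <- c(
  "projects/alpha",
  "projects/.github",
  "projects/pkg.Rcheck",
  "projects/beta"
)

# Assumption: a unit is excluded if it matches any one of the patterns
combined <- paste(exclude_patterns, collapse = "|")
excluded <- grepl(combined, units)

# Units that would remain for longitudinal summarisation
kept_units <- units[!excluded]
kept_units
```

Note that patterns ending in $ only match at the end of the unit path, whereas unanchored patterns such as "\\.RDataTmp" match anywhere in it.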
Value
A tibble containing longitudinal observational summaries of filesystem aggregation units.
Variables include:
- observed_unit
Operational filesystem aggregation unit derived from path truncation.
- aggregation_depth
Actual observed filesystem depth of the aggregation unit.
- max_aggregation_depth
Maximum filesystem path depth used during aggregation.
- n_observations
Number of snapshot observations in which the aggregation unit appeared.
- avg_files_unit
Average number of files observed per snapshot for the aggregation unit.
- avg_size_unit
Average observed size in bytes per snapshot for the aggregation unit.
- avg_size_mb_unit
Average observed size in megabytes per snapshot for the aggregation unit.
- avg_size_gb_unit
Average observed size in gigabytes per snapshot for the aggregation unit.
- total_files_unit
Total files observed for the aggregation unit across all snapshots.
- total_size_unit
Total bytes observed for the aggregation unit across all snapshots.
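The relationship between the count, average, and total variables can be sketched on a small, made-up table of per-snapshot summaries. The input columns (snapshot, n_files, size_bytes) are hypothetical names chosen for the illustration, and the code follows only the definitions given above; it is not the package's implementation.

```r
# Hypothetical per-snapshot summaries: one row per unit per snapshot
per_snapshot <- data.frame(
  snapshot      = c("snapshot_01", "snapshot_02", "snapshot_01"),
  observed_unit = c("projects/alpha", "projects/alpha", "projects/beta"),
  n_files       = c(10L, 14L, 3L),
  size_bytes    = c(2000, 3000, 500),
  stringsAsFactors = FALSE
)

# Summarise each unit across the snapshots in which it appeared
by_unit <- split(per_snapshot, per_snapshot$observed_unit)
summary_tbl <- do.call(rbind, lapply(by_unit, function(d) {
  data.frame(
    observed_unit    = d$observed_unit[1],
    n_observations   = length(unique(d$snapshot)),  # snapshots containing the unit
    avg_files_unit   = mean(d$n_files),             # average files per snapshot
    avg_size_unit    = mean(d$size_bytes),          # average bytes per snapshot
    total_files_unit = sum(d$n_files),              # files summed over snapshots
    total_size_unit  = sum(d$size_bytes)            # bytes summed over snapshots
  )
}))
rownames(summary_tbl) <- NULL
summary_tbl
```

Because the totals are summed across snapshots, a unit that is observed repeatedly accumulates the same files more than once; the averages are usually the better indicator of a unit's typical size.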
Details
The function operates on snapshot .rds files created by
scan_storage() and summarises repeated observations of
operational filesystem aggregation units across time.
The resulting table is intentionally observational and pre-interpretive:
- no intellectual Record Sets are inferred;
- no semantic reconciliation is performed;
- no provenance assertions beyond observation are made.
Instead, the function provides a lightweight observational universe suitable for:
- reconstruction workflows;
- audit preparation;
- preservation planning;
- storage coverage analysis;
- identifying candidate contextual Record Sets;
- longitudinal filesystem observation.
Observational aggregation units are operationally approximated from observed file paths using configurable path truncation rules.
Aggregation units derived at different aggregation depths are not directly comparable.
Single files are never treated as aggregation units.
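Path truncation of this kind might look roughly like the sketch below. The helper truncate_to_depth() is hypothetical and not part of the package, the example paths are invented, and the way depth is counted here (one level per "/"-separated directory component, starting at the first component) is an assumption that may differ from the package's own convention.

```r
# Hypothetical helper, not part of the package API
truncate_to_depth <- function(paths, max_depth = 2) {
  dirs <- dirname(paths)  # drop the file name: single files are never units
  parts <- strsplit(dirs, "/", fixed = TRUE)
  vapply(parts, function(p) {
    # keep at most max_depth leading directory components
    paste(p[seq_len(min(length(p), max_depth))], collapse = "/")
  }, character(1))
}

# Invented file paths for illustration
paths <- c(
  "projects/alpha/data/raw.csv",
  "projects/alpha/report.docx",
  "projects/beta/notes.txt",
  "projects/readme.txt"
)

units  <- truncate_to_depth(paths, max_depth = 2)
depths <- lengths(strsplit(units, "/", fixed = TRUE))

data.frame(path = paths, observed_unit = units, aggregation_depth = depths)
```

In this sketch the last file sits directly under projects, so its unit has an actual depth of 1 even though the maximum depth is 2. This is one way the aggregation_depth and max_aggregation_depth columns can differ.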
Aggregation may optionally preserve:
- storage boundaries (storage_id);
- person boundaries (person_id).
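The effect of preserving a boundary can be illustrated with base R's aggregate() on a small invented table. The data and the n_files column are hypothetical; the sketch only shows how adding storage_id to the grouping key keeps identically named units on different storages apart, and makes no claim about how the package groups internally.

```r
# Invented observations: the same unit path exists on two storages
obs <- data.frame(
  storage_id    = c("disk_a", "disk_b", "disk_a"),
  observed_unit = c("projects/alpha", "projects/alpha", "projects/beta"),
  n_files       = c(10L, 4L, 3L),
  stringsAsFactors = FALSE
)

# Without storage boundaries: units sharing a path are pooled
pooled <- aggregate(n_files ~ observed_unit, data = obs, FUN = sum)

# With storage boundaries: the same path on different storages stays separate
kept_separate <- aggregate(n_files ~ storage_id + observed_unit, data = obs, FUN = sum)

pooled
kept_separate
```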
Examples
# Load the two demo snapshot datasets
data("fscontextdemo_snapshot_01")
data("fscontextdemo_snapshot_02")

# Write them as .rds files into a temporary snapshot directory
tmp <- tempfile()
dir.create(tmp)
saveRDS(
  fscontextdemo_snapshot_01,
  file.path(tmp, "snapshot_01.rds")
)
saveRDS(
  fscontextdemo_snapshot_02,
  file.path(tmp, "snapshot_02.rds")
)

# Aggregate both snapshots into one observational universe
observation_universe <- observe_universe(
  snapshot_dir = tmp,
  max_aggregation_depth = 2
)
head(observation_universe)
#> # A tibble: 1 × 11
#> storage_id aggregation_depth max_aggregation_depth observed_unit
#> <chr> <dbl> <dbl> <chr>
#> 1 fscontextdemo 2 2 D:/_packages/fscontextd…
#> # ℹ 7 more variables: n_observations <int>, avg_files_unit <dbl>,
#> # avg_size_unit <dbl>, avg_size_mb_unit <dbl>, avg_size_gb_unit <dbl>,
#> # total_files_unit <int>, total_size_unit <dbl>