
Derives lightweight operational reuse and reconstruction metrics from contextual resource observations.

The function summarizes how frequently contextual resources appear:

  • across observations;

  • across Record Sets;

  • across storage locations;

  • and across time.

The resulting metrics support:

  • duplication analysis;

  • reconstruction workflows;

  • synchronized workspace inspection;

  • cross-project reuse detection;

  • provenance-aware reporting;

  • and forensic review workflows.

In many filesystem workflows, the resulting metrics approximate how digital resources ("files") evolve, move, synchronize, and reappear across operational environments.

The function is designed to work together with related functions as part of layered provenance-aware reconstruction workflows.

Usage

derive_reuse_metrics(
  x,
  resource_id = "resource_id",
  record_set_id = "record_set_id",
  storage_path_id = "storage_path_id",
  timestamp = "mtime",
  location = "full_path"
)

Arguments

x

A data.frame or tibble containing contextual resource observations.

resource_id

Character scalar identifying the column representing contextual resource identity.

Defaults to "resource_id".

record_set_id

Character scalar identifying the contextual Record Set membership column.

Defaults to "record_set_id".

storage_path_id

Character scalar identifying the storage-scoped path identifier column.

Defaults to "storage_path_id".

timestamp

Character scalar identifying the timestamp column used for temporal reconstruction.

Defaults to "mtime".

location

Character scalar identifying the human-readable location column.

Defaults to "full_path".
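
Because each of these arguments names a column of x, it can help to confirm that the columns exist before calling the function. The following is a minimal base R sketch, not part of the package: the data frame obs, its column names (file_hash, collection, path_key, modified) and the helper check_columns() are all hypothetical.

```r
# Hypothetical observations whose column names differ from the defaults.
obs <- data.frame(
  file_hash  = c("h1", "h1", "h2"),
  collection = c("project_a", "project_b", "project_a"),
  path_key   = c("laptop::a.R", "backup::a.R", "laptop::b.qmd"),
  modified   = as.POSIXct(c("2025-01-01", "2025-01-03", "2025-01-02"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Map each argument of derive_reuse_metrics() to the column it should point at.
column_map <- c(
  resource_id     = "file_hash",
  record_set_id   = "collection",
  storage_path_id = "path_key",
  timestamp       = "modified",
  location        = "full_path"   # deliberately absent from obs
)

# Hypothetical helper: return the argument names whose columns are missing.
check_columns <- function(x, map) {
  names(map)[!map %in% names(x)]
}

missing_args <- check_columns(obs, column_map)
print(missing_args)
```

Here the check reports location, because obs has no full_path column. Once every column is present, the same names would be passed to the documented arguments, for example resource_id = "file_hash".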

Value

A tibble containing operational reuse metrics, with one row per distinct value of the resource_id column.

Typical output variables include:

  • n_observations

  • n_record_sets

  • n_paths

  • first_seen

  • last_seen

  • locations
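
To make these variables concrete, the sketch below shows one way comparable metrics could be computed with dplyr. It is an illustration under stated assumptions, not the package's implementation: sketch_reuse_metrics() is a hypothetical name, the default column names are hard-coded, and the way locations is collapsed into a single string (including the "; " separator) is a guess.

```r
library(dplyr)

# Hypothetical re-implementation for illustration only.
sketch_reuse_metrics <- function(x) {
  x |>
    group_by(resource_id) |>
    summarise(
      n_observations = n(),                        # rows observed per resource
      n_record_sets  = n_distinct(record_set_id),  # distinct Record Sets
      n_paths        = n_distinct(storage_path_id),# distinct storage-scoped paths
      first_seen     = min(mtime),                 # earliest timestamp
      last_seen      = max(mtime),                 # latest timestamp
      # Assumption: locations are collapsed into one string; separator is a guess.
      locations      = paste(unique(full_path), collapse = "; "),
      .groups = "drop"
    )
}

obs <- tibble(
  resource_id     = c("res_001", "res_001", "res_002"),
  record_set_id   = c("project_a", "project_b", "project_a"),
  storage_path_id = c("laptop::analysis.R", "backup::analysis.R", "laptop::report.qmd"),
  mtime           = as.POSIXct(c("2025-01-01", "2025-01-03", "2025-01-02"), tz = "UTC"),
  full_path       = c("D:/project/analysis.R", "E:/backup/analysis.R", "D:/project/report.qmd")
)

metrics <- sketch_reuse_metrics(obs)
print(metrics)
```

Grouping by resource_id and summarising keeps the computation to simple counts and extremes, which matches the lightweight intent described for the function.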

Details

The function intentionally derives lightweight operational metrics only.

It does not:

  • infer authoritative identity;

  • reconcile evolving resources;

  • perform provenance reasoning;

  • determine archival significance;

  • or replace curatorial interpretation.

Metrics are derived from contextual operational observations and should be interpreted as analytical indicators rather than authoritative documentary assertions.

Examples

toy_resources <- tibble::tibble(
  resource_id = c(
    "res_001",
    "res_001",
    "res_002"
  ),
  record_set_id = c(
    "project_a",
    "project_b",
    "project_a"
  ),
  storage_path_id = c(
    "laptop::analysis.R",
    "backup::analysis.R",
    "laptop::report.qmd"
  ),
  mtime = as.POSIXct(c(
    "2025-01-01",
    "2025-01-03",
    "2025-01-02"
  )),
  full_path = c(
    "D:/project/analysis.R",
    "E:/backup/analysis.R",
    "D:/project/report.qmd"
  )
)

derive_reuse_metrics(
  toy_resources
)
#> # A tibble: 2 × 7
#>   resource_id n_observations n_record_sets n_paths first_seen         
#>   <chr>                <int>         <int>   <int> <dttm>             
#> 1 res_001                  2             2       2 2025-01-01 00:00:00
#> 2 res_002                  1             1       1 2025-01-02 00:00:00
#> # ℹ 2 more variables: last_seen <dttm>, locations <chr>
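
One way to use such a result for cross-project reuse detection is to keep the resources seen in more than one Record Set. The sketch below rebuilds by hand only the columns printed above, so that it runs without the package. The object names metrics, reused and duplicated_paths, and the thresholds, are illustrative assumptions.

```r
# Columns and values copied from the printed output above
# (the truncated columns are omitted).
metrics <- data.frame(
  resource_id    = c("res_001", "res_002"),
  n_observations = c(2L, 1L),
  n_record_sets  = c(2L, 1L),
  n_paths        = c(2L, 1L),
  stringsAsFactors = FALSE
)

# Resources observed in more than one Record Set: candidates for cross-project reuse.
reused <- metrics[metrics$n_record_sets > 1L, ]

# Resources stored under more than one path: candidates for duplication review.
duplicated_paths <- metrics[metrics$n_paths > 1L, ]

print(reused$resource_id)
```

Such filters only flag candidates for review. They remain analytical indicators and not assertions about identity or provenance.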