
read_snapshot() reconstructs an observational layer from one or more snapshots previously created with scan_storage().

The function preserves filesystem observations as originally recorded, while appending snapshot-level provenance and contextual identifiers that support longitudinal and cross-storage analytical workflows.

The resulting table is intended to represent observed filesystem Instantiations rather than authoritative documentary entities.

Usage

read_snapshot(snapshot_files, include_repo_metadata = FALSE)

Arguments

snapshot_files

Character vector of paths to snapshot .rds files.

include_repo_metadata

Logical.

If TRUE, repository metadata stored in snapshot attributes are materialized into the returned observational table.

The following repository-level variables may be added:

  • git_remote

  • git_branch

  • git_repo_id

This may increase memory usage because repository metadata are repeated across all observations belonging to the same repository.

Value

A data.frame containing combined filesystem observations.

The returned table contains all variables created by scan_storage() together with additional provenance and contextual identifiers:

  • snapshot_file: normalized path of the source snapshot artefact

  • snapshot_created_at: observation timestamp recorded in snapshot metadata

  • snapshot_schema_version: schema version recorded in snapshot metadata

  • storage_full_path: globally contextualized filesystem locator (storage_id::full_path)

  • storage_path_id: storage-scoped logical filesystem identifier (storage_id::rel_path)

  • observation_id: identifier of a specific filesystem observation event, combining storage context, logical path, and observation time
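The identifier templates above can be illustrated with a small base R sketch. The column names storage_id, full_path and rel_path are taken from the templates; the sample values and the exact layout of observation_id (path identifier plus a compact UTC timestamp) are assumptions made for illustration, not the package's documented format.

```r
# Toy observations; the values are invented for illustration only.
obs <- data.frame(
  storage_id = c("laptop", "nas"),
  full_path  = c("/home/ana/proj/report.docx", "/volume1/proj/report.docx"),
  rel_path   = c("proj/report.docx", "proj/report.docx"),
  stringsAsFactors = FALSE
)
snapshot_created_at <- as.POSIXct("2024-05-01 12:00:00", tz = "UTC")

# storage_id::full_path and storage_id::rel_path, as described above.
obs$storage_full_path <- paste(obs$storage_id, obs$full_path, sep = "::")
obs$storage_path_id   <- paste(obs$storage_id, obs$rel_path, sep = "::")

# ASSUMPTION: observation_id appends the observation time to storage_path_id.
stamp <- format(snapshot_created_at, "%Y%m%dT%H%M%SZ", tz = "UTC")
obs$observation_id <- paste(obs$storage_path_id, stamp, sep = "::")

obs[, c("storage_full_path", "storage_path_id", "observation_id")]
```

Note that the two rows share the same rel_path but receive different storage_path_id values, because the identifier is scoped to the storage context.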

Details

read_snapshot() reads one or more serialized observational filesystem snapshots and combines them into a unified observational table.

The function performs:

  • observational aggregation across snapshots

  • snapshot-level provenance preservation

  • contextual identifier enrichment

  • optional materialization of repository-level Git metadata

The function intentionally does not:

  • deduplicate observations

  • infer stable file identity

  • infer Record Resources or Record Sets

  • resolve documentary semantics

  • interpret provenance relationships
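The aggregation behaviour described above can be approximated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: make_snapshot() and read_one() are hypothetical helpers, and the attribute names created_at and schema_version are invented stand-ins for however scan_storage() actually records snapshot metadata.

```r
# Hypothetical helper: write a toy snapshot with metadata stored as attributes.
make_snapshot <- function(path, storage_id, rel_path, created_at) {
  snap <- data.frame(storage_id = storage_id, rel_path = rel_path,
                     stringsAsFactors = FALSE)
  attr(snap, "created_at") <- created_at      # ASSUMED attribute name
  attr(snap, "schema_version") <- "1"         # ASSUMED attribute name
  saveRDS(snap, path)
  path
}

files <- c(
  make_snapshot(tempfile(fileext = ".rds"), "laptop",
                c("a.txt", "b.txt"), "2024-05-01T12:00:00Z"),
  make_snapshot(tempfile(fileext = ".rds"), "laptop",
                c("a.txt", "b.txt", "c.txt"), "2024-06-01T12:00:00Z")
)

# Hypothetical helper: read one snapshot and append provenance columns.
read_one <- function(path) {
  snap <- readRDS(path)
  snap$snapshot_file <- normalizePath(path, winslash = "/")
  snap$snapshot_created_at <- attr(snap, "created_at")
  snap$snapshot_schema_version <- attr(snap, "schema_version")
  snap
}

# Rows are stacked as recorded; nothing is deduplicated or reinterpreted.
combined <- do.call(rbind, lapply(files, read_one))
nrow(combined)
```

Here a.txt and b.txt each appear twice in the combined table, once per snapshot, which mirrors the decision to leave deduplication and identity inference to later stages.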

Multiple observations of the same filesystem object may occur:

  • across observation times

  • across storage contexts

  • across partially overlapping snapshots

  • across synchronized or copied working environments

In RiC-aligned operational terms:

  • each row represents one observed filesystem Instantiation

  • repeated observations may later support inference of more stable Record Resources

  • higher-level documentary interpretation is deferred to later analytical or curatorial stages
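As a hedged illustration of such a later analytical stage, the sketch below counts how often each storage-scoped path was observed. The toy table and its values are invented; only the column names storage_path_id and snapshot_created_at come from the Value section.

```r
# Toy observational table: one row per observed filesystem Instantiation.
obs <- data.frame(
  storage_path_id = c("laptop::a.txt", "laptop::a.txt",
                      "nas::a.txt", "laptop::b.txt"),
  snapshot_created_at = c("2024-05-01", "2024-06-01",
                          "2024-06-01", "2024-06-01"),
  stringsAsFactors = FALSE
)

# Number of observations per storage-scoped path.
counts <- table(obs$storage_path_id)

# Paths seen more than once are candidates for longitudinal analysis.
repeated <- names(counts)[counts > 1]
repeated
```

Counting alone does not establish that repeated rows refer to the same Record Resource; it only identifies where repeated observations exist.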

Snapshot-level provenance metadata are appended as columns to support:

  • provenance-aware analytics

  • reconstruction workflows

  • cross-storage comparison

  • longitudinal temporal analysis

Repository metadata are normally stored as snapshot attributes in order to avoid repeating identical repository information across all observations. When include_repo_metadata = TRUE, repository-level metadata are materialized into the returned table to support repository-aware analytical workflows.
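The trade-off between attribute storage and materialized columns can be sketched as follows. The attribute name repo_metadata, the repo_root key column, and all sample values are hypothetical and chosen only for illustration; the actual attribute layout used by scan_storage() may differ.

```r
# Toy observations; repo_root is a HYPOTHETICAL key linking rows to a repository.
obs <- data.frame(
  rel_path  = c("proj/a.R", "proj/b.R", "notes/c.md"),
  repo_root = c("proj", "proj", NA),
  stringsAsFactors = FALSE
)

# Repository metadata stored once, as an attribute (ASSUMED layout).
attr(obs, "repo_metadata") <- data.frame(
  repo_root   = "proj",
  git_remote  = "https://example.org/ana/proj.git",
  git_branch  = "main",
  git_repo_id = "repo-001",
  stringsAsFactors = FALSE
)

# Materialize: repeat the repository values on every matching observation.
repo_meta <- attr(obs, "repo_metadata")
idx <- match(obs$repo_root, repo_meta$repo_root)
materialized <- obs
for (col in c("git_remote", "git_branch", "git_repo_id")) {
  materialized[[col]] <- repo_meta[[col]][idx]
}
materialized
```

Observations outside any repository receive NA in the Git columns, and the same remote, branch and identifier are repeated on every row of the repository, which is the source of the additional memory use.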