Recursively scans a root folder and returns a data.frame where each row represents one filesystem observation recorded at a specific time.

Usage

scan_storage(
  root,
  storage_id = "l480-1-ssd",
  person_id = "antaldaniel",
  scan_time = Sys.time(),
  compute_signature = TRUE,
  max_signature_size = 200 * 1024 * 1024
)
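A call might look like the following sketch. It assumes the package that provides scan_storage() is already attached (this page does not state the package name). The storage_id and person_id values are made-up placeholders, and the guard keeps the snippet harmless when the function is not available.

```r
# Build a small throwaway folder to observe.
root <- file.path(tempdir(), "scan-demo")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines("x <- 1", file.path(root, "analysis.R"))
writeLines("a,b\n1,2", file.path(root, "data", "table.csv"))

# Assumption: scan_storage() is available because its package is attached.
if (exists("scan_storage", mode = "function")) {
  snapshot <- scan_storage(
    root = root,
    storage_id = "demo-storage",      # hypothetical identifier
    person_id = "demo-observer",      # hypothetical identifier
    scan_time = as.POSIXct("2024-01-01 12:00:00", tz = "UTC"),
    compute_signature = FALSE
  )
  str(snapshot)
}
```

Passing an explicit scan_time, rather than relying on the Sys.time() default, makes repeated scans of the same folder easier to compare.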

Arguments

root

Character. Path to the root folder to observe.

storage_id

Character. Identifier of the storage context.

person_id

Character. Identifier of the observer or operator.

scan_time

POSIXct. Timestamp of the observation. Defaults to Sys.time() if not provided.

compute_signature

Logical. Whether to compute lightweight content signatures.

max_signature_size

Numeric. Maximum file size, in bytes, for which a signature is computed. Defaults to 200 * 1024 * 1024 (200 MiB).
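How quick_sig is computed is not documented on this page. As a purely illustrative sketch of how a size cap could interact with signature computation, the hypothetical helper sketch_signature() below hashes only files within the limit and returns NA for the rest. Both the helper and the use of tools::md5sum() as a stand-in are assumptions, not the package's method.

```r
# Hypothetical helper: compute a signature only for files within the size cap.
sketch_signature <- function(paths, max_signature_size = 200 * 1024 * 1024) {
  sizes <- file.size(paths)                      # NA for missing files
  eligible <- !is.na(sizes) & sizes <= max_signature_size
  sig <- rep(NA_character_, length(paths))
  if (any(eligible)) {
    # tools::md5sum() hashes the full file contents; a stand-in for quick_sig.
    sig[eligible] <- unname(tools::md5sum(paths[eligible]))
  }
  sig
}

small <- tempfile(fileext = ".txt")
writeLines("hello", small)
sketch_signature(small)                          # a 32-character hex digest
sketch_signature(small, max_signature_size = 1)  # NA: the file exceeds the cap
```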

Value

A data.frame where each row represents one filesystem observation.

Details

The function implements a read-only filesystem observation model:

  • it records accessible filesystem state;

  • it does not interpret file contents;

  • it does not assume canonical, complete, or authoritative state.

Each observation records:

  • a relative filesystem locator (rel_path);

  • a storage context (storage_id);

  • an observation timestamp (scan_time).

Additional metadata may include:

  • filesystem properties (size, timestamps, permissions);

  • optional content signatures (quick_sig);

  • repository and version-control context (repo_root, repo_rel_path, git_tracked).
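The observation model described above can be approximated in base R. The sketch below is not the package's implementation: sketch_scan() is a hypothetical name, and it records only a few of the listed fields (no signatures, no repository context).

```r
# Hypothetical, simplified scan: one row per file found under `root`.
sketch_scan <- function(root, storage_id, scan_time = Sys.time()) {
  # Relative paths of all files, including hidden ones; directories are omitted.
  rel_path <- list.files(root, recursive = TRUE, all.files = TRUE)
  info <- file.info(file.path(root, rel_path))   # read-only metadata lookup
  data.frame(
    rel_path = rel_path,
    storage_id = rep(storage_id, length(rel_path)),
    scan_time = rep(scan_time, length(rel_path)),
    size = info$size,
    mtime = info$mtime,
    stringsAsFactors = FALSE
  )
}

demo_root <- file.path(tempdir(), "sketch-scan-demo")
dir.create(file.path(demo_root, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines("a", file.path(demo_root, "a.txt"))
writeLines("b", file.path(demo_root, "sub", "b.txt"))

obs <- sketch_scan(
  demo_root,
  storage_id = "demo-storage",                   # hypothetical identifier
  scan_time = as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
)
obs[, c("rel_path", "storage_id", "size")]
```

Every row carries the same storage_id and scan_time, so snapshots taken at different times or on different storages can be stacked and compared later.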

The package deliberately records filesystem observations first and postpones documentary interpretation, Record Set construction, and semantic assertions aligned with Records in Contexts (RiC) to later analytical stages.

This creates a reproducible observational snapshot suitable for:

  • forensic analysis of development environments;

  • reconstruction of activity patterns;

  • audit and compliance workflows;

  • alignment with version-controlled repositories.

The returned dataset is normalised to the canonical snapshot schema via normalise_snapshot_schema().

At minimum, the result contains:

  • rel_path: relative filesystem locator within the observed root;

  • storage_path_id: deterministic storage-scoped identifier derived from storage_id and rel_path (storage_id::rel_path);

  • filename: basename of the observed file;

  • mtime: last modification timestamp;

  • extension: file extension.
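Most of the minimum columns can be derived from rel_path alone. The sketch below assumes that storage_path_id is the literal concatenation of storage_id and rel_path with "::" as separator; the package may derive it differently (for example by hashing). The paths and the storage_id value are invented for illustration, and mtime is left out because no real files are involved.

```r
rel_path <- c("R/scan.R", "data/raw/table.csv", "README")
storage_id <- "demo-storage"                     # hypothetical identifier

snapshot_min <- data.frame(
  rel_path = rel_path,
  # Assumption: plain "storage_id::rel_path" concatenation.
  storage_path_id = paste(storage_id, rel_path, sep = "::"),
  filename = basename(rel_path),
  extension = tools::file_ext(rel_path),         # "" when there is no extension
  stringsAsFactors = FALSE
)
snapshot_min
```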

Additional variables may be present depending on scan configuration.

The function is:

  • read-only and non-destructive;

  • deterministic for a given filesystem state and scan_time;

  • robust to inaccessible files, which are silently skipped.
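How the package detects inaccessible files is not described here. One way to get the "skip silently" behaviour in base R is to drop entries whose metadata cannot be read or that lack read permission, as in this sketch (sketch_observe() is a hypothetical helper):

```r
# Hypothetical helper: keep only paths whose metadata and contents are readable.
sketch_observe <- function(paths) {
  info <- suppressWarnings(file.info(paths, extra_cols = FALSE))
  # file.access() returns 0 when the requested access (mode 4 = read) is allowed.
  readable <- !is.na(info$size) & file.access(paths, mode = 4) == 0
  data.frame(
    path = paths[readable],
    size = info$size[readable],
    stringsAsFactors = FALSE
  )
}

present <- tempfile(fileext = ".txt")
writeLines("ok", present)
absent <- file.path(tempdir(), "does-not-exist.txt")

res <- sketch_observe(c(present, absent))        # the absent path is dropped
nrow(res)
```

Skipping without a warning keeps a scan running on partially restricted folders, at the cost that the snapshot alone does not show which paths were omitted.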

The result represents observed filesystem state rather than complete historical provenance.