Generates a lightweight content signature by hashing selected byte regions of a file. This gives a fast, approximate check for whether two file instances are identical or different, without computing a full file hash.

Usage

quick_signature(path, n = 1024)

Arguments

path

Character. Path to the file.

n

Integer. Number of bytes to read from selected regions (default: 1024).

Value

Character. A signature string representing sampled file content.

Details

The function is designed for performance and is suitable for use in large-scale filesystem observations, where full hashing would be computationally expensive.

The signature is constructed from hashed byte segments:

  • small files: hash of full content

  • medium files: hash of first and last segments

  • large files: hash of first, middle, and last segments
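The sampling strategy above can be illustrated with a minimal sketch. It is not the package's implementation: the function name sketch_signature, the size thresholds (2 * n and large_threshold), and the use of MD5 over a temporary file are all assumptions chosen to keep the example in base R.

```r
# Hypothetical sketch of a region-sampled signature.
# Thresholds and hash algorithm are assumptions, not quick_signature() internals.
sketch_signature <- function(path, n = 1024L, large_threshold = 1024 * 1024) {
  size <- file.size(path)
  if (is.na(size)) return(NA_character_)  # missing or inaccessible file

  con <- file(path, open = "rb")
  on.exit(close(con), add = TRUE)

  # Read n bytes starting at a zero-based byte offset.
  read_region <- function(offset) {
    seek(con, where = offset, origin = "start")
    readBin(con, what = "raw", n = n)
  }

  if (size <= 2 * n) {
    # small file: use the full content
    bytes <- readBin(con, what = "raw", n = size)
  } else if (size <= large_threshold) {
    # medium file: first and last segments
    bytes <- c(read_region(0), read_region(size - n))
  } else {
    # large file: first, middle, and last segments
    bytes <- c(
      read_region(0),
      read_region((size - n) %/% 2),
      read_region(size - n)
    )
  }

  # Hash the sampled bytes by writing them to a temporary file,
  # since tools::md5sum() operates on files.
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeBin(bytes, tmp)
  unname(tools::md5sum(tmp))
}
```

In this sketch only the sampled bytes contribute to the result, so the cost stays roughly constant for large files, at the price of not seeing changes outside the sampled regions.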

The function provides a fast operational signal for probable content equivalence:

  • identical signatures strongly suggest identical content

  • different signatures indicate content differences

  • collisions are possible but unlikely in practice: medium and large files that differ only outside the sampled regions can share a signature

Missing or inaccessible files return NA_character_.
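As an illustration of how these signals might be used, the sketch below groups paths by signature and drops files that returned NA_character_. The helper name likely_duplicates and the directory "data" are hypothetical. The helper calls the signature function one path at a time, because this page does not state whether quick_signature() is vectorized.

```r
# Hypothetical helper: group paths whose signatures match.
# `signature_fn` is expected to behave like quick_signature():
# one path in, one character signature (or NA_character_) out.
likely_duplicates <- function(paths, signature_fn) {
  sigs <- vapply(paths, signature_fn, character(1), USE.NAMES = FALSE)
  ok <- !is.na(sigs)                     # skip missing or inaccessible files
  groups <- split(paths[ok], sigs[ok])   # one group per distinct signature
  groups <- groups[lengths(groups) > 1]  # keep signatures shared by 2+ files
  unname(groups)
}

# Usage, assuming the package that provides quick_signature() is attached
# and "data" is a directory of files to compare:
# likely_duplicates(list.files("data", full.names = TRUE), quick_signature)
```

Because matching signatures only suggest identical content, each group is a list of candidates. Where certainty matters, candidates can be confirmed with a full hash, for example tools::md5sum().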

In RiC-aligned operational terms, the signature supports later interpretation of observed filesystem Instantiations:

  • identifying likely identical Instantiations

  • distinguishing likely versions or derivations

  • detecting distributed or duplicated work

  • supporting later Record Set construction and reconciliation

The function does not establish authoritative identity or provenance. It provides observational evidence that may later support analytical or curatorial interpretation.

This function is typically used in conjunction with: