
Compute a fast content signature for a file
quick_signature.Rd
Generates a lightweight content signature by hashing selected byte regions of a file. This provides a fast approximation for detecting identical or differing file instances without computing a full file hash.
Details
The function is designed for performance and is suitable for use in large-scale filesystem observations, where full hashing would be computationally expensive.
The signature is constructed from hashed byte segments:
small files: hash of full content
medium files: hash of first and last segments
large files: hash of first, middle, and last segments
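The size-dependent sampling described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the name `sketch_signature()`, the `segment` and `medium_max` parameters, their default values, and the use of MD5 via `tools::md5sum()` are all assumptions made for the example.

```r
# Hypothetical sketch of size-dependent region hashing.
# Thresholds, segment size, and hash algorithm are assumptions,
# not the values used by quick_signature().
sketch_signature <- function(path, segment = 4096, medium_max = 1024^2) {
  # Missing, unreadable, or non-file paths yield NA_character_.
  if (!file.exists(path) || dir.exists(path) ||
      file.access(path, mode = 4) != 0) {
    return(NA_character_)
  }
  size <- file.size(path)

  # Choose the start offsets of the regions to read.
  if (size <= 2 * segment) {
    offsets <- 0                      # small: full content
    n <- size
  } else if (size <= medium_max) {
    offsets <- c(0, size - segment)   # medium: first and last segments
    n <- segment
  } else {
    # large: first, middle, and last segments
    offsets <- c(0, floor((size - segment) / 2), size - segment)
    n <- segment
  }

  con <- file(path, open = "rb")
  on.exit(close(con), add = TRUE)
  bytes <- do.call(c, lapply(offsets, function(off) {
    seek(con, where = off, origin = "start")
    readBin(con, what = "raw", n = n)
  }))

  # Base R hashes files rather than raw vectors, so the sampled
  # bytes are written to a temporary file and hashed there.
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeBin(bytes, tmp)
  unname(tools::md5sum(tmp))
}

# Usage: two files with the same bytes share a signature.
a <- tempfile()
b <- tempfile()
writeBin(as.raw(1:100), a)
writeBin(as.raw(1:100), b)
identical(sketch_signature(a), sketch_signature(b))  # TRUE
```

Because only the sampled regions are read, the cost per file stays roughly constant for large files, which is the property that makes this kind of signature practical for large scans.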
The function provides a fast operational signal for probable content equivalence:
identical signatures strongly suggest identical content
different signatures indicate content differences
collisions are possible but unlikely in practice; for medium and large files, differences confined to unhashed regions go undetected
Missing or inaccessible files return NA_character_.
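As a rough illustration of how signatures can be compared, the sketch below groups paths by signature and sets aside the NA values returned for missing or inaccessible files. The data frame, its column names, and the signature strings are invented for the example and do not reflect the package's actual output format.

```r
# Illustrative data only: paths and signature values are made up.
obs <- data.frame(
  path = c("a/report.docx", "b/report.docx",
           "b/report_v2.docx", "c/missing.docx"),
  signature = c("sig-01", "sig-01", "sig-02", NA_character_),
  stringsAsFactors = FALSE
)

# Drop files without a signature (missing or inaccessible).
known <- obs[!is.na(obs$signature), ]

# Paths sharing a signature are candidates for identical content.
groups <- split(known$path, known$signature)
likely_duplicates <- groups[lengths(groups) > 1]
likely_duplicates
```

Groups found this way are candidates only; a full hash or byte-wise comparison would be needed to confirm that the content is identical.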
In RiC-aligned operational terms, the signature supports later interpretation of observed filesystem Instantiations:
identifying likely identical Instantiations
distinguishing likely versions or derivations
detecting distributed or duplicated work
supporting later Record Set construction and reconciliation
The function does not establish authoritative identity or provenance. It provides observational evidence that may later support analytical or curatorial interpretation.
This function is typically used in conjunction with:
scan_storage() for generating observational snapshots
summarise_duplicates() for detecting duplicate and versioned files