
Classifies observed digital resources into operational file types using workflow-oriented classification profiles.

Unlike a simple MIME-type or extension lookup, the function resolves extensions through a workflow profile. It is designed for provenance-aware analytical and reconstruction workflows in which what a file means depends on its operational context.

The function supports lightweight operational classification for:

  • filesystem reconstruction;

  • digital preservation review;

  • repository analytics;

  • synchronized workspace inspection;

  • web archive inventories;

  • and Heritage Digital Twin workflows.

The current implementation provides a small set of built-in profiles intended as operational starting points.

These profiles are intentionally lightweight and extensible.

Future versions may support:

  • user-defined profiles;

  • YAML-based vocabularies;

  • institutional review profiles;

  • preservation-oriented classification schemes;

  • workflow-specific semantic enrichment.

The function is designed to work together with:

as part of layered provenance-aware reconstruction workflows.

Usage

classify_operational_file_type(
  x,
  extension = "extension",
  profile = "r_development"
)

Arguments

x

A data.frame or tibble containing observed resources.

extension

Character scalar identifying the column containing file extensions.

Defaults to "extension".
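The function reads extensions from a column of x rather than from file paths. When starting from a file listing, one way to build that column is base R's tools::file_ext(), which returns the text after the final dot without opening the file. This is a minimal sketch: the paths and the observed data frame are illustrative, not part of the package.

```r
# Illustrative relative paths; in practice these might come from list.files().
paths <- c(
  "R/classify.R",
  "vignettes/intro.qmd",
  "data-raw/inventory.csv",
  "man/figures/logo.png",
  "docs/fonts/body.woff2",
  "LICENSE"
)

# tools::file_ext() returns the alphanumeric suffix after the last dot,
# or "" when there is none; file contents are never read.
observed <- data.frame(
  path = paths,
  extension = tools::file_ext(paths),
  stringsAsFactors = FALSE
)

observed$extension
# returns c("R", "qmd", "csv", "png", "woff2", "")
```

Because the column is already called "extension", the default argument applies. If it had another name, that name would be passed explicitly, e.g. extension = "ext".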

profile

Character scalar naming the operational classification profile to apply.

Current built-in profiles include:

  • "r_development"

The "r_development" profile is designed for:

  • R package development;

  • Quarto and R Markdown workflows;

  • reproducible research repositories;

  • analytical reporting pipelines.

Value

A character vector with one operational file type classification per row of x.

Typical output categories include:

  • "code"

  • "markdown"

  • "workspace"

  • "data"

  • "artifact"

  • "document"

  • "website_generated"

  • "other"
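Because the result is a plain character vector, it can be summarised with base R, for example in repository analytics. A minimal sketch that counts files per category, using the categories listed above as factor levels so that categories with no matching files are still reported with a count of zero. The input vector is copied from the output in the Examples section; since the list above is described as typical rather than exhaustive, any value outside it would become NA under this approach.

```r
# The eight categories listed under Value, used as factor levels so that
# categories with no matching files still appear with a count of zero.
operational_levels <- c("code", "markdown", "workspace", "data",
                        "artifact", "document", "website_generated", "other")

# Output copied from the Examples section.
types <- c("code", "markdown", "data", "artifact", "website_generated")

counts <- table(factor(types, levels = operational_levels))
counts
```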

Details

The function intentionally performs lightweight operational classification only.

It does not:

  • infer authoritative media types;

  • inspect file contents;

  • perform preservation risk assessment;

  • infer documentary semantics;

  • replace curatorial review.

Classification relies primarily on workflow heuristics that map file extensions to categories according to the selected profile.
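To make the extension-plus-profile idea concrete, here is a minimal sketch of how such a lookup could work. It is not the package's implementation. sketch_profiles and sketch_classify are hypothetical names, matching is assumed to be case-insensitive, and unrecognised extensions are assumed to fall back to "other". The profile table contains only the five mappings shown in the Examples section.

```r
# Hypothetical profile table: a named list of lookup vectors, one per profile.
# Only the five mappings shown in the Examples section are included.
sketch_profiles <- list(
  r_development = c(
    r     = "code",
    qmd   = "markdown",
    csv   = "data",
    png   = "artifact",
    woff2 = "website_generated"
  )
)

# Hypothetical classifier mirroring the documented signature.
sketch_classify <- function(x, extension = "extension", profile = "r_development") {
  # `extension` names a column of x; it does not hold the extensions themselves.
  if (!is.character(extension) || length(extension) != 1L ||
      !extension %in% names(x)) {
    stop("`extension` must name a column of `x`.", call. = FALSE)
  }
  if (!is.character(profile) || length(profile) != 1L ||
      !profile %in% names(sketch_profiles)) {
    stop("Unknown profile: ", profile, call. = FALSE)
  }
  lookup <- sketch_profiles[[profile]]
  # Assumed case-insensitive match, so "R" and "r" map to the same category.
  types <- unname(lookup[tolower(x[[extension]])])
  # Assumed fallback: extensions the profile does not recognise become "other".
  types[is.na(types)] <- "other"
  types
}

toy_files <- data.frame(extension = c("R", "qmd", "csv", "png", "woff2", "xyz"))
sketch_classify(toy_files)
# returns c("code", "markdown", "data", "artifact", "website_generated", "other")
```

Storing each profile as a named vector keeps the vocabulary separate from the matching logic. This is a design choice of the sketch, not a documented feature of the package.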

Examples

toy_files <- tibble::tibble(
  extension = c(
    "R",
    "qmd",
    "csv",
    "png",
    "woff2"
  )
)

classify_operational_file_type(
  toy_files,
  profile = "r_development"
)
#> [1] "code"              "markdown"          "data"             
#> [4] "artifact"          "website_generated"