refine() incrementally stabilizes semantic assertions by applying deterministic, context-based matching rules. It preserves the number of rows and the original set of observations.

Usage

refine(x, target = NULL, rules, by, assertion, comment = NULL, match = "exact")

Arguments

x

A data frame or tibble.

target

The target to refine: the name of a column in x or, as in the example below, a vector with one element per row of x. Defaults to NULL.

rules

A rule table or compiled rulebook.

by

Names of the variables in x that the rules are matched against.

assertion

Assertion text written to matching positions of the target and recorded in provenance.

comment

Optional comment attached to the refinement step.

match

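Matching strategy: "exact", "starts_with", "ends_with", or "contains". The example below passes one value per variable in by. Defaults to "exact".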

Details

The function is designed for lightweight refinement workflows in which semantic interpretations mature gradually through ordinary tidyverse operations.

Matching observations are identified through configurable matching semantics applied to one or more observational variables.

Supported matching semantics include:

  • "exact" relational equality;

  • "starts_with" hierarchical prefix matching;

  • "ends_with" suffix matching;

  • "contains" substring detection.
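The four matching semantics correspond closely to base R string predicates. The following is a minimal sketch, not the package's implementation: match_one() is a hypothetical helper introduced here for illustration, and it assumes fixed (non-regex) patterns.

```r
# Hypothetical helper (not part of the package): test each value against a
# pattern under one of the four matching semantics listed above.
match_one <- function(values, pattern, match = "exact") {
  switch(
    match,
    exact       = values == pattern,
    starts_with = startsWith(values, pattern),
    ends_with   = endsWith(values, pattern),
    contains    = grepl(pattern, values, fixed = TRUE),
    stop("Unknown matching strategy: ", match)
  )
}

filenames <- c("filmA.png", "filmB.png", "film.xlsx", "fill.png")

match_one(filenames, "film", "starts_with")  # TRUE TRUE TRUE FALSE
match_one(filenames, ".png", "ends_with")    # TRUE TRUE FALSE TRUE
match_one(filenames, "ilm", "contains")      # TRUE TRUE TRUE FALSE
match_one(filenames, "fill.png", "exact")    # FALSE FALSE FALSE TRUE
```

Each call returns a logical vector with one element per input value, so results for several variables can be combined with & or |.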

Matching positions in the target vector are replaced by refined semantic assertions.

Unmatched values remain unchanged.
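The replacement step can be illustrated in base R. This is a sketch under stated assumptions: refine_sketch() is a hypothetical stand-in, not the package's code. It supports only "exact" and "starts_with", takes a single rule as a named list, and assumes a row matches only when every by variable matches, which is what the output in the Examples section suggests.

```r
# Hypothetical stand-in for the replacement step (not the package's code).
# `rule` is a named list with one pattern per variable in `by`;
# `match` gives one strategy per variable, in the same order.
refine_sketch <- function(x, target, rule, by, match, assertion) {
  hit <- rep(TRUE, nrow(x))
  for (i in seq_along(by)) {
    values <- as.character(x[[by[i]]])
    pattern <- rule[[by[i]]]
    hit_i <- if (match[i] == "starts_with") {
      startsWith(values, pattern)
    } else {
      values == pattern  # anything else is treated as "exact" in this sketch
    }
    hit <- hit & hit_i  # assumption: all variables must match
  }
  target[hit] <- assertion  # only matching positions are overwritten
  target
}

files <- data.frame(
  filename  = c("filmA.png", "filmB.png", "film.xlsx", "fill.png"),
  extension = c("png", "png", "xlsx", "png"),
  stringsAsFactors = FALSE
)

refine_sketch(
  x = files,
  target = rep("unresolved", nrow(files)),
  rule = list(filename = "film", extension = "png"),
  by = c("filename", "extension"),
  match = c("starts_with", "exact"),
  assertion = "film_visualisation"
)
```

The result has the same length and order as the input target; positions that do not match keep their previous value.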

refine() intentionally never:

  • removes rows;

  • reshapes tables;

  • modifies unrelated observations.

This makes refinement stages auditable, reversible, and compatible with iterative semantic stabilization workflows.
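Because the target keeps its length and order, two stages can be compared position by position. A minimal sketch with hand-written vectors follows; the values mirror the Examples section, and before, after, audit, and restored are illustrative names rather than objects the package creates.

```r
before <- rep("unresolved", 4)
after  <- c("film_visualisation", "film_visualisation", "unresolved", "unresolved")

# Row cardinality is preserved.
stopifnot(length(after) == length(before))

# Audit: which positions changed, and from what to what.
changed <- which(after != before)
audit <- data.frame(
  position = changed,
  from = before[changed],
  to = after[changed],
  stringsAsFactors = FALSE
)
audit

# Reversal: restoring the recorded `from` values recovers the earlier stage.
restored <- after
restored[audit$position] <- audit$from
stopifnot(identical(restored, before))
```

Keeping such a change log per stage is one possible way to make a sequence of refinements inspectable; whether the package records provenance in this shape is not stated here.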

refine() operates on semantic operationalisations produced by earlier workflow steps.

The function does not attempt to construct formally complete semantic graphs or enforce ontology-level consistency.

Instead, it provides a lightweight operational mechanism for progressively stabilizing semantic interpretations inside ordinary tidyverse workflows.

This approach is particularly useful when working with:

  • partially harmonised datasets;

  • inconsistent coding systems;

  • ambiguous metadata;

  • hierarchical filesystem structures;

  • exploratory semantic reconstruction workflows.

Multiple refinement stages may later mature into:

  • controlled vocabularies;

  • formally defined semantic vectors;

  • semantically enriched datasets;

  • or graph-based semantic representations.
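As one illustration of the first item, a refined vector can be turned into a small controlled vocabulary with base R. This is only a sketch: the values are taken from the Examples section, and treating "unresolved" as a placeholder rather than a vocabulary term is an assumption made here.

```r
# Result of one or more refinement stages (values from the Examples section).
refined <- c("film_visualisation", "film_visualisation", "unresolved", "unresolved")

# Assumption: "unresolved" is a placeholder, not a vocabulary term.
vocabulary <- sort(unique(refined[refined != "unresolved"]))

# A factor restricted to the vocabulary; placeholder values become NA.
coded <- factor(refined, levels = vocabulary)

vocabulary
table(coded, useNA = "ifany")
```

The NA count shows how many observations still await refinement.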

Examples


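# Four files; only the first two are PNG files whose names start with "film".
files <- tibble::tibble(
  filename  = c("filmA.png", "filmB.png", "film.xlsx", "fill.png"),
  extension = c("png", "png", "xlsx", "png")
)

# Refine an all-"unresolved" target. A row matches when its filename starts
# with "film" and its extension equals "png"; match gives one strategy per
# variable in by, in the same order.
out <- refine(
  x = files,
  target = rep("unresolved", nrow(files)),
  rules = tibble::tibble(filename = "film", extension = "png"),
  by = c("filename", "extension"),
  match = c("starts_with", "exact"),
  assertion = "film_visualisation"
)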

out
#> [1] "film_visualisation" "film_visualisation" "unresolved"        
#> [4] "unresolved"