
Summarise file activity by time period and structural path
Aggregates file-level observations (e.g. from scan_storage()) into
time-based summaries grouped by a deterministic structural path prefix.
Arguments
- df
A data.frame representing a filesystem snapshot. Must conform to the canonical schema (see normalise_snapshot_schema()), including:
  - rel_path
  - filename
  - mtime (POSIXct)
  - extension
  - git_tracked (optional)
- extensions
Character vector of file extensions to include (case-insensitive, without leading dots).
- path_col
Character. Name of the column containing file paths (default: "rel_path").
- time_unit
Character. Size of the time bucket used to group mtime; one of "week", "month", "day", "year".
- max_files
Integer. Maximum number of file names shown per group.
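The extensions argument is documented as case-insensitive and written without leading dots. The sketch below shows one way such matching can work in base R. filter_extensions() is a hypothetical helper, not part of the package, and it assumes the extension column may or may not carry a leading dot.

```r
# Hypothetical helper illustrating case-insensitive extension matching;
# not the package's implementation.
filter_extensions <- function(df, extensions) {
  # Normalise both sides: drop leading dots and lower-case,
  # so "R", ".r" and "r" all compare equal
  norm <- function(x) tolower(sub("^\\.+", "", x))
  # NA extensions are dropped: NA %in% x is FALSE unless x contains NA
  keep <- norm(df$extension) %in% norm(extensions)
  df[keep, , drop = FALSE]
}

snap <- data.frame(
  rel_path  = c("p/R/a.R", "p/R/b.r", "p/vignettes/c.Rmd", "p/data/d.csv"),
  extension = c("R", "r", ".Rmd", "csv"),
  stringsAsFactors = FALSE
)
filter_extensions(snap, c("r", "rmd"))$rel_path
# "p/R/a.R" "p/R/b.r" "p/vignettes/c.Rmd"
```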
Value
A data.frame with one row per (period × group_path), containing:
- period
Time bucket identifier (e.g. "2026-17").
- group_path
Project-level grouping derived from the first components of rel_path, typically representing project and module (e.g. _packages/iocodelists/R).
- start
Earliest modification date in the group.
- end
Latest modification date in the group.
- file_names
Pipe-separated list of file names, truncated to at most max_files entries.
- n_files
Number of file observations in the group.
- n_unique_files
Number of distinct files (rel_path) in the group.
- untracked
Number of files not tracked by Git, if the optional git_tracked column is present.
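To make the Value columns concrete, here is a base-R sketch of the per-group aggregation. summarise_groups() is hypothetical and assumes period and group_path have already been derived. It also assumes that start and end are the calendar dates of the earliest and latest mtime, that file names are sorted and joined with "|", and that untracked is NA when git_tracked is absent. None of these details is confirmed by this page.

```r
# Hypothetical sketch of the (period x group_path) aggregation.
# Assumes df already has period and group_path columns.
summarise_groups <- function(df, max_files = 5L) {
  stopifnot(nrow(df) > 0L)  # sketch does not handle empty input
  # One data.frame per observed (period, group_path) combination
  groups <- split(df, list(df$period, df$group_path), drop = TRUE)
  rows <- lapply(groups, function(g) {
    shown <- head(sort(unique(g$filename)), max_files)
    data.frame(
      period         = g$period[1],
      group_path     = g$group_path[1],
      start          = as.Date(min(g$mtime)),
      end            = as.Date(max(g$mtime)),
      file_names     = paste(shown, collapse = "|"),
      n_files        = nrow(g),                   # observations, repeats included
      n_unique_files = length(unique(g$rel_path)),
      untracked      = if ("git_tracked" %in% names(g))
                         sum(!g$git_tracked, na.rm = TRUE) else NA_integer_,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  # Deterministic ordering, independent of split() internals
  out <- out[order(out$period, out$group_path), , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

Because n_files counts observations while n_unique_files counts distinct rel_path values, a file seen in several snapshots raises the first count but not the second.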
Details
The function derives:
- a time bucket (period) from file modification times (mtime)
- a grouping key (group_path) derived from the project and its immediate subdirectory (module), using an internal structural parser

and then summarises activity within each (period × group_path) combination.
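A minimal sketch of how a period label could be derived from mtime. make_period() is hypothetical and its format strings are assumptions. The weekly format is the only one chosen to match a documented value: the ISO 8601 year-week pattern "%G-%V" yields labels shaped like "2026-17".

```r
# Hypothetical helper: map modification times to period labels.
# Format strings are assumptions, not the package's confirmed formats.
make_period <- function(mtime, time_unit = c("week", "month", "day", "year")) {
  time_unit <- match.arg(time_unit)
  fmt <- switch(time_unit,
    week  = "%G-%V",     # ISO week-based year and ISO week number
    month = "%Y-%m",
    day   = "%Y-%m-%d",
    year  = "%Y"
  )
  format(mtime, fmt)
}

mtime <- as.POSIXct(c("2026-04-22 12:00", "2027-01-01 12:00"), tz = "UTC")
make_period(mtime, "week")   # "2026-17" "2026-53"
make_period(mtime, "month")  # "2026-04" "2027-01"
```

Using the ISO week-based year (%G) rather than the calendar year (%Y) keeps 1 January 2027, which falls in ISO week 53 of 2026, in the same bucket as the rest of that week.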
This provides a reproducible, structure-aware view of observed activity, suitable for exploratory analysis, forensic reconstruction, and audit workflows.
This function operates on observational data:
- grouping is structural and deterministic, based on the first components of rel_path, typically corresponding to project and module folders (e.g. R, tests, data-raw)
- no assumptions are made about project structure or file roles
- identical inputs always produce identical outputs
The group_path is a project–module level projection of rel_path.
It is derived by extracting the first components of the path (e.g.
_packages/iocodelists/R) and is intended for aggregation and reporting.
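The package derives group_path with an internal structural parser. The sketch below is a simpler stand-in that keeps a fixed number of leading directory components. derive_group_path() and its depth = 3 default are hypothetical, with the depth chosen only so that the documented example _packages/iocodelists/R is reproduced. The file names in the usage example are made up.

```r
# Hypothetical fixed-depth stand-in for the internal structural parser.
derive_group_path <- function(rel_path, depth = 3L) {
  # Normalise Windows separators, then drop the file name itself
  dirs <- dirname(gsub("\\\\", "/", rel_path))
  parts <- strsplit(dirs, "/", fixed = TRUE)
  vapply(parts, function(p) {
    p <- p[nzchar(p) & p != "."]
    if (length(p) == 0L) return(".")   # file at the snapshot root
    paste(head(p, depth), collapse = "/")
  }, character(1))
}

derive_group_path(c(
  "_packages/iocodelists/R/read_codes.R",
  "_packages/iocodelists/tests/testthat/test-read.R",
  "_packages/iocodelists/DESCRIPTION",
  "README.md"
))
# "_packages/iocodelists/R" "_packages/iocodelists/tests"
# "_packages/iocodelists"   "."
```

A fixed depth is easy to reason about but treats every tree alike; a structural parser can adapt to layouts where the project boundary sits at a different level.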
The output is intended for analysis and reporting, not for
file-level identity or joins. For identity, use rel_path.
Modification times (mtime) are treated as a proxy for activity.
They indicate observed changes, not a complete editing history.
Files under .Trash are excluded by default.
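Excluding .Trash amounts to a path filter. A minimal sketch, assuming a file is excluded when any component of its path is exactly .Trash (the package's precise rule is not documented here). exclude_trash() is hypothetical; its path_col argument mirrors the documented one.

```r
# Hypothetical filter: drop rows whose path contains a ".Trash" component.
exclude_trash <- function(df, path_col = "rel_path") {
  paths <- gsub("\\\\", "/", df[[path_col]])  # treat backslashes as separators
  in_trash <- grepl("(^|/)\\.Trash(/|$)", paths, perl = TRUE)
  df[!in_trash, , drop = FALSE]
}

snap <- data.frame(
  rel_path = c(".Trash/old.R", "proj/.Trash/x.R", "proj/R/a.R", "proj/.Trashy/b.R"),
  stringsAsFactors = FALSE
)
exclude_trash(snap)$rel_path   # "proj/R/a.R" "proj/.Trashy/b.R"
```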
This approach aligns grouping with typical project layouts (e.g. R packages),
where the first directory levels correspond to project boundaries and
functional modules.
Examples
if (FALSE) { # \dontrun{
df <- scan_storage("D:/_eviota")
# Weekly overview
summarise_activity(df, time_unit = "week")
# Monthly overview
summarise_activity(df, time_unit = "month")
} # }