| Title: | Harmonize Datasets |
|---|---|
| Description: | A common early step during data analysis is "data harmonization" -- converting disparate datasets into a unified, consistent format, with consistent column names, classes, and values. The goal of 'hrmn' is to make this process as easy as it can be. |
| Authors: | Jon Harmon [aut, cre] (ORCID: <https://orcid.org/0000-0003-4781-4346>) |
| Maintainer: | Jon Harmon <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9005 |
| Built: | 2026-05-29 17:06:36 UTC |
| Source: | https://github.com/wranglezone/hrmn |
Harmonize a data frame
harmonize_df(.data, ..., .spec = NULL, .unspecified_columns = c("error", "drop", "keep"))
| .data | ( |
|---|---|
| ... | These dots are for future extensions and must be empty. |
| .spec | ( |
| .unspecified_columns | ( |
The input .data harmonized to a tibble::tibble().
Other harmonization functions:
harmonize_fct()

    df <- data.frame(
      size = c("Small", "Medium", "S", "M", "Large", "Lrg", "Sm"),
      id = 1:7
    )

    # This spec will coerce values to NA if they are not "Small", "Medium",
    # or "Large".
    spec <- specify_df(
      size = specify_fct(levels = c("Small", "Medium", "Large"))
    )

    # We can provide harmonization rules to the data before the spec is applied.
    # Here, we harmonize the input factor to convert "S", "M", "Sm", and "Lrg" to
    # valid values.
    harmonize_df(
      df,
      size = harmonize_fct(
        size,
        .lookup = c("S" = "Small", "M" = "Medium", "Sm" = "Small", "Lrg" = "Large")
      ),
      .spec = spec,
      .unspecified_columns = "keep"
    )

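To make the roles of a spec and of `.unspecified_columns` more concrete, here is a minimal base-R sketch of the same flow. It is not how 'hrmn' is implemented. The function `apply_level_spec()`, its `unspecified` argument, and the use of a plain named list as the spec are all hypothetical. The meaning given to "error", "drop", and "keep" is an assumption based only on the option names.

```r
# Hypothetical base-R sketch; none of these names come from hrmn.
# `spec` is a plain named list: column name -> allowed factor levels.
apply_level_spec <- function(data, spec,
                             unspecified = c("error", "drop", "keep")) {
  unspecified <- match.arg(unspecified)

  missing_cols <- setdiff(names(spec), names(data))
  if (length(missing_cols) > 0) {
    stop("Columns in the spec but not in the data: ",
         paste(missing_cols, collapse = ", "))
  }

  # ASSUMPTION: "error" means "fail if the data has columns the spec omits".
  extra <- setdiff(names(data), names(spec))
  if (length(extra) > 0 && unspecified == "error") {
    stop("Columns not covered by the spec: ", paste(extra, collapse = ", "))
  }

  # Coerce each specified column; values outside the levels become NA.
  for (col in names(spec)) {
    data[[col]] <- factor(as.character(data[[col]]), levels = spec[[col]])
  }

  # ASSUMPTION: "drop" removes unspecified columns, "keep" leaves them alone.
  if (unspecified == "drop") {
    data <- data[, names(spec), drop = FALSE]
  }
  data
}

df <- data.frame(
  size = c("Small", "Medium", "S", "Large"),
  id = 1:4
)
spec <- list(size = c("Small", "Medium", "Large"))

kept <- apply_level_spec(df, spec, unspecified = "keep")
dropped <- apply_level_spec(df, spec, unspecified = "drop")
```

In this sketch the unrecoded value "S" becomes NA because it is not among the allowed levels, which is why recoding such values before the spec is applied is useful.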
Harmonize a factor
harmonize_fct(.data, ..., .spec = NULL, .lookup = NULL)
| .data | ( |
|---|---|
| ... | These dots are for future extensions and must be empty. |
| .spec | ( |
| .lookup | (named |
A harmonized factor().
Other harmonization functions:
harmonize_df()

    # Without a spec, harmonize_fct() acts like [base::factor()].
    harmonize_fct(c("a", "b", "c"))

    # Basic harmonization, dropping levels not in the spec
    spec <- specify_fct(levels = c("a", "b"))
    harmonize_fct(c("a", "b", "c"), .spec = spec)

    # Using a lookup table to recode values
    spec2 <- specify_fct(levels = c("fruit", "citrus"))
    lookup <- c(apple = "fruit", banana = "fruit", orange = "citrus")
    harmonize_fct(
      c("apple", "banana", "orange"),
      .spec = spec2,
      .lookup = lookup
    )

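As a rough illustration of the recode-then-restrict idea in the examples above, the following base-R sketch applies a named lookup vector and then builds a factor limited to the allowed levels. The helper `recode_then_factor()` is hypothetical and not part of 'hrmn'. It assumes that values outside the allowed levels end up as NA, as `base::factor()` does, and it may differ from how `harmonize_fct()` actually works.

```r
# Hypothetical helper (not part of hrmn): recode with a named lookup vector,
# then keep only the allowed levels.
recode_then_factor <- function(x, levels, lookup = NULL) {
  x <- as.character(x)
  if (!is.null(lookup)) {
    # Only touch values that have an entry in the lookup.
    hit <- x %in% names(lookup)
    x[hit] <- unname(lookup[x[hit]])
  }
  # base::factor() turns any value outside `levels` into NA.
  factor(x, levels = levels)
}

lookup <- c(apple = "fruit", banana = "fruit", orange = "citrus")
recoded <- recode_then_factor(
  c("apple", "banana", "orange", "kiwi"),
  levels = c("fruit", "citrus"),
  lookup = lookup
)
print(recoded)
```

Here "kiwi" has no lookup entry and is not an allowed level, so in this sketch it becomes NA while the other three values are recoded.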
Create an object that specifies the desired format for a data frame. This specification object does not contain any data itself, only the rules for harmonization.
specify_df(...)
| ... | ( |
|---|---|
A hrmn_spec_df object that acts as a specification.
Other specification functions:
specify_fct()

    specify_df(
      response = specify_fct(levels = c("Yes", "No", "Maybe")),
      outcome = specify_fct(levels = c("Positive", "Negative"))
    )

Create an object that specifies the desired levels for a factor variable. This specification object does not contain any data itself, only the rules for harmonization.
specify_fct(levels = character())
| levels | ( |
|---|---|
A hrmn_spec_fct object that acts as a specification.
Other specification functions:
specify_df()
specify_fct(levels = c("a", "b", "c"))