This function runs a full workflow check including field_check, pkg_field_check, dimension_check, name_check, id_check, row_check, NA_check, type_check, values_check, integer_check, decimal_check, misc_format_check, description_check, minmax_check, and missing_value_check.
Usage
complete_check(
DD_dict,
DS_data,
non.NA.missing.codes = NA,
reorder.dict = FALSE,
name.correct = FALSE
)
Arguments
- DD_dict
Data dictionary.
- DS_data
Data set.
- non.NA.missing.codes
A user-defined vector of encoded, numerical (i.e., non-NA) missing value codes (e.g., -9999).
- reorder.dict
When TRUE, and only if the names in the data and the data dictionary match perfectly but are in a different order, the function reorders the rows of the dictionary to match the columns of the data. Use with caution: we recommend first running the function with the default (FALSE) to understand potential errors.
- name.correct
When TRUE, if name mismatches are identified, the function renames the variables in the data set to match the data dictionary. Use with caution: we recommend first running the function with the default (FALSE) to identify order/dimension mismatches (vs. name mismatches).
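The condition under which reorder.dict applies (same names, different order) can be illustrated in base R. This is a minimal sketch and not the package's implementation: the toy objects `dat` and `dict`, and the dictionary column names `VARNAME` and `VARDESC`, are assumptions made for illustration.

```r
# Toy data set and dictionary (hypothetical; not the package's example data)
dat <- data.frame(ID = 1:3, AGE = c(34, 51, 47), SEX = c(1, 2, 1))
dict <- data.frame(
  VARNAME = c("AGE", "ID", "SEX"),  # assumed name column: same names, wrong order
  VARDESC = c("Age in years", "Participant ID", "Sex"),
  stringsAsFactors = FALSE
)

# The names must match as a set, but differ in order
same_names  <- setequal(dict$VARNAME, names(dat))
wrong_order <- same_names && !identical(dict$VARNAME, names(dat))

if (wrong_order) {
  # Reorder the dictionary rows to follow the column order of the data
  dict <- dict[match(names(dat), dict$VARNAME), , drop = FALSE]
  rownames(dict) <- NULL
}

print(dict$VARNAME)
```

If the two sets of names differed at all, `same_names` would be FALSE and the dictionary would be left untouched, which mirrors the "only if the names match perfectly" condition described above.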
Value
Tibble containing the following information for each check: (1) Time (time stamp); (2) Function (name of the function); (3) Status (Passed/Failed/Warning); (4) Message (a copy of the message the function printed out); (5) Information (more detailed information about the potential errors identified).
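Because the result is one row per check, the checks that need attention can be pulled out with ordinary subsetting. The sketch below uses a small mock data frame with the same column names as the returned tibble; the rows and message text are hypothetical stand-ins, not real package output.

```r
# Mock result with the same column layout as the returned tibble
# (hypothetical values; a real run returns one row per check)
results <- data.frame(
  Function = c("field_check", "pkg_field_check", "description_check"),
  Status   = c("Passed", "Failed", "Warning"),
  Message  = c("mock message 1", "mock message 2", "mock message 3"),
  stringsAsFactors = FALSE
)

# Keep only the checks that did not pass
flagged <- results[results$Status != "Passed", c("Function", "Status", "Message")]
print(flagged)

# Tally of outcomes across all checks
status_counts <- table(results$Status)
print(status_counts)
```

The same subsetting should work on the tibble returned by complete_check, since it relies only on the Status and Function columns.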
Examples
# Example 1
# Note that in this example, the missing value codes are not defined,
# so the last check ('missing_value_check') doesn't know
# to check for encoded values
data(ExampleB)
complete_check(DD.dict.B, DS.data.B)
#> # A tibble: 15 × 5
#> Time Function Status Message Information
#> <dttm> <chr> <chr> <chr> <named list>
#> 1 2023-09-27 11:01:09 field_check Passed Passed: required… <lgl [4]>
#> 2 2023-09-27 11:01:09 pkg_field_check Passed Passed: package-… <lgl [3]>
#> 3 2023-09-27 11:01:09 dimension_check Passed Passed: the vari… <int [2]>
#> 4 2023-09-27 11:01:09 name_check Passed Passed: the vari… <chr [1]>
#> 5 2023-09-27 11:01:09 id_check Passed Passed: All ID v… <tibble>
#> 6 2023-09-27 11:01:09 row_check Passed Passed: no blank… <named list>
#> 7 2023-09-27 11:01:09 NA_check Passed Passed: no NA va… <chr [1]>
#> 8 2023-09-27 11:01:09 type_check Passed Passed: All TYPE… <chr [3]>
#> 9 2023-09-27 11:01:09 values_check Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check Passed Passed: all vari… <chr [0]>
#> 11 2023-09-27 11:01:09 decimal_check Passed Passed: all vari… <chr [1]>
#> 12 2023-09-27 11:01:09 misc_format_check Passed Passed: no check… <tibble>
#> 13 2023-09-27 11:01:09 description_check Passed Passed: unique d… <chr [1]>
#> 14 2023-09-27 11:01:09 minmax_check Passed Passed: when pro… <tibble>
#> 15 2023-09-27 11:01:09 missing_value_check Passed Passed: all miss… <df [0 × 4]>
# Rerun check after defining missing value codes
complete_check(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-9999, -4444))
#> # A tibble: 15 × 5
#> Time Function Status Message Information
#> <dttm> <chr> <chr> <chr> <named list>
#> 1 2023-09-27 11:01:09 field_check Passed Passed: required… <lgl [4]>
#> 2 2023-09-27 11:01:09 pkg_field_check Passed Passed: package-… <lgl [3]>
#> 3 2023-09-27 11:01:09 dimension_check Passed Passed: the vari… <int [2]>
#> 4 2023-09-27 11:01:09 name_check Passed Passed: the vari… <chr [1]>
#> 5 2023-09-27 11:01:09 id_check Passed Passed: All ID v… <tibble>
#> 6 2023-09-27 11:01:09 row_check Passed Passed: no blank… <named list>
#> 7 2023-09-27 11:01:09 NA_check Passed Passed: no NA va… <chr [1]>
#> 8 2023-09-27 11:01:09 type_check Passed Passed: All TYPE… <chr [3]>
#> 9 2023-09-27 11:01:09 values_check Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check Passed Passed: all vari… <chr [0]>
#> 11 2023-09-27 11:01:09 decimal_check Passed Passed: all vari… <chr [1]>
#> 12 2023-09-27 11:01:09 misc_format_check Passed Passed: no check… <tibble>
#> 13 2023-09-27 11:01:09 description_check Passed Passed: unique d… <chr [1]>
#> 14 2023-09-27 11:01:09 minmax_check Passed Passed: when pro… <tibble>
#> 15 2023-09-27 11:01:09 missing_value_check Failed ERROR: some vari… <df [1 × 4]>
# Example 2
data(ExampleA)
complete_check(DD.dict.A, DS.data.A, non.NA.missing.codes=c(-9999, -4444))
#> # A tibble: 15 × 5
#> Time Function Status Message Information
#> <dttm> <chr> <chr> <chr> <named list>
#> 1 2023-09-27 11:01:09 field_check Passed Passed: required… <lgl [4]>
#> 2 2023-09-27 11:01:09 pkg_field_check Passed Passed: package-… <lgl [3]>
#> 3 2023-09-27 11:01:09 dimension_check Passed Passed: the vari… <int [2]>
#> 4 2023-09-27 11:01:09 name_check Passed Passed: the vari… <chr [1]>
#> 5 2023-09-27 11:01:09 id_check Passed Passed: All ID v… <tibble>
#> 6 2023-09-27 11:01:09 row_check Passed Passed: no blank… <named list>
#> 7 2023-09-27 11:01:09 NA_check Passed Passed: no NA va… <chr [1]>
#> 8 2023-09-27 11:01:09 type_check Passed Passed: All TYPE… <chr [3]>
#> 9 2023-09-27 11:01:09 values_check Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check Passed Passed: all vari… <chr [0]>
#> 11 2023-09-27 11:01:09 decimal_check Passed Passed: all vari… <chr [1]>
#> 12 2023-09-27 11:01:09 misc_format_check Passed Passed: no check… <tibble>
#> 13 2023-09-27 11:01:09 description_check Passed Passed: unique d… <chr [1]>
#> 14 2023-09-27 11:01:09 minmax_check Passed Passed: when pro… <tibble>
#> 15 2023-09-27 11:01:09 missing_value_check Passed Passed: all miss… <df [0 × 4]>
# Example 3
data(ExampleD)
results <- complete_check(DD.dict.D, DS.data.D, non.NA.missing.codes=c(-9999, -4444))
# View output in greater detail
results$Message[2] # Recommend using add_missing_fields
#> [1] "ERROR: not all package-level required fields are present in the data dictionary. Consider using the add_missing_fields function to auto fill these fields."
results$Information$pkg_field_check.Info # We see that MIN, MAX, and TYPE are all missing
#> TYPE MIN MAX
#> FALSE FALSE FALSE
# Use the add_missing_fields function to add in data
DD.dict.updated <- add_missing_fields(DD.dict.D, DS.data.D)
#> $Message
#> [1] "CORRECTED ERROR: not all package-level required fields were present in the data dictionary. The missing fields have now been added! TYPE was inferred from the data, and MIN/MAX have been added as empty fields."
#>
#> $Missing
#> [1] "TYPE" "MIN" "MAX"
#>
# Be sure to pass in the new version of the dictionary (DD.dict.updated)
complete_check(DD.dict.updated, DS.data.D)
#> # A tibble: 15 × 5
#> Time Function Status Message Information
#> <dttm> <chr> <chr> <chr> <named list>
#> 1 2023-09-27 11:01:09 field_check Passed Passed: required… <lgl [4]>
#> 2 2023-09-27 11:01:09 pkg_field_check Passed Passed: package-… <lgl [3]>
#> 3 2023-09-27 11:01:09 dimension_check Passed Passed: the vari… <int [2]>
#> 4 2023-09-27 11:01:09 name_check Passed Passed: the vari… <chr [1]>
#> 5 2023-09-27 11:01:09 id_check Passed Passed: All ID v… <tibble>
#> 6 2023-09-27 11:01:09 row_check Passed Passed: no blank… <named list>
#> 7 2023-09-27 11:01:09 NA_check Passed Passed: no NA va… <chr [1]>
#> 8 2023-09-27 11:01:09 type_check Passed Passed: All TYPE… <chr [3]>
#> 9 2023-09-27 11:01:09 values_check Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check Passed Passed: all vari… <chr [0]>
#> 11 2023-09-27 11:01:09 decimal_check Passed Passed: all vari… <chr [1]>
#> 12 2023-09-27 11:01:09 misc_format_check Passed Passed: no check… <tibble>
#> 13 2023-09-27 11:01:09 description_check Failed ERROR: missing a… <tibble>
#> 14 2023-09-27 11:01:09 minmax_check Passed Passed: when pro… <tibble>
#> 15 2023-09-27 11:01:09 missing_value_check Passed Passed: all miss… <df [0 × 4]>