
This function runs the full check workflow, calling each of the following in turn: field_check, pkg_field_check, dimension_check, name_check, id_check, row_check, NA_check, type_check, values_check, integer_check, decimal_check, misc_format_check, description_check, minmax_check, and missing_value_check.

Usage

complete_check(
  DD_dict,
  DS_data,
  non.NA.missing.codes = NA,
  reorder.dict = FALSE,
  name.correct = FALSE
)

Arguments

DD_dict

Data dictionary.

DS_data

Data set.

non.NA.missing.codes

A user-defined vector of encoded, numerical (i.e., non-NA) missing value codes (e.g., -9999).

reorder.dict

When TRUE, and only if the variable names in the data and the data dictionary match perfectly but are in a different order, the function reorders the rows of the dictionary to match the columns of the data. Use with caution: we recommend first running the function with the default (FALSE) to understand potential errors.

name.correct

When TRUE, if name mismatches are identified, the function renames the variables in the data set to match the data dictionary. Use with caution: we recommend first running the function with the default (FALSE) to distinguish order/dimension mismatches from name mismatches.
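
Before enabling reorder.dict or name.correct, it can help to see which kind of mismatch is actually present. The base R sketch below is not the package's implementation. It assumes the dictionary stores variable names in a column called VARNAME, and the helper classify_name_mismatch and the toy objects are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: classify how dictionary names relate to data columns.
# Assumes the dictionary lists variable names in a column called VARNAME.
classify_name_mismatch <- function(dict, data) {
  dict_names <- as.character(dict$VARNAME)
  data_names <- names(data)
  if (identical(dict_names, data_names)) {
    "match"
  } else if (length(dict_names) != length(data_names)) {
    "dimension mismatch"
  } else if (setequal(dict_names, data_names)) {
    "order mismatch"   # same names, different order: the reorder.dict case
  } else {
    "name mismatch"    # different names: the name.correct case
  }
}

# Toy inputs (made up for illustration)
toy_data <- data.frame(SUBJECT_ID = 1:3, AGE = c(34, 51, 47), SEX = c(0, 1, 1))
toy_dict <- data.frame(
  VARNAME = c("SUBJECT_ID", "SEX", "AGE"),
  VARDESC = c("Subject identifier", "Sex of participant", "Age in years"),
  stringsAsFactors = FALSE
)

mismatch <- classify_name_mismatch(toy_dict, toy_data)
print(mismatch)

# Reorder the dictionary rows so they follow the data columns
toy_dict_reordered <- toy_dict[match(names(toy_data), toy_dict$VARNAME), ]
print(toy_dict_reordered$VARNAME)
```

Here the toy dictionary has the same names as the data but in a different order, so only a reordering is needed and no variable would have to be renamed.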

Value

Tibble containing the following information for each check: (1) Time (time stamp); (2) Function (name of the check function); (3) Status (Passed/Failed/Warning); (4) Message (a copy of the message the function printed out); (5) Information (more detailed information about the potential errors identified).
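
Because the report is an ordinary table, the checks that need attention can be pulled out with standard subsetting. The sketch below uses a small mock data frame with the Function, Status, and Message columns rather than a real complete_check result. The names mock_report and needs_attention, and the placeholder messages, are made up for illustration.

```r
# Mock report mimicking three columns of the complete_check() output
# (values are made up; a real report has one row per check).
mock_report <- data.frame(
  Function = c("field_check", "description_check", "missing_value_check"),
  Status   = c("Passed", "Failed", "Passed"),
  Message  = c("Passed: ...", "ERROR: ...", "Passed: ..."),
  stringsAsFactors = FALSE
)

# Keep only the checks that did not pass
needs_attention <- mock_report[mock_report$Status != "Passed", ]
print(needs_attention$Function)
```

The same subsetting should apply to a real report, after which the full text of each flagged Message and the matching Information entry can be inspected individually.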

Examples

# Example 1
# Note: in this example, the missing value codes are not defined,
# so the last check ('missing_value_check') doesn't know
# to check for encoded values
data(ExampleB)
complete_check(DD.dict.B, DS.data.B)
#> # A tibble: 15 × 5
#>    Time                Function            Status Message           Information 
#>    <dttm>              <chr>               <chr>  <chr>             <named list>
#>  1 2023-09-27 11:01:09 field_check         Passed Passed: required… <lgl [4]>   
#>  2 2023-09-27 11:01:09 pkg_field_check     Passed Passed: package-… <lgl [3]>   
#>  3 2023-09-27 11:01:09 dimension_check     Passed Passed: the vari… <int [2]>   
#>  4 2023-09-27 11:01:09 name_check          Passed Passed: the vari… <chr [1]>   
#>  5 2023-09-27 11:01:09 id_check            Passed Passed: All ID v… <tibble>    
#>  6 2023-09-27 11:01:09 row_check           Passed Passed: no blank… <named list>
#>  7 2023-09-27 11:01:09 NA_check            Passed Passed: no NA va… <chr [1]>   
#>  8 2023-09-27 11:01:09 type_check          Passed Passed: All TYPE… <chr [3]>   
#>  9 2023-09-27 11:01:09 values_check        Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check       Passed Passed: all vari… <chr [0]>   
#> 11 2023-09-27 11:01:09 decimal_check       Passed Passed: all vari… <chr [1]>   
#> 12 2023-09-27 11:01:09 misc_format_check   Passed Passed: no check… <tibble>    
#> 13 2023-09-27 11:01:09 description_check   Passed Passed: unique d… <chr [1]>   
#> 14 2023-09-27 11:01:09 minmax_check        Passed Passed: when pro… <tibble>    
#> 15 2023-09-27 11:01:09 missing_value_check Passed Passed: all miss… <df [0 × 4]>
# Rerun check after defining missing value codes
complete_check(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-9999, -4444))
#> # A tibble: 15 × 5
#>    Time                Function            Status Message           Information 
#>    <dttm>              <chr>               <chr>  <chr>             <named list>
#>  1 2023-09-27 11:01:09 field_check         Passed Passed: required… <lgl [4]>   
#>  2 2023-09-27 11:01:09 pkg_field_check     Passed Passed: package-… <lgl [3]>   
#>  3 2023-09-27 11:01:09 dimension_check     Passed Passed: the vari… <int [2]>   
#>  4 2023-09-27 11:01:09 name_check          Passed Passed: the vari… <chr [1]>   
#>  5 2023-09-27 11:01:09 id_check            Passed Passed: All ID v… <tibble>    
#>  6 2023-09-27 11:01:09 row_check           Passed Passed: no blank… <named list>
#>  7 2023-09-27 11:01:09 NA_check            Passed Passed: no NA va… <chr [1]>   
#>  8 2023-09-27 11:01:09 type_check          Passed Passed: All TYPE… <chr [3]>   
#>  9 2023-09-27 11:01:09 values_check        Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check       Passed Passed: all vari… <chr [0]>   
#> 11 2023-09-27 11:01:09 decimal_check       Passed Passed: all vari… <chr [1]>   
#> 12 2023-09-27 11:01:09 misc_format_check   Passed Passed: no check… <tibble>    
#> 13 2023-09-27 11:01:09 description_check   Passed Passed: unique d… <chr [1]>   
#> 14 2023-09-27 11:01:09 minmax_check        Passed Passed: when pro… <tibble>    
#> 15 2023-09-27 11:01:09 missing_value_check Failed ERROR: some vari… <df [1 × 4]>

# Example 2
data(ExampleA)
complete_check(DD.dict.A, DS.data.A, non.NA.missing.codes=c(-9999, -4444))
#> # A tibble: 15 × 5
#>    Time                Function            Status Message           Information 
#>    <dttm>              <chr>               <chr>  <chr>             <named list>
#>  1 2023-09-27 11:01:09 field_check         Passed Passed: required… <lgl [4]>   
#>  2 2023-09-27 11:01:09 pkg_field_check     Passed Passed: package-… <lgl [3]>   
#>  3 2023-09-27 11:01:09 dimension_check     Passed Passed: the vari… <int [2]>   
#>  4 2023-09-27 11:01:09 name_check          Passed Passed: the vari… <chr [1]>   
#>  5 2023-09-27 11:01:09 id_check            Passed Passed: All ID v… <tibble>    
#>  6 2023-09-27 11:01:09 row_check           Passed Passed: no blank… <named list>
#>  7 2023-09-27 11:01:09 NA_check            Passed Passed: no NA va… <chr [1]>   
#>  8 2023-09-27 11:01:09 type_check          Passed Passed: All TYPE… <chr [3]>   
#>  9 2023-09-27 11:01:09 values_check        Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check       Passed Passed: all vari… <chr [0]>   
#> 11 2023-09-27 11:01:09 decimal_check       Passed Passed: all vari… <chr [1]>   
#> 12 2023-09-27 11:01:09 misc_format_check   Passed Passed: no check… <tibble>    
#> 13 2023-09-27 11:01:09 description_check   Passed Passed: unique d… <chr [1]>   
#> 14 2023-09-27 11:01:09 minmax_check        Passed Passed: when pro… <tibble>    
#> 15 2023-09-27 11:01:09 missing_value_check Passed Passed: all miss… <df [0 × 4]>

# Example 3
data(ExampleD)
results <- complete_check(DD.dict.D, DS.data.D, non.NA.missing.codes=c(-9999, -4444))
# View output in greater detail
results$Message[2] # Recommend using add_missing_fields
#> [1] "ERROR: not all package-level required fields are present in the data dictionary. Consider using the add_missing_fields function to auto fill these fields."
results$Information$pkg_field_check.Info # We see that MIN, MAX, and TYPE are all missing
#>  TYPE   MIN   MAX 
#> FALSE FALSE FALSE 
# Use the add_missing_fields function to add in data
DD.dict.updated <- add_missing_fields(DD.dict.D, DS.data.D)
#> $Message
#> [1] "CORRECTED ERROR: not all package-level required fields were present in the data dictionary. The missing fields have now been added! TYPE was inferred from the data, and MIN/MAX have been added as empty fields."
#> 
#> $Missing
#> [1] "TYPE" "MIN"  "MAX" 
#> 
# Be sure to call in the new version of the dictionary (DD.dict.updated)
complete_check(DD.dict.updated, DS.data.D)
#> # A tibble: 15 × 5
#>    Time                Function            Status Message           Information 
#>    <dttm>              <chr>               <chr>  <chr>             <named list>
#>  1 2023-09-27 11:01:09 field_check         Passed Passed: required… <lgl [4]>   
#>  2 2023-09-27 11:01:09 pkg_field_check     Passed Passed: package-… <lgl [3]>   
#>  3 2023-09-27 11:01:09 dimension_check     Passed Passed: the vari… <int [2]>   
#>  4 2023-09-27 11:01:09 name_check          Passed Passed: the vari… <chr [1]>   
#>  5 2023-09-27 11:01:09 id_check            Passed Passed: All ID v… <tibble>    
#>  6 2023-09-27 11:01:09 row_check           Passed Passed: no blank… <named list>
#>  7 2023-09-27 11:01:09 NA_check            Passed Passed: no NA va… <chr [1]>   
#>  8 2023-09-27 11:01:09 type_check          Passed Passed: All TYPE… <chr [3]>   
#>  9 2023-09-27 11:01:09 values_check        Passed Passed: all four… <df [4 × 3]>
#> 10 2023-09-27 11:01:09 integer_check       Passed Passed: all vari… <chr [0]>   
#> 11 2023-09-27 11:01:09 decimal_check       Passed Passed: all vari… <chr [1]>   
#> 12 2023-09-27 11:01:09 misc_format_check   Passed Passed: no check… <tibble>    
#> 13 2023-09-27 11:01:09 description_check   Failed ERROR: missing a… <tibble>    
#> 14 2023-09-27 11:01:09 minmax_check        Passed Passed: when pro… <tibble>    
#> 15 2023-09-27 11:01:09 missing_value_check Passed Passed: all miss… <df [0 × 4]>