This function generates a user-readable report of the checks run by the complete_check function.

Usage

check_report(DD.dict, DS.data, non.NA.missing.codes = NA, compact = TRUE)

Arguments

DD.dict

Data dictionary.

DS.data

Data set.

non.NA.missing.codes

A user-defined vector of numerical missing value codes (e.g., -9999).

compact

When TRUE, the function prints a compact report that lists detailed information only for the checks that did not pass.

Value

A tibble, returned invisibly, containing the following information for each check: (1) Time (timestamp); (2) Name (name of the function); (3) Status (Passed/Failed); (4) Message (a copy of the message the function printed); (5) Information (more detailed information about the potential errors identified).
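
Because the result is returned invisibly, it has to be assigned to a variable before it can be inspected. The following is a minimal sketch of how such a report might be filtered afterwards. It uses a hand-built stand-in data frame rather than real check_report output, and it assumes the column names Function and Status shown in the printed summary in the Examples; the columns of the actual returned tibble may differ, so check names(report) first.

```r
# Stand-in for a returned report (NOT real check_report output).
# Rows are copied from the printed summary of Example 2; the column
# names Function and Status are an assumption based on that printout.
report <- data.frame(
  Function = c("dimension_check", "name_check", "NA_check", "misc_format_check"),
  Status   = c("Passed", "Failed", "Not attempted", "Failed"),
  stringsAsFactors = FALSE
)

# Keep only the checks that did not pass (failed or not attempted).
not_passed <- report[report$Status != "Passed", ]

# Tally the checks by status.
status_counts <- table(report$Status)

print(not_passed$Function)
print(status_counts)
```

With a real report, the same subsetting could be used to decide which individual check functions to rerun after correcting the data dictionary or data set.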

See also

Examples

# Example 1: All checks incorrectly appear to pass on the first attempt
data(ExampleB)
report <- check_report(DD.dict.B, DS.data.B)
#> # A tibble: 15 × 3
#>    Function            Status Message                                           
#>    <chr>               <chr>  <chr>                                             
#>  1 field_check         Passed Passed: required fields VARNAME, VARDESC, UNITS, …
#>  2 pkg_field_check     Passed Passed: package-level required fields TYPE, MIN, …
#>  3 dimension_check     Passed Passed: the variable count matches between the da…
#>  4 name_check          Passed Passed: the variable names match between the data…
#>  5 id_check            Passed Passed: All ID variable checks passed.            
#>  6 row_check           Passed Passed: no blank or duplicate rows detected in da…
#>  7 NA_check            Passed Passed: no NA values detected in data set.        
#>  8 type_check          Passed Passed: All TYPE entries found are accepted by db…
#>  9 values_check        Passed Passed: all four VALUES checks look good.         
#> 10 integer_check       Passed Passed: all variables listed as TYPE integer appe…
#> 11 decimal_check       Passed Passed: all variables listed as TYPE decimal appe…
#> 12 misc_format_check   Passed Passed: no check-specific formatting issues ident…
#> 13 description_check   Passed Passed: unique description present for all variab…
#> 14 minmax_check        Passed Passed: when provided, all variables are within t…
#> 15 missing_value_check Passed Passed: all missing value codes have a correspond…
#> [1] "All 15 checks passed."
# Supplying the missing value codes reveals an error
# at missing_value_check
report <- check_report(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-4444, -9999))
#> # A tibble: 15 × 3
#>    Function            Status Message                                           
#>    <chr>               <chr>  <chr>                                             
#>  1 field_check         Passed Passed: required fields VARNAME, VARDESC, UNITS, …
#>  2 pkg_field_check     Passed Passed: package-level required fields TYPE, MIN, …
#>  3 dimension_check     Passed Passed: the variable count matches between the da…
#>  4 name_check          Passed Passed: the variable names match between the data…
#>  5 id_check            Passed Passed: All ID variable checks passed.            
#>  6 row_check           Passed Passed: no blank or duplicate rows detected in da…
#>  7 NA_check            Passed Passed: no NA values detected in data set.        
#>  8 type_check          Passed Passed: All TYPE entries found are accepted by db…
#>  9 values_check        Passed Passed: all four VALUES checks look good.         
#> 10 integer_check       Passed Passed: all variables listed as TYPE integer appe…
#> 11 decimal_check       Passed Passed: all variables listed as TYPE decimal appe…
#> 12 misc_format_check   Passed Passed: no check-specific formatting issues ident…
#> 13 description_check   Passed Passed: unique description present for all variab…
#> 14 minmax_check        Passed Passed: when provided, all variables are within t…
#> 15 missing_value_check Failed ERROR: some variables have non-encoded missing va…
#> --------------------
#> missing_value_check: Failed 
#> ERROR: some variables have non-encoded missing value codes. 
#> $missing_value_check.Info
#>     VARNAME VALUE MEANING  PASS
#> 14 CUFFSIZE -9999    <NA> FALSE
#> 
#> --------------------

# Example 2: Several checks fail or are not attempted
data(ExampleC)
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes=c(-4444, -9999))
#> # A tibble: 15 × 3
#>    Function            Status        Message                                    
#>    <chr>               <chr>         <chr>                                      
#>  1 field_check         Passed        Passed: required fields VARNAME, VARDESC, …
#>  2 pkg_field_check     Passed        Passed: package-level required fields TYPE…
#>  3 dimension_check     Passed        Passed: the variable count matches between…
#>  4 name_check          Failed        ERROR: the variable names DO NOT match bet…
#>  5 id_check            Passed        Passed: All ID variable checks passed.     
#>  6 row_check           Passed        Passed: no blank or duplicate rows detecte…
#>  7 NA_check            Not attempted ERROR: Required pre-check name_check faile…
#>  8 type_check          Passed        Passed: All TYPE entries found are accepte…
#>  9 values_check        Passed        Passed: all four VALUES checks look good.  
#> 10 integer_check       Not attempted ERROR: Required pre-check name_check faile…
#> 11 decimal_check       Not attempted ERROR: Required pre-check name_check faile…
#> 12 misc_format_check   Failed        ERROR: at least one check failed.          
#> 13 description_check   Passed        Passed: unique description present for all…
#> 14 minmax_check        Not attempted ERROR: Required pre-check name_check faile…
#> 15 missing_value_check Not attempted ERROR: Required pre-check name_check faile…
#> --------------------
#> name_check: Failed 
#> ERROR: the variable names DO NOT match between the data dictionary and the data. If the intention behind the variable names is correct, consider using the name_correct function to automatically rename variables to match. 
#> $name_check.Info
#> # A tibble: 2 × 2
#>   Data                    Dict                         
#>   <chr>                   <chr>                        
#> 1 Data: HTN               Dict: HTN_dbGaP              
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#> 
#> --------------------
#> misc_format_check: Failed 
#> ERROR: at least one check failed. 
#> $misc_formatting_check.Info
#> # A tibble: 6 × 4
#>   check.name check.description                           check.status details   
#>   <chr>      <chr>                                       <chr>        <chr>     
#> 1 Check 1    Empty variable name check                   Passed       NA        
#> 2 Check 2    Duplicate variable name check               Passed       NA        
#> 3 Check 3    Check for use of `dbgap` in variable names  Failed       HTN_dbGaP 
#> 4 Check 3    Check for use of `dbgap` in variable names  Failed       PHYSICAL_…
#> 5 Check 4    Duplicate dictionary column name check      Passed       NA        
#> 6 Check 5    Column names after `VALUES` should be empty Warning      ALERT: Yo…
#> 
#> --------------------
# Note: you can also run the report with compact = FALSE to print details for every check
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes=c(-4444, -9999), compact = FALSE)
#> # A tibble: 15 × 3
#>    Function            Status        Message                                    
#>    <chr>               <chr>         <chr>                                      
#>  1 field_check         Passed        Passed: required fields VARNAME, VARDESC, …
#>  2 pkg_field_check     Passed        Passed: package-level required fields TYPE…
#>  3 dimension_check     Passed        Passed: the variable count matches between…
#>  4 name_check          Failed        ERROR: the variable names DO NOT match bet…
#>  5 id_check            Passed        Passed: All ID variable checks passed.     
#>  6 row_check           Passed        Passed: no blank or duplicate rows detecte…
#>  7 NA_check            Not attempted ERROR: Required pre-check name_check faile…
#>  8 type_check          Passed        Passed: All TYPE entries found are accepte…
#>  9 values_check        Passed        Passed: all four VALUES checks look good.  
#> 10 integer_check       Not attempted ERROR: Required pre-check name_check faile…
#> 11 decimal_check       Not attempted ERROR: Required pre-check name_check faile…
#> 12 misc_format_check   Failed        ERROR: at least one check failed.          
#> 13 description_check   Passed        Passed: unique description present for all…
#> 14 minmax_check        Not attempted ERROR: Required pre-check name_check faile…
#> 15 missing_value_check Not attempted ERROR: Required pre-check name_check faile…
#> --------------------
#> field_check: Passed 
#> Passed: required fields VARNAME, VARDESC, UNITS, and VALUES present in the data dictionary. 
#> $field_check.Info
#> VARNAME VARDESC   UNITS  VALUES 
#>    TRUE    TRUE    TRUE    TRUE 
#> 
#> --------------------
#> pkg_field_check: Passed 
#> Passed: package-level required fields TYPE, MIN, and MAX present in the data dictionary. 
#> $pkg_field_check.Info
#> TYPE  MIN  MAX 
#> TRUE TRUE TRUE 
#> 
#> --------------------
#> dimension_check: Passed 
#> Passed: the variable count matches between the data dictionary and the data. 
#> $dimension_check.Info
#> Variables in dictionary       Variables in data 
#>                      29                      29 
#> 
#> --------------------
#> name_check: Failed 
#> ERROR: the variable names DO NOT match between the data dictionary and the data. If the intention behind the variable names is correct, consider using the name_correct function to automatically rename variables to match. 
#> $name_check.Info
#> # A tibble: 2 × 2
#>   Data                    Dict                         
#>   <chr>                   <chr>                        
#> 1 Data: HTN               Dict: HTN_dbGaP              
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#> 
#> --------------------
#> id_check: Passed 
#> Passed: All ID variable checks passed. 
#> $id_check.Info
#> # A tibble: 5 × 4
#>   check.name check.description                              check.status details
#>   <chr>      <chr>                                          <chr>        <chr>  
#> 1 Check 1    Column 1 is labeled as 'SUBJECT_ID'.           Passed       The fi…
#> 2 Check 2    'SUBJECT_ID' is a column name in the data set. Passed       'SUBJE…
#> 3 Check 3    'SUBJECT_ID' is a column name in the data set. Passed       No ill…
#> 4 Check 4    No leading zeros detected in 'SUBJECT_ID' col… Passed       No lea…
#> 5 Check 5    No missing values for 'SUBJECT_ID'.            Passed       No mis…
#> 
#> --------------------
#> row_check: Passed 
#> Passed: no blank or duplicate rows detected in data set or data dictionary. 
#> $row_check.Info
#> $row_check.Info$Empty_DataSet_RowNumbers
#> character(0)
#> 
#> $row_check.Info$Duplicate_DataSet_RowNumbers
#> character(0)
#> 
#> $row_check.Info$Duplicated_SubjectIDs
#> integer(0)
#> 
#> $row_check.Info$Empty_DataDictionary_RowNumbers
#> character(0)
#> 
#> $row_check.Info$Duplicated_DataDictionary_RowNumbers
#> character(0)
#> 
#> 
#> --------------------
#> NA_check: Not attempted 
#> ERROR: Required pre-check name_check failed. 
#> $NA_check.Info
#> $NA_check.Info$Information
#> # A tibble: 2 × 2
#>   Data                    Dict                         
#>   <chr>                   <chr>                        
#> 1 Data: HTN               Dict: HTN_dbGaP              
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#> 
#> 
#> --------------------
#> type_check: Passed 
#> Passed: All TYPE entries found are accepted by dbGaP per submission instructions. 
#> $type_check.Info
#> [1] "integer"                "integer, encoded value" "decimal, encoded value"
#> 
#> --------------------
#> values_check: Passed 
#> Passed: all four VALUES checks look good. 
#> $values_check.Info
#>   check.name                                                 check.description
#> 1    Check 1                 Is an equals sign present for all values columns?
#> 2    Check 2 Are there any leading/trailing spaces near the first equals sign?
#> 3    Check 3  Do all variables of TYPE encoded have at least one VALUES entry?
#> 4    Check 4            Are all variables with VALUES entries of TYPE encoded?
#>   check.status
#> 1       Passed
#> 2       Passed
#> 3       Passed
#> 4       Passed
#> 
#> --------------------
#> integer_check: Not attempted 
#> ERROR: Required pre-check name_check failed. 
#> $integer_check.Info
#> $integer_check.Info$Information
#> # A tibble: 2 × 2
#>   Data                    Dict                         
#>   <chr>                   <chr>                        
#> 1 Data: HTN               Dict: HTN_dbGaP              
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#> 
#> 
#> --------------------
#> decimal_check: Not attempted 
#> ERROR: Required pre-check name_check failed. 
#> $decimal_check.Info
#> $decimal_check.Info$Information
#> # A tibble: 2 × 2
#>   Data                    Dict                         
#>   <chr>                   <chr>                        
#> 1 Data: HTN               Dict: HTN_dbGaP              
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#> 
#> 
#> --------------------
#> misc_format_check: Failed 
#> ERROR: at least one check failed. 
#> $misc_formatting_check.Info
#> # A tibble: 6 × 4
#>   check.name check.description                           check.status details   
#>   <chr>      <chr>                                       <chr>        <chr>     
#> 1 Check 1    Empty variable name check                   Passed       NA        
#> 2 Check 2    Duplicate variable name check               Passed       NA        
#> 3 Check 3    Check for use of `dbgap` in variable names  Failed       HTN_dbGaP 
#> 4 Check 3    Check for use of `dbgap` in variable names  Failed       PHYSICAL_…
#> 5 Check 4    Duplicate dictionary column name check      Passed       NA        
#> 6 Check 5    Column names after `VALUES` should be empty Warning      ALERT: Yo…
#> 
#> --------------------
#> description_check: Passed 
#> Passed: unique description present for all variables in the data dictionary. 
#> $description_check.Info
#> [1] "NA. All variables have a description."
#> 
#> --------------------
#> minmax_check: Not attempted 
#> ERROR: Required pre-check name_check failed. 
#> $minmax_check.Info
#> $minmax_check.Info$Information
#> # A tibble: 2 × 2
#>   Data                    Dict                         
#>   <chr>                   <chr>                        
#> 1 Data: HTN               Dict: HTN_dbGaP              
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#> 
#> 
#> --------------------
#> missing_value_check: Not attempted 
#> ERROR: Required pre-check name_check failed. 
#> $missing_value_check.Info
#> $missing_value_check.Info$Information
#> # A tibble: 2 × 2
#>   Data                    Dict                         
#>   <chr>                   <chr>                        
#> 1 Data: HTN               Dict: HTN_dbGaP              
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#> 
#> 
#> --------------------