This function generates a user-readable report of the checks run by the complete_check function.
Arguments
- DD.dict
Data dictionary.
- DS.data
Data set.
- non.NA.missing.codes
A user-defined vector of numeric missing value codes (e.g., -9999).
- compact
When TRUE, the function prints a compact report, listing information from only the non-passed checks.
Value
Tibble, returned invisibly, containing the following information for each check: (1) Time (Time stamp); (2) Name (Name of the function); (3) Status (Passed/Failed); (4) Message (A copy of the message the function printed out); (5) Information (More detailed information about the potential errors identified).
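Because the report is returned invisibly, it has to be assigned to a variable before it can be inspected. The following is a minimal sketch of how the returned table might be filtered, not output from the package. It uses a small hand-built data frame as a hypothetical stand-in for the real return value, and it assumes the column names shown in the printed summary (Function, Status, Message). The messages are shortened for illustration.

```r
# Hypothetical stand-in for the table returned by check_report().
# Column names are an assumption taken from the printed summary.
report <- data.frame(
  Function = c("name_check", "id_check", "NA_check"),
  Status   = c("Failed", "Passed", "Not attempted"),
  Message  = c(
    "ERROR: the variable names DO NOT match.",
    "Passed: All ID variable checks passed.",
    "ERROR: Required pre-check name_check failed."
  ),
  stringsAsFactors = FALSE
)

# Keep only the checks whose status is anything other than "Passed".
not_passed <- report[report$Status != "Passed", ]

# Tabulate how many checks fall under each status.
status_counts <- table(report$Status)

print(not_passed$Function)
print(status_counts)
```

With a real report object, the same subsetting should apply as long as the Status column holds the values Passed, Failed, and Not attempted, as in the examples on this page.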
Examples
# Example 1: All checks incorrectly show as passed on the first attempt
data(ExampleB)
report <- check_report(DD.dict.B, DS.data.B)
#> # A tibble: 15 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, UNITS, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE, MIN, …
#> 3 dimension_check Passed Passed: the variable count matches between the da…
#> 4 name_check Passed Passed: the variable names match between the data…
#> 5 id_check Passed Passed: All ID variable checks passed.
#> 6 row_check Passed Passed: no blank or duplicate rows detected in da…
#> 7 NA_check Passed Passed: no NA values detected in data set.
#> 8 type_check Passed Passed: All TYPE entries found are accepted by db…
#> 9 values_check Passed Passed: all four VALUES checks look good.
#> 10 integer_check Passed Passed: all variables listed as TYPE integer appe…
#> 11 decimal_check Passed Passed: all variables listed as TYPE decimal appe…
#> 12 misc_format_check Passed Passed: no check-specific formatting issues ident…
#> 13 description_check Passed Passed: unique description present for all variab…
#> 14 minmax_check Passed Passed: when provided, all variables are within t…
#> 15 missing_value_check Passed Passed: all missing value codes have a correspond…
#> [1] "All 15 checks passed."
# Supplying the missing value codes calls attention to the error
# at missing_value_check
report <- check_report(DD.dict.B, DS.data.B, non.NA.missing.codes=c(-4444, -9999))
#> # A tibble: 15 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, UNITS, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE, MIN, …
#> 3 dimension_check Passed Passed: the variable count matches between the da…
#> 4 name_check Passed Passed: the variable names match between the data…
#> 5 id_check Passed Passed: All ID variable checks passed.
#> 6 row_check Passed Passed: no blank or duplicate rows detected in da…
#> 7 NA_check Passed Passed: no NA values detected in data set.
#> 8 type_check Passed Passed: All TYPE entries found are accepted by db…
#> 9 values_check Passed Passed: all four VALUES checks look good.
#> 10 integer_check Passed Passed: all variables listed as TYPE integer appe…
#> 11 decimal_check Passed Passed: all variables listed as TYPE decimal appe…
#> 12 misc_format_check Passed Passed: no check-specific formatting issues ident…
#> 13 description_check Passed Passed: unique description present for all variab…
#> 14 minmax_check Passed Passed: when provided, all variables are within t…
#> 15 missing_value_check Failed ERROR: some variables have non-encoded missing va…
#> --------------------
#> missing_value_check: Failed
#> ERROR: some variables have non-encoded missing value codes.
#> $missing_value_check.Info
#> VARNAME VALUE MEANING PASS
#> 14 CUFFSIZE -9999 <NA> FALSE
#>
#> --------------------
# Example 2: Several checks fail or are not attempted
data(ExampleC)
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes=c(-4444, -9999))
#> # A tibble: 15 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE…
#> 3 dimension_check Passed Passed: the variable count matches between…
#> 4 name_check Failed ERROR: the variable names DO NOT match bet…
#> 5 id_check Passed Passed: All ID variable checks passed.
#> 6 row_check Passed Passed: no blank or duplicate rows detecte…
#> 7 NA_check Not attempted ERROR: Required pre-check name_check faile…
#> 8 type_check Passed Passed: All TYPE entries found are accepte…
#> 9 values_check Passed Passed: all four VALUES checks look good.
#> 10 integer_check Not attempted ERROR: Required pre-check name_check faile…
#> 11 decimal_check Not attempted ERROR: Required pre-check name_check faile…
#> 12 misc_format_check Failed ERROR: at least one check failed.
#> 13 description_check Passed Passed: unique description present for all…
#> 14 minmax_check Not attempted ERROR: Required pre-check name_check faile…
#> 15 missing_value_check Not attempted ERROR: Required pre-check name_check faile…
#> --------------------
#> name_check: Failed
#> ERROR: the variable names DO NOT match between the data dictionary and the data. If the intention behind the variable names is correct, consider using the name_correct function to automatically rename variables to match.
#> $name_check.Info
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#> --------------------
#> misc_format_check: Failed
#> ERROR: at least one check failed.
#> $misc_formatting_check.Info
#> # A tibble: 6 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <chr>
#> 1 Check 1 Empty variable name check Passed NA
#> 2 Check 2 Duplicate variable name check Passed NA
#> 3 Check 3 Check for use of `dbgap` in variable names Failed HTN_dbGaP
#> 4 Check 3 Check for use of `dbgap` in variable names Failed PHYSICAL_…
#> 5 Check 4 Duplicate dictionary column name check Passed NA
#> 6 Check 5 Column names after `VALUES` should be empty Warning ALERT: Yo…
#>
#> --------------------
# Note: you can also run the report with compact = FALSE to print
# detailed information for every check, not only the failed ones
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes=c(-4444, -9999), compact = FALSE)
#> # A tibble: 15 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE…
#> 3 dimension_check Passed Passed: the variable count matches between…
#> 4 name_check Failed ERROR: the variable names DO NOT match bet…
#> 5 id_check Passed Passed: All ID variable checks passed.
#> 6 row_check Passed Passed: no blank or duplicate rows detecte…
#> 7 NA_check Not attempted ERROR: Required pre-check name_check faile…
#> 8 type_check Passed Passed: All TYPE entries found are accepte…
#> 9 values_check Passed Passed: all four VALUES checks look good.
#> 10 integer_check Not attempted ERROR: Required pre-check name_check faile…
#> 11 decimal_check Not attempted ERROR: Required pre-check name_check faile…
#> 12 misc_format_check Failed ERROR: at least one check failed.
#> 13 description_check Passed Passed: unique description present for all…
#> 14 minmax_check Not attempted ERROR: Required pre-check name_check faile…
#> 15 missing_value_check Not attempted ERROR: Required pre-check name_check faile…
#> --------------------
#> field_check: Passed
#> Passed: required fields VARNAME, VARDESC, UNITS, and VALUES present in the data dictionary.
#> $field_check.Info
#> VARNAME VARDESC UNITS VALUES
#> TRUE TRUE TRUE TRUE
#>
#> --------------------
#> pkg_field_check: Passed
#> Passed: package-level required fields TYPE, MIN, and MAX present in the data dictionary.
#> $pkg_field_check.Info
#> TYPE MIN MAX
#> TRUE TRUE TRUE
#>
#> --------------------
#> dimension_check: Passed
#> Passed: the variable count matches between the data dictionary and the data.
#> $dimension_check.Info
#> Variables in dictionary Variables in data
#> 29 29
#>
#> --------------------
#> name_check: Failed
#> ERROR: the variable names DO NOT match between the data dictionary and the data. If the intention behind the variable names is correct, consider using the name_correct function to automatically rename variables to match.
#> $name_check.Info
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#> --------------------
#> id_check: Passed
#> Passed: All ID variable checks passed.
#> $id_check.Info
#> # A tibble: 5 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <chr>
#> 1 Check 1 Column 1 is labeled as 'SUBJECT_ID'. Passed The fi…
#> 2 Check 2 'SUBJECT_ID' is a column name in the data set. Passed 'SUBJE…
#> 3 Check 3 'SUBJECT_ID' is a column name in the data set. Passed No ill…
#> 4 Check 4 No leading zeros detected in 'SUBJECT_ID' col… Passed No lea…
#> 5 Check 5 No missing values for 'SUBJECT_ID'. Passed No mis…
#>
#> --------------------
#> row_check: Passed
#> Passed: no blank or duplicate rows detected in data set or data dictionary.
#> $row_check.Info
#> $row_check.Info$Empty_DataSet_RowNumbers
#> character(0)
#>
#> $row_check.Info$Duplicate_DataSet_RowNumbers
#> character(0)
#>
#> $row_check.Info$Duplicated_SubjectIDs
#> integer(0)
#>
#> $row_check.Info$Empty_DataDictionary_RowNumbers
#> character(0)
#>
#> $row_check.Info$Duplicated_DataDictionary_RowNumbers
#> character(0)
#>
#>
#> --------------------
#> NA_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $NA_check.Info
#> $NA_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> type_check: Passed
#> Passed: All TYPE entries found are accepted by dbGaP per submission instructions.
#> $type_check.Info
#> [1] "integer" "integer, encoded value" "decimal, encoded value"
#>
#> --------------------
#> values_check: Passed
#> Passed: all four VALUES checks look good.
#> $values_check.Info
#> check.name check.description
#> 1 Check 1 Is an equals sign present for all values columns?
#> 2 Check 2 Are there any leading/trailing spaces near the first equals sign?
#> 3 Check 3 Do all variables of TYPE encoded have at least one VALUES entry?
#> 4 Check 4 Are all variables with VALUES entries of TYPE encoded?
#> check.status
#> 1 Passed
#> 2 Passed
#> 3 Passed
#> 4 Passed
#>
#> --------------------
#> integer_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $integer_check.Info
#> $integer_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> decimal_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $decimal_check.Info
#> $decimal_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> misc_format_check: Failed
#> ERROR: at least one check failed.
#> $misc_formatting_check.Info
#> # A tibble: 6 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <chr>
#> 1 Check 1 Empty variable name check Passed NA
#> 2 Check 2 Duplicate variable name check Passed NA
#> 3 Check 3 Check for use of `dbgap` in variable names Failed HTN_dbGaP
#> 4 Check 3 Check for use of `dbgap` in variable names Failed PHYSICAL_…
#> 5 Check 4 Duplicate dictionary column name check Passed NA
#> 6 Check 5 Column names after `VALUES` should be empty Warning ALERT: Yo…
#>
#> --------------------
#> description_check: Passed
#> Passed: unique description present for all variables in the data dictionary.
#> $description_check.Info
#> [1] "NA. All variables have a description."
#>
#> --------------------
#> minmax_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $minmax_check.Info
#> $minmax_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> missing_value_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $missing_value_check.Info
#> $missing_value_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------