This function generates a user-readable report of the checks run by the complete_check function.
Value
A tibble, returned invisibly, containing the following information for each check: (1) Time (time stamp); (2) Name (name of the function); (3) Status (Passed or Failed; the examples below also show Not attempted when a required pre-check fails); (4) Message (a copy of the message the function printed out); (5) Information (more detailed information about the potential errors identified).
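Because the tibble is returned invisibly, it only becomes available for further use when the call is assigned to a variable. The following is a minimal sketch of how such a report might then be filtered with base R. It assumes the returned object exposes the Function and Status columns that appear in the printed summaries in the Examples. The `report_demo` data frame is a hypothetical stand-in, built by hand so the snippet runs without the package or its example data.

```r
# Hypothetical stand-in for the object returned by check_report().
# Column names (Function, Status) are assumed from the printed summary;
# the rows are illustrative, not real package output.
report_demo <- data.frame(
  Function = c("field_check", "name_check", "NA_check", "misc_format_check"),
  Status   = c("Passed", "Failed", "Not attempted", "Failed"),
  stringsAsFactors = FALSE
)

# Keep only the checks that need attention (anything not "Passed").
needs_attention <- report_demo[report_demo$Status != "Passed", ]

# Tally the checks by status.
status_counts <- table(report_demo$Status)

print(needs_attention$Function)
print(status_counts)
```

With a real report, the same subsetting would apply to the assigned result (for example `report` in the Examples), provided the column names match those assumed here.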
Examples
# Example 1: Checks incorrectly show as passing on the first attempt
data(ExampleB)
report <- check_report(DD.dict.B, DS.data.B)
#> # A tibble: 17 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, UNITS, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE, MIN, …
#> 3 dimension_check Passed Passed: the variable count matches between the da…
#> 4 name_check Passed Passed: the variable names match between the data…
#> 5 id_check Passed Passed: all ID variable checks passed.
#> 6 duplicated_id_check Passed Passed: no duplicated SUBJECT_ID values found.
#> 7 row_check Passed Passed: no blank or duplicate rows detected in da…
#> 8 NA_check Passed Passed: no NA values detected in data set.
#> 9 type_check Passed Passed: all TYPE entries found are accepted by db…
#> 10 values_check Passed Passed: all four VALUES checks look good.
#> 11 integer_check Passed Passed: all variables listed as TYPE integer appe…
#> 12 decimal_check Passed Passed: all variables listed as TYPE decimal appe…
#> 13 misc_format_check Passed Passed: no check-specific formatting issues ident…
#> 14 description_check Passed Passed: unique description present for all variab…
#> 15 minmax_check Passed Passed: when provided, all variables are within t…
#> 16 ascii_check Passed Passed: no non-ASCII characters detected in data …
#> 17 missing_value_check Passed Passed: all missing value codes have a correspond…
#> [1] "All 17 checks passed."
# Adding the missing value codes calls attention to the error
# at missing_value_check
report <- check_report(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-4444, -9999))
#> # A tibble: 17 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, UNITS, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE, MIN, …
#> 3 dimension_check Passed Passed: the variable count matches between the da…
#> 4 name_check Passed Passed: the variable names match between the data…
#> 5 id_check Passed Passed: all ID variable checks passed.
#> 6 duplicated_id_check Passed Passed: no duplicated SUBJECT_ID values found.
#> 7 row_check Passed Passed: no blank or duplicate rows detected in da…
#> 8 NA_check Passed Passed: no NA values detected in data set.
#> 9 type_check Passed Passed: all TYPE entries found are accepted by db…
#> 10 values_check Passed Passed: all four VALUES checks look good.
#> 11 integer_check Passed Passed: all variables listed as TYPE integer appe…
#> 12 decimal_check Passed Passed: all variables listed as TYPE decimal appe…
#> 13 misc_format_check Passed Passed: no check-specific formatting issues ident…
#> 14 description_check Passed Passed: unique description present for all variab…
#> 15 minmax_check Passed Passed: when provided, all variables are within t…
#> 16 ascii_check Passed Passed: no non-ASCII characters detected in data …
#> 17 missing_value_check Failed ERROR: some variables have non-encoded missing va…
#> --------------------
#> missing_value_check: Failed
#> ERROR: some variables have non-encoded missing value codes.
#> $Information
#> VARNAME VALUE MEANING PASS
#> 14 CUFFSIZE -9999 <NA> FALSE
#>
#> --------------------
# Example 2: Several checks fail or are not attempted
data(ExampleC)
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes = c(-4444, -9999))
#> # A tibble: 17 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE…
#> 3 dimension_check Passed Passed: the variable count matches between…
#> 4 name_check Failed ERROR: the variable names DO NOT match bet…
#> 5 id_check Passed Passed: all ID variable checks passed.
#> 6 duplicated_id_check Passed Passed: no duplicated SUBJECT_ID values fo…
#> 7 row_check Passed Passed: no blank or duplicate rows detecte…
#> 8 NA_check Not attempted ERROR: Required pre-check name_check faile…
#> 9 type_check Passed Passed: all TYPE entries found are accepte…
#> 10 values_check Passed Passed: all four VALUES checks look good.
#> 11 integer_check Not attempted ERROR: Required pre-check name_check faile…
#> 12 decimal_check Not attempted ERROR: Required pre-check name_check faile…
#> 13 misc_format_check Failed ERROR: at least one check failed.
#> 14 description_check Passed Passed: unique description present for all…
#> 15 minmax_check Not attempted ERROR: Required pre-check name_check faile…
#> 16 ascii_check Passed Passed: no non-ASCII characters detected i…
#> 17 missing_value_check Not attempted ERROR: Required pre-check name_check faile…
#> --------------------
#> name_check: Failed
#> ERROR: the variable names DO NOT match between the data dictionary and the data. If the intention behind the variable names is correct, consider using the name_correct function to automatically rename variables to match.
#> $name_check.Info
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#> --------------------
#> misc_format_check: Failed
#> ERROR: at least one check failed.
#> $misc_formatting_check.Info
#> # A tibble: 5 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <chr>
#> 1 Check 1 Empty variable name check Passed NA
#> 2 Check 2 Duplicate variable name check Passed NA
#> 3 Check 3 Check for use of `dbgap` in variable names Failed HTN_db…
#> 4 Check 4 Duplicate dictionary column name check Passed NA
#> 5 Check 5 Column names after `VALUES` should be blank o… Passed NA
#>
#> --------------------
# Note: you can also run the report with compact = FALSE,
# which prints the details for every check, not only those that did not pass
report <- check_report(DD.dict.C, DS.data.C, non.NA.missing.codes = c(-4444, -9999), compact = FALSE)
#> # A tibble: 17 × 3
#> Function Status Message
#> <chr> <chr> <chr>
#> 1 field_check Passed Passed: required fields VARNAME, VARDESC, …
#> 2 pkg_field_check Passed Passed: package-level required fields TYPE…
#> 3 dimension_check Passed Passed: the variable count matches between…
#> 4 name_check Failed ERROR: the variable names DO NOT match bet…
#> 5 id_check Passed Passed: all ID variable checks passed.
#> 6 duplicated_id_check Passed Passed: no duplicated SUBJECT_ID values fo…
#> 7 row_check Passed Passed: no blank or duplicate rows detecte…
#> 8 NA_check Not attempted ERROR: Required pre-check name_check faile…
#> 9 type_check Passed Passed: all TYPE entries found are accepte…
#> 10 values_check Passed Passed: all four VALUES checks look good.
#> 11 integer_check Not attempted ERROR: Required pre-check name_check faile…
#> 12 decimal_check Not attempted ERROR: Required pre-check name_check faile…
#> 13 misc_format_check Failed ERROR: at least one check failed.
#> 14 description_check Passed Passed: unique description present for all…
#> 15 minmax_check Not attempted ERROR: Required pre-check name_check faile…
#> 16 ascii_check Passed Passed: no non-ASCII characters detected i…
#> 17 missing_value_check Not attempted ERROR: Required pre-check name_check faile…
#> --------------------
#> field_check: Passed
#> Passed: required fields VARNAME, VARDESC, UNITS, and VALUES present in the data dictionary.
#> $field_check.Info
#> VARNAME VARDESC UNITS VALUES
#> TRUE TRUE TRUE TRUE
#>
#> --------------------
#> pkg_field_check: Passed
#> Passed: package-level required fields TYPE, MIN, and MAX present in the data dictionary.
#> $pkg_field_check.Info
#> TYPE MIN MAX
#> TRUE TRUE TRUE
#>
#> --------------------
#> dimension_check: Passed
#> Passed: the variable count matches between the data dictionary and the data.
#> $dimension_check.Info
#> Variables in dictionary Variables in data
#> 29 29
#>
#> --------------------
#> name_check: Failed
#> ERROR: the variable names DO NOT match between the data dictionary and the data. If the intention behind the variable names is correct, consider using the name_correct function to automatically rename variables to match.
#> $name_check.Info
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#> --------------------
#> id_check: Passed
#> Passed: all ID variable checks passed.
#> $id_check.Info
#> # A tibble: 5 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <chr>
#> 1 Check 1 Column 1 is labeled as 'SUBJECT_ID'. Passed The fi…
#> 2 Check 2 'SUBJECT_ID' is a column name in the data set. Passed 'SUBJE…
#> 3 Check 3 'SUBJECT_ID' is a column name in the data set. Passed No ill…
#> 4 Check 4 No leading zeros detected in 'SUBJECT_ID' col… Passed No lea…
#> 5 Check 5 No missing values for 'SUBJECT_ID'. Passed No mis…
#>
#> --------------------
#> duplicated_id_check: Passed
#> Passed: no duplicated SUBJECT_ID values found.
#> $dup_id_check.Info
#> # A tibble: 0 × 1
#> # ℹ 1 variable: Duplicated_SUBJECT_IDs <chr>
#>
#> --------------------
#> row_check: Passed
#> Passed: no blank or duplicate rows detected in data set or data dictionary.
#> $row_check.Info
#> $row_check.Info$Empty_DataSet_RowNumbers
#> character(0)
#>
#> $row_check.Info$Duplicate_DataSet_RowNumbers
#> character(0)
#>
#> $row_check.Info$Duplicated_SubjectIDs
#> integer(0)
#>
#> $row_check.Info$Empty_DataDictionary_RowNumbers
#> character(0)
#>
#> $row_check.Info$Duplicated_DataDictionary_RowNumbers
#> character(0)
#>
#>
#> --------------------
#> NA_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $NA_check.Info
#> $NA_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> type_check: Passed
#> Passed: all TYPE entries found are accepted by dbGaP per submission instructions.
#> $type_check.Info
#> [1] "integer" "integer, encoded value" "decimal, encoded value"
#>
#> --------------------
#> values_check: Passed
#> Passed: all four VALUES checks look good.
#> $values_check.Info
#> check.name check.description
#> 1 Check 1 Does each VALUES cell contain exactly one '='?
#> 2 Check 2 Are there any leading/trailing spaces near the first equals sign?
#> 3 Check 3 Do all variables of TYPE encoded have at least one VALUES entry?
#> 4 Check 4 Are all variables with VALUES entries of TYPE encoded?
#> 5 Check 5 Do any encoded values share the same meaning within a variable?
#> check.status
#> 1 Passed
#> 2 Passed
#> 3 Passed
#> 4 Passed
#> 5 Passed
#>
#> --------------------
#> integer_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $integer_check.Info
#> $integer_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> decimal_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $decimal_check.Info
#> $decimal_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> misc_format_check: Failed
#> ERROR: at least one check failed.
#> $misc_formatting_check.Info
#> # A tibble: 5 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <chr>
#> 1 Check 1 Empty variable name check Passed NA
#> 2 Check 2 Duplicate variable name check Passed NA
#> 3 Check 3 Check for use of `dbgap` in variable names Failed HTN_db…
#> 4 Check 4 Duplicate dictionary column name check Passed NA
#> 5 Check 5 Column names after `VALUES` should be blank o… Passed NA
#>
#> --------------------
#> description_check: Passed
#> Passed: unique description present for all variables in the data dictionary.
#> $description_check.Info
#> [1] "NA. All variables have a description."
#>
#> --------------------
#> minmax_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $minmax_check.Info
#> $minmax_check.Info$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------
#> ascii_check: Passed
#> Passed: no non-ASCII characters detected in data dictionary or data set.
#> $missing_value_check.Info
#> # A tibble: 0 × 0
#>
#> --------------------
#> missing_value_check: Not attempted
#> ERROR: Required pre-check name_check failed.
#> $Information
#> $Information$Information
#> # A tibble: 2 × 2
#> Data Dict
#> <chr> <chr>
#> 1 Data: HTN Dict: HTN_dbGaP
#> 2 Data: PHYSICAL_ACTIVITY Dict: PHYSICAL_ACTIVITY_dbGaP
#>
#>
#> --------------------