This function checks miscellaneous dbGaP formatting requirements to ensure that (1) there are no empty variable names; (2) there are no duplicate variable names; (3) variable names do not contain "dbgap"; (4) there are no duplicate column names in the dictionary; and (5) columns falling after the VALUES column are unnamed.
Arguments
- DD.dict
Data dictionary.
- DS.data
Data set.
- verbose
When TRUE, the function prints the Message, along with more detailed information about which formatting checks failed.
Value
Tibble, returned invisibly, containing: (1) Time (time stamp); (2) Function (name of the function); (3) Status (Passed/Failed); (4) Message (a copy of the message the function printed out); (5) Information (the status of each check and the names of any variables that failed it).
Details
Note that Check #5 may return a WARNING depending on how the data set was read into R. Some import methods automatically fill in the blank column names after VALUES with "...col_number". The package allows these names, but dbGaP does NOT, so please use caution if you write out a data set after making adjustments directly in R.
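As a rough illustration of the caution above, the sketch below finds auto-filled names after VALUES and blanks them again before writing the file out. It uses base R only and is not part of the package. The data frame `dd`, its values, the `...5`/`...6` names, and the tab-delimited output are all assumptions made for the example.

```r
# Hypothetical dictionary, as it might look after import: the two columns
# after VALUES were blank in the file and received auto-filled names.
dd <- data.frame(
  VARNAME = c("SEX", "AGE"),
  VARDESC = c("Sex of participant", "Age at enrollment"),
  TYPE    = c("encoded value", "integer"),
  VALUES  = c("0=male", NA),
  V5      = c("1=female", NA),
  V6      = c("2=unknown", NA),
  stringsAsFactors = FALSE
)
names(dd)[5:6] <- c("...5", "...6")

# Locate VALUES and flag later columns whose names look auto-filled
values_pos <- match("VALUES", names(dd))
stopifnot(!is.na(values_pos))
after_values <- seq_along(dd) > values_pos
auto_named <- after_values & grepl("^\\.\\.\\.[0-9]+$", names(dd))

# Blank the flagged names in a copy of the header
header <- names(dd)
header[auto_named] <- ""

# Write the header and the body separately so the blank names are kept
out_file <- tempfile(fileext = ".txt")
writeLines(paste(header, collapse = "\t"), out_file)
write.table(dd, out_file, append = TRUE, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE, na = "")
```

Writing the header line by hand avoids relying on how a given export function treats empty column names. Checking the first line of the written file before submitting is still a good idea.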
Examples
# Example 1: Fail check
data(ExampleJ)
misc_format_check(DD.dict.J, DS.data.J)
#> $Message
#> [1] "ERROR: at least one check failed."
#>
#> $Information
#> # A tibble: 6 × 4
#>   check.name check.description                           check.status details
#>   <chr>      <chr>                                       <chr>        <chr>
#> 1 Check 1    Empty variable name check                   Passed       NA
#> 2 Check 2    Duplicate variable name check               Passed       NA
#> 3 Check 3    Check for use of `dbgap` in variable names  Failed       HTN_dbGaP
#> 4 Check 3    Check for use of `dbgap` in variable names  Failed       PHYSICAL_…
#> 5 Check 4    Duplicate dictionary column name check      Passed       NA
#> 6 Check 5    Column names after `VALUES` should be empty Warning      ALERT: Yo…
#>
print(misc_format_check(DD.dict.J, DS.data.J, verbose=FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <named lis>
#> 1 2023-09-27 11:01:18 misc_format_check Failed ERROR: at least one … <tibble>
# Example 2: Pass check
data(ExampleA)
misc_format_check(DD.dict.A, DS.data.A)
#> $Message
#> [1] "Passed: no check-specific formatting issues identified."
#>
#> $Information
#> # A tibble: 5 × 4
#>   check.name check.description                           check.status details
#>   <chr>      <chr>                                       <chr>        <chr>
#> 1 Check 1    Empty variable name check                   Passed       NA
#> 2 Check 2    Duplicate variable name check               Passed       NA
#> 3 Check 3    Check for use of `dbgap` in variable names  Passed       NA
#> 4 Check 4    Duplicate dictionary column name check      Passed       NA
#> 5 Check 5    Column names after `VALUES` should be empty Warning      ALERT: Yo…
#>
print(misc_format_check(DD.dict.A, DS.data.A, verbose=FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <named lis>
#> 1 2023-09-27 11:01:18 misc_format_check Passed Passed: no check-spe… <tibble>