This function checks miscellaneous dbGaP formatting requirements to ensure that: (1) no variable names are empty; (2) no variable names are duplicated; (3) no variable name contains "dbgap"; (4) no column names are duplicated in the data dictionary; and (5) columns falling after the VALUES column are unnamed.
Value
A tibble, returned invisibly, containing: (1) Time (time stamp); (2) Function (name of the function); (3) Status (Passed/Failed); (4) Message (a copy of the message the function printed); (5) Information (a list column holding a tibble with the status of each check and the names of any variables that fail one of these checks).
Details
Note that Check #5 may return a WARNING depending on how the data set is read into R. Some import methods automatically fill in the blank column names after VALUES with "...col_number". The package allows this, but dbGaP does NOT, so use caution when writing out a data set after making adjustments directly in R.
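As a rough illustration of the caution above, the following base R sketch blanks auto-generated names of the "...col_number" form before a dictionary is written back out. The toy dictionary `dict`, the regular expression, and the tab-delimited output file are all assumptions made for this example; they are not part of the package, and the right output format for a given submission should be confirmed against dbGaP's own instructions.

```r
# Hypothetical dictionary, as it might look after an import step has
# filled the blank header after VALUES with a "...col_number" name.
dict <- data.frame(
  VARNAME = c("SEX", "AGE"),
  VARDESC = c("Sex of participant", "Age in years"),
  TYPE    = c("integer", "integer"),
  VALUES  = c("0=male", NA),
  X5      = c("1=female", NA),
  stringsAsFactors = FALSE
)
names(dict)[5] <- "...5"

# Locate VALUES and flag every column that falls after it.
values_pos <- match("VALUES", names(dict))
stopifnot(!is.na(values_pos))
after_values <- seq_along(dict) > values_pos

# Blank only the auto-generated names that sit after VALUES.
header <- names(dict)
auto_named <- grepl("^\\.\\.\\.[0-9]+$", header)
header[after_values & auto_named] <- ""

# Write the repaired header first, then the rows without column names.
out_file <- tempfile(fileext = ".txt")
writeLines(paste(header, collapse = "\t"), out_file)
write.table(dict, out_file, append = TRUE, sep = "\t", quote = FALSE,
            na = "", row.names = FALSE, col.names = FALSE)
```

Writing the header separately avoids renaming the columns of `dict` itself, so the object in the R session keeps unique names while the file on disk has blank headers after VALUES.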
Examples
# Example 1: Fail check
data(ExampleJ)
misc_format_check(DD.dict.J, DS.data.J)
#> $Message
#> [1] "ERROR: at least one check failed."
#>
#> $Information
#> # A tibble: 5 × 4
#>   check.name check.description                              check.status details
#>   <chr>      <chr>                                          <chr>        <chr>
#> 1 Check 1    Empty variable name check                      Passed       NA
#> 2 Check 2    Duplicate variable name check                  Passed       NA
#> 3 Check 3    Check for use of `dbgap` in variable names     Failed       HTN_db…
#> 4 Check 4    Duplicate dictionary column name check         Passed       NA
#> 5 Check 5    Column names after `VALUES` should be blank o… Passed       NA
#>
print(misc_format_check(DD.dict.J, DS.data.J, verbose=FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <list>
#> 1 2025-05-01 10:29:08 misc_format_check Failed ERROR: at least one … <tibble>
# Example 2: Pass check
data(ExampleA)
misc_format_check(DD.dict.A, DS.data.A)
#> $Message
#> [1] "Passed: no check-specific formatting issues identified."
#>
#> $Information
#> # A tibble: 5 × 4
#>   check.name check.description                              check.status details
#>   <chr>      <chr>                                          <chr>        <lgl>
#> 1 Check 1    Empty variable name check                      Passed       NA
#> 2 Check 2    Duplicate variable name check                  Passed       NA
#> 3 Check 3    Check for use of `dbgap` in variable names     Passed       NA
#> 4 Check 4    Duplicate dictionary column name check         Passed       NA
#> 5 Check 5    Column names after `VALUES` should be blank o… Passed       NA
#>
print(misc_format_check(DD.dict.A, DS.data.A, verbose=FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <list>
#> 1 2025-05-01 10:29:08 misc_format_check Passed Passed: no check-spe… <tibble>