
This function checks miscellaneous dbGaP formatting requirements, confirming that (1) no variable names are empty; (2) no variable names are duplicated; (3) no variable name contains "dbgap"; (4) the data dictionary has no duplicate column names; and (5) columns that fall after the VALUES column are unnamed.

Usage

misc_format_check(DD.dict, DS.data, verbose = TRUE)

Arguments

DD.dict

Data dictionary.

DS.data

Data set.

verbose

When TRUE, the function prints the Message along with more detailed information about which formatting checks failed.

Value

Tibble, returned invisibly, containing: (1) Time (time stamp); (2) Function (name of the function); (3) Status (Passed/Failed); (4) Message (a copy of the message the function printed out); (5) Information (names of variables that fail one of these checks).
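
Because the Information column is a list column, the per-check results have to be extracted before they can be filtered. The sketch below shows one way to do that in base R. It does not call the package: `res` and `checks` are hypothetical stand-ins built by hand, with column names and statuses copied from the printed output of Example 1, so treat it as an illustration of the access pattern rather than real function output.

```r
# Stand-in for the documented return value (NOT real package output):
# a one-row table whose Information column is a list holding the
# per-check results table.
checks <- data.frame(
  check.name   = paste("Check", 1:5),
  check.status = c("Passed", "Passed", "Failed", "Passed", "Passed"),
  stringsAsFactors = FALSE
)
res <- data.frame(
  Function = "misc_format_check",
  Status   = "Failed",
  stringsAsFactors = FALSE
)
res$Information <- list(checks)

# Overall result for the whole function call.
overall_failed <- res$Status == "Failed"

# Pull the nested table out of the list column and keep only
# the checks that failed.
info <- res$Information[[1]]
failed_checks <- info$check.name[info$check.status == "Failed"]

print(overall_failed)
print(failed_checks)
```

With a real call, the same pattern would presumably start from `res <- misc_format_check(DD.dict, DS.data, verbose = FALSE)` in place of the hand-built stand-in.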

Details

Note that Check #5 may return a WARNING depending on how the data set is read into R. Some import methods automatically fill in the column names after VALUES with "...col_number". The package allows this, but dbGaP does NOT, so use caution if you write out a data set after making adjustments directly in R.
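
One way to guard against exporting auto-filled names is to blank them before writing the file. The following is a minimal base R sketch, not part of the package: `dd_names` is a hypothetical header, and the pattern assumes the auto-filled names take the form `...` followed by the column position.

```r
# Hypothetical dictionary header as it might look after import:
# the unnamed columns after VALUES were auto-filled with "...<position>".
dd_names <- c("VARNAME", "VARDESC", "TYPE", "VALUES", "...5", "...6")

# Position of the VALUES column (NA if it is absent).
values_idx <- match("VALUES", dd_names)

# Indices of the columns that fall after VALUES
# (empty if VALUES is the last column or is missing).
after_values <- if (!is.na(values_idx) && values_idx < length(dd_names)) {
  seq(values_idx + 1, length(dd_names))
} else {
  integer(0)
}

# Blank only the names that match the auto-generated "...<digits>" pattern,
# leaving any genuinely named trailing column untouched for manual review.
auto_filled <- after_values[grepl("^\\.\\.\\.[0-9]+$", dd_names[after_values])]
cleaned_names <- dd_names
cleaned_names[auto_filled] <- ""

print(cleaned_names)
```

The cleaned vector can then be used as the header row when the dictionary is written out. How that header is emitted depends on the export function, so check the written file before submitting it to dbGaP.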

Examples

# Example 1: Fail check
data(ExampleJ)
misc_format_check(DD.dict.J, DS.data.J)
#> $Message
#> [1] "ERROR: at least one check failed."
#> 
#> $Information
#> # A tibble: 5 × 4
#>   check.name check.description                              check.status details
#>   <chr>      <chr>                                          <chr>        <chr>  
#> 1 Check 1    Empty variable name check                      Passed       NA     
#> 2 Check 2    Duplicate variable name check                  Passed       NA     
#> 3 Check 3    Check for use of `dbgap` in variable names     Failed       HTN_db…
#> 4 Check 4    Duplicate dictionary column name check         Passed       NA     
#> 5 Check 5    Column names after `VALUES` should be blank o… Passed       NA     
#> 
print(misc_format_check(DD.dict.J, DS.data.J, verbose = FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <list>     
#> 1 2025-05-01 10:29:08 misc_format_check Failed ERROR: at least one … <tibble>   

# Example 2: Pass check
data(ExampleA)
misc_format_check(DD.dict.A, DS.data.A)
#> $Message
#> [1] "Passed: no check-specific formatting issues identified."
#> 
#> $Information
#> # A tibble: 5 × 4
#>   check.name check.description                              check.status details
#>   <chr>      <chr>                                          <chr>        <lgl>  
#> 1 Check 1    Empty variable name check                      Passed       NA     
#> 2 Check 2    Duplicate variable name check                  Passed       NA     
#> 3 Check 3    Check for use of `dbgap` in variable names     Passed       NA     
#> 4 Check 4    Duplicate dictionary column name check         Passed       NA     
#> 5 Check 5    Column names after `VALUES` should be blank o… Passed       NA     
#> 
print(misc_format_check(DD.dict.A, DS.data.A, verbose = FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <list>     
#> 1 2025-05-01 10:29:08 misc_format_check Passed Passed: no check-spe… <tibble>