
This function checks miscellaneous dbGaP formatting requirements, confirming that (1) no variable names are empty; (2) no variable names are duplicated; (3) no variable name contains "dbgap"; (4) the data dictionary has no duplicate column names; and (5) columns that fall after the VALUES column are unnamed.
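To make checks 1 through 3 concrete, here is a minimal base R sketch of the kind of tests they describe. It is not the package's implementation: the vector `var_names` and the three result objects are hypothetical names chosen for illustration, and the case-insensitive match on "dbgap" is an assumption inferred from Example 1, where `HTN_dbGaP` fails Check 3.

```r
# Hypothetical variable names containing one instance of each problem
var_names <- c("SUBJECT_ID", "AGE", "", "AGE", "HTN_dbGaP")

# Check 1 (sketch): positions of empty or missing variable names
empty_names <- which(is.na(var_names) | trimws(var_names) == "")

# Check 2 (sketch): non-empty names that occur more than once
dup_names <- unique(var_names[duplicated(var_names) & var_names != ""])

# Check 3 (sketch): names containing "dbgap", ignoring case (assumption)
dbgap_names <- grep("dbgap", var_names, ignore.case = TRUE, value = TRUE)

print(list(empty = empty_names, duplicated = dup_names, dbgap = dbgap_names))
```

Running a quick check like this before calling `misc_format_check()` can help locate the offending names, but the function's own Information tibble remains the reference for what passed or failed.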

Usage

misc_format_check(DD.dict, DS.data, verbose = TRUE)

Arguments

DD.dict

Data dictionary.

DS.data

Data set.

verbose

When TRUE, the function prints the Message along with more detailed information about which formatting checks failed.

Value

Tibble, returned invisibly, containing: (1) Time (time stamp); (2) Function (name of the function); (3) Status (Passed/Failed); (4) Message (a copy of the message the function printed out); (5) Information (names of variables that fail one of these checks).

Details

Note that Check #5 may return a WARNING depending on how the data set is read into R. Some import methods automatically fill in the column names after VALUES with "...col_number". The package allows this, but dbGaP does NOT, so use caution if you write out a data set after making adjustments directly in R.
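As an illustration of the caution above, the following base R sketch blanks out auto-filled names after VALUES before writing a dictionary to a tab-delimited file. It is one possible approach under stated assumptions, not a procedure prescribed by the package or by dbGaP: the toy dictionary `dd`, the helper objects, and the pattern used to recognize auto-filled names (three dots followed by digits) are all hypothetical.

```r
# Hypothetical dictionary as it might look after import, with an
# auto-filled name ("...5") on the column that follows VALUES
dd <- data.frame(
  c("SEX", "AGE"),
  c("Sex of participant", "Age at enrollment"),
  c("encoded value", "integer"),
  c("0=male", NA),
  c("1=female", NA),
  stringsAsFactors = FALSE
)
names(dd) <- c("VARNAME", "VARDESC", "TYPE", "VALUES", "...5")

# Flag names after VALUES that match the assumed auto-fill pattern
values_pos <- match("VALUES", names(dd))
after_values <- seq_along(names(dd)) > values_pos
auto_named <- after_values & grepl("^\\.\\.\\.[0-9]+$", names(dd))

# Build the header to write, leaving the flagged names blank
clean_names <- names(dd)
clean_names[auto_named] <- ""

# Write with the cleaned header; the data frame itself is left unchanged
out_file <- tempfile(fileext = ".txt")
write.table(dd, out_file, sep = "\t", quote = FALSE, na = "",
            row.names = FALSE, col.names = clean_names)

# Read back only the header line to inspect it
header <- readLines(out_file, n = 1)
print(header)
```

Passing the cleaned names through `col.names` keeps the in-memory object usable in R (where blank column names are awkward) while the written file carries unnamed columns after VALUES.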

Examples

# Example 1: Fail check 
data(ExampleJ)
misc_format_check(DD.dict.J, DS.data.J)
#> $Message
#> [1] "ERROR: at least one check failed."
#> 
#> $Information
#> # A tibble: 6 × 4
#>   check.name check.description                           check.status details   
#>   <chr>      <chr>                                       <chr>        <chr>     
#> 1 Check 1    Empty variable name check                   Passed       NA        
#> 2 Check 2    Duplicate variable name check               Passed       NA        
#> 3 Check 3    Check for use of `dbgap` in variable names  Failed       HTN_dbGaP 
#> 4 Check 3    Check for use of `dbgap` in variable names  Failed       PHYSICAL_…
#> 5 Check 4    Duplicate dictionary column name check      Passed       NA        
#> 6 Check 5    Column names after `VALUES` should be empty Warning      ALERT: Yo…
#> 
print(misc_format_check(DD.dict.J, DS.data.J, verbose = FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <named lis>
#> 1 2023-09-27 11:01:18 misc_format_check Failed ERROR: at least one … <tibble>   

# Example 2: Pass check
data(ExampleA)
misc_format_check(DD.dict.A, DS.data.A)
#> $Message
#> [1] "Passed: no check-specific formatting issues identified."
#> 
#> $Information
#> # A tibble: 5 × 4
#>   check.name check.description                           check.status details   
#>   <chr>      <chr>                                       <chr>        <chr>     
#> 1 Check 1    Empty variable name check                   Passed       NA        
#> 2 Check 2    Duplicate variable name check               Passed       NA        
#> 3 Check 3    Check for use of `dbgap` in variable names  Passed       NA        
#> 4 Check 4    Duplicate dictionary column name check      Passed       NA        
#> 5 Check 5    Column names after `VALUES` should be empty Warning      ALERT: Yo…
#> 
print(misc_format_check(DD.dict.A, DS.data.A, verbose = FALSE))
#> # A tibble: 1 × 5
#>   Time                Function          Status Message               Information
#>   <dttm>              <chr>             <chr>  <chr>                 <named lis>
#> 1 2023-09-27 11:01:18 misc_format_check Passed Passed: no check-spe… <tibble>