This function scans both the data dictionary and the data set for problematic characters that may interfere with dbGaP submission requirements. Specifically, it checks for: (1) non-ASCII characters (e.g., accented letters) and (2) newline and carriage return characters (i.e., line breaks). It reports the variable name (column), row number, and value wherever one of these issues is detected.
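The two checks described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `find_problem_chars()`, its arguments, and the toy data frame `dd` are hypothetical names introduced only for illustration, and the output columns simply mirror those shown in the Examples section.

```r
# Hypothetical helper (not part of the package): scan every cell of a data
# frame and record values that contain non-ASCII characters or line breaks.
find_problem_chars <- function(df, file_label) {
  hits <- list()
  for (col in names(df)) {
    vals <- as.character(df[[col]])
    for (i in seq_along(vals)) {
      v <- vals[i]
      if (is.na(v)) next
      # Code points above 127 are outside ASCII; NA signals invalid UTF-8.
      codes <- utf8ToInt(enc2utf8(v))
      issue <- NULL
      if (anyNA(codes) || any(codes > 127L)) {
        issue <- "Non-ASCII character"
      } else if (grepl("[\r\n]", v)) {
        issue <- "Newline or carriage return"
      }
      if (!is.null(issue)) {
        hits[[length(hits) + 1L]] <- data.frame(
          file = file_label, column = col, row = i, value = v,
          issue_type = issue, stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(hits) == 0L) {
    # No problems found: return an empty data frame with the same columns.
    return(data.frame(
      file = character(0), column = character(0), row = integer(0),
      value = character(0), issue_type = character(0),
      stringsAsFactors = FALSE
    ))
  }
  do.call(rbind, hits)
}

# Toy data dictionary (assumed structure, for illustration only).
dd <- data.frame(
  VARNAME = c("SEX", "SHOP"),
  VALUES  = c("0=male", "0=caf\u00e9"),
  stringsAsFactors = FALSE
)
res <- find_problem_chars(dd, "Data dictionary")
print(res)
```

In this sketch a value is reported once, with the non-ASCII check taking precedence; a value containing both a non-ASCII character and a line break would only be labeled with the first issue.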

Usage

ascii_check(DD.dict, DS.data, verbose = TRUE)

Arguments

DD.dict

Data dictionary.

DS.data

Data set.

verbose

When TRUE, the function prints the Message, along with detailed information on the locations of any non-ASCII characters.

Value

Tibble, returned invisibly, containing: (1) Time (timestamp); (2) Name (name of the function); (3) Status (Passed/Failed); (4) Message (a copy of the message the function printed); (5) Information (column, row, and value of each detected non-ASCII character).

Examples

# Passed example
data(ExampleA)
ascii_check(DD.dict.A, DS.data.A) 
#> $Message
#> [1] "Passed: no non-ASCII characters detected in data dictionary or data set."
#> 

# Failed example
data(ExampleT)
ascii_check(DD.dict.T, DS.data.T)
#> $Message
#> [1] "ERROR: non-ASCII characters detected. See Information for details."
#> 
#> $Information
#>              file column row  value          issue_type
#> 1 Data dictionary VALUES   5 0=café Non-ASCII character
#>