dbGaPCheckup version 1.2.0

  • New function duplicated_id_check() checks for duplicated subject IDs in the data set (returns a warning, as this is allowed in longitudinal studies)
  • New function ascii_check() scans both the data dictionary and data set for (1) non-ASCII characters (e.g., é, ñ) and (2) newline (\n) and carriage return (\r) characters
  • New helper ascii_cleaner() cleans a data frame by (1) converting smart quotes to straight quotes, (2) replacing accented characters with ASCII equivalents, and (3) removing newline and carriage return characters
  • complete_check(): updated to include both duplicated_id_check() and ascii_check()
  • values_check(): updated Check 1 to require that each VALUES cell contain exactly one equals sign (=) (e.g., 1=Yes vs. 1=Yes; 0=No), in alignment with dbGaP formatting requirements; added new Check 5 to detect duplicated MEANINGs in VALUES=MEANING entries
  • misc_format_check(): no longer errors when the VALUES column is the last column in the data dictionary (i.e., no columns follow); a WARNING is still returned, as this structure is valid but expected to be uncommon
  • integer_check(): resolved a rare error when the TYPE column contains malformed or unexpected values
  • Documentation: updated to emphasize the importance of reading CSVs using readr::read_csv(…, na = c("", "NA")) or read.csv(…, na.strings = c("", "NA")) to correctly interpret missing cells, particularly in the VALUES column (see GitHub Issue #16 for discussion)
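
The reading advice in the Documentation item can be sketched in base R as follows. This is a minimal illustration, assuming a small hypothetical data dictionary written to a temporary file; the variable names and values are made up for the example and do not come from the package.

```r
# Hypothetical data dictionary: one empty VALUES cell and one literal "NA" cell.
csv_lines <- c(
  "VARNAME,VARDESC,TYPE,VALUES",
  "SEX,Sex of participant,integer,1=Male",
  "AGE,Age in years,integer,",
  "BMI,Body mass index,decimal,NA"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Treat both empty strings and the string "NA" as missing values.
dd <- read.csv(tmp, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Logical vector marking which VALUES cells were read as missing.
values_missing <- is.na(dd$VALUES)
print(values_missing)
```

With the default `na.strings = "NA"`, the empty cell in a character column would instead be read as an empty string, which is why both strings are listed explicitly here.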

dbGaPCheckup version 1.1.1

  • minmax_check: adjusted to return a sorted list of out-of-range values and polished documentation to be more informative
  • values_check: corrected a bug that prevented detection of leading/trailing zeros in VALUES columns
  • name_correct: when a user runs name_correct but no correction is needed, a new message now prints to report that no discrepancies were detected

dbGaPCheckup version 1.1.0

CRAN release: 2023-09-27

  • added an informative error message when the required VALUES column is missing
  • adjusted values_check to temporarily create dummy names for blank-named columns beyond VALUES, preventing the function from failing when column names after VALUES are blank strings
  • corrected minmax_check and integer_check bugs that occurred when SUBJECT_ID was a character vector
  • adjusted misc_format_check to return a WARNING that alerts users if they read in a data set and R automatically fills in column names after VALUES (which is allowed by the package, but not dbGaP itself)
  • adjusted NA_check to correctly capture NA=N/A VALUES
  • corrected bug in type_check that was allowing some non-allowable TYPE entries to pass
  • corrected bug in missing_value_check that was flagging some variables even when they had properly encoded NA=N/A VALUES
  • made complete_check more robust to errors by wrapping functions in tryCatch
  • used seealso to link utility functions to relevant check functions
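
The tryCatch item above can be illustrated with a small base-R sketch. The helper `run_checks_safely()` and the two toy checks are hypothetical and are not the package's implementation; they only show how a failing check can be recorded without stopping the checks that follow.

```r
# Hypothetical helper: run each check, capturing errors instead of stopping.
run_checks_safely <- function(checks, ...) {
  lapply(checks, function(check) {
    tryCatch(
      list(status = "ok", result = check(...)),
      error = function(e) list(status = "error", result = conditionMessage(e))
    )
  })
}

# Toy checks (illustrative only): one succeeds, one throws an error.
checks <- list(
  n_rows = function(df) nrow(df),
  broken = function(df) stop("required VALUES column is missing")
)

results <- run_checks_safely(checks, data.frame(x = 1:3))

# The failing check is reported, and the successful one still returns its result.
print(results$n_rows$status)
print(results$broken$status)
```

Returning a status alongside each result lets a caller summarize all checks in one pass, which is one plausible reason to wrap them this way.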

dbGaPCheckup version 1.0.2

CRAN release: 2023-02-22

  • removed row numbers from data set files
  • renamed data dictionary files by removing “SSM” acronym (done to avoid confusion as this means “subject sample mapping” and is intended for use with other dbGaP data files)
  • updated id_check() to include a check for missing SUBJECT_IDs (not allowed by dbGaP)
  • updated row_check() to check for duplicate and empty rows in the data dictionary (and not just the data set)
  • updated misc_format_check() to check that there are no missing VARNAME cells

dbGaPCheckup version 1.0.1

CRAN release: 2022-12-22

  • apply na_if() to one column at a time (vs. entire data frame at once) to maintain compatibility with next version of dplyr

dbGaPCheckup version 1.0.0

CRAN release: 2022-11-14

NEWS.md setup

  • added NEWS.md