This function checks for consistent usage of encoded values and missing value codes between the data dictionary and the data itself.
Arguments
- DD.dict
Data dictionary.
- DS.data
Data set.
- non.NA.missing.codes
A user-defined vector of numerical missing value codes (e.g., -9999).
Value
A list, returned invisibly, with two components:
"report": Tibble containing (1) Function (name of the function) and (2) Information (details of all potentially flagged variables).
"tb": Tibble with the detailed information used to construct Information.
Details
For each variable, we have three sets of possible values: the set D of all the unique values observed
in the data, the set V of all the values explicitly encoded in the VALUES columns of the data dictionary, and
the set M of the missing value codes defined by the user via the non.NA.missing.codes
argument.
This function examines various intersections and differences of these three sets and reports awareness
checks to the user about possible issues of concern. Ideally, every value defined in set V should
be observed in the data (i.e., be in set D), but it is not necessarily an error if one is not. This function
checks for:
(A) In Set M and Not in Set D: If the user defines a missing value code that is not present in the data.
(B) In Set V and Not in Set D: If a VALUES entry defines an encoded code value, but that code value is not present in the data.
(C) In Set M and Not in Set V: If the user defines a missing value code that is not defined in a VALUES entry.
(D) In Set M and in Set D, and Not in Set V: If a user-defined global missing value code is present in the data for a given variable, but that variable does not have a corresponding VALUES entry.
(E) In Set V, Not in Set M, and Not in Set D: If a VALUES entry is not defined as a missing value code AND is not detected in the data.
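The set logic behind checks (A) through (E) can be illustrated with base R set operations. This is a minimal sketch for a single hypothetical variable, not the function's actual implementation; the values in D, V, and M and the check_* names are made up for illustration and are not taken from ExampleB.

```r
# Hypothetical toy sets for one variable (illustrative values only)
D <- c(0, 1, 2, -9999)   # unique values observed in the data
V <- c(0, 1, 3)          # values encoded in the VALUES columns
M <- c(-9999, -8888)     # user-supplied missing value codes

# Each check lists the offending values; an empty result means nothing to flag
check_A <- setdiff(M, D)                # (A) in M, not in D
check_B <- setdiff(V, D)                # (B) in V, not in D
check_C <- setdiff(M, V)                # (C) in M, not in V
check_D <- setdiff(intersect(M, D), V)  # (D) in M and in D, not in V
check_E <- setdiff(setdiff(V, M), D)    # (E) in V, not in M, not in D
```

In this toy case every check returns at least one value, so each would be worth a look: for example, check_D picks out -9999, which appears in the data as a missing value code but has no VALUES entry.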
Examples
data(ExampleB)
value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999))
#> $Message
#> [1] "Flag: at least one check flagged."
#>
#> $Information
#> # A tibble: 7 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <named >
#> 1 Check A: In M, Not in D "All missing value codes… Flag <tibble>
#> 2 Check B: In V, Not in D "All value codes are in … Flag <tibble>
#> 3 Check C: In M, Not in V "All missing value codes… Flag <tibble>
#> 4 Check D: In M & in D, not in V "All missing value codes… Flag <tibble>
#> 5 Check E: V NOT in M, NOT in D "All value codes no defi… Passed <chr>
#> 6 Awareness: NsetD vs. NsetV "Size of Set D vs size o… Info <tibble>
#> 7 Awareness: N_DnotM vs. N_VnotM "Size of Set D\\M vs siz… Info <tibble>
#>
print(value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999)))
#> $Message
#> [1] "Flag: at least one check flagged."
#>
#> $Information
#> # A tibble: 7 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <named >
#> 1 Check A: In M, Not in D "All missing value codes… Flag <tibble>
#> 2 Check B: In V, Not in D "All value codes are in … Flag <tibble>
#> 3 Check C: In M, Not in V "All missing value codes… Flag <tibble>
#> 4 Check D: In M & in D, not in V "All missing value codes… Flag <tibble>
#> 5 Check E: V NOT in M, NOT in D "All value codes no defi… Passed <chr>
#> 6 Awareness: NsetD vs. NsetV "Size of Set D vs size o… Info <tibble>
#> 7 Awareness: N_DnotM vs. N_VnotM "Size of Set D\\M vs siz… Info <tibble>
#>
#> $report
#> # A tibble: 7 × 2
#> Function Information$check.name $check.description $check.status
#> <chr> <chr> <chr> <chr>
#> 1 value_missing_table Check A: In M, Not in D "All missing valu… Flag
#> 2 value_missing_table Check B: In V, Not in D "All value codes … Flag
#> 3 value_missing_table Check C: In M, Not in V "All missing valu… Flag
#> 4 value_missing_table Check D: In M & in D, no… "All missing valu… Flag
#> 5 value_missing_table Check E: V NOT in M, NOT… "All value codes … Passed
#> 6 value_missing_table Awareness: NsetD vs. Nse… "Size of Set D vs… Info
#> 7 value_missing_table Awareness: N_DnotM vs. N… "Size of Set D\\M… Info
#> # ℹ 1 more variable: Information$details <named list>
#>
#> $tb
#> # A tibble: 52 × 35
#> VARNAME TYPE VALUE MEANING VInD NumUniqDVs AllMInD AnyMInD MInD MNotInD
#> <chr> <chr> <chr> <chr> <lgl> <int> <lgl> <lgl> <lis> <list>
#> 1 SAMPLE_ID integ… -9999 missin… TRUE 85 TRUE TRUE <dbl> <chr>
#> 2 SEX integ… 0 male TRUE 2 FALSE FALSE <chr> <dbl>
#> 3 SEX integ… 1 female TRUE 2 FALSE FALSE <chr> <dbl>
#> 4 HEIGHT decim… -9999 missin… TRUE 96 TRUE TRUE <dbl> <chr>
#> 5 WEIGHT decim… -9999 missin… TRUE 77 TRUE TRUE <dbl> <chr>
#> 6 BMI decim… -9999 missin… TRUE 98 TRUE TRUE <dbl> <chr>
#> 7 OBESITY integ… 0 no TRUE 3 TRUE TRUE <dbl> <chr>
#> 8 OBESITY integ… 1 yes TRUE 3 TRUE TRUE <dbl> <chr>
#> 9 OBESITY integ… -9999 missin… TRUE 3 TRUE TRUE <dbl> <chr>
#> 10 ABD_CIRC decim… -9999 missin… TRUE 70 TRUE TRUE <dbl> <chr>
#> # ℹ 42 more rows
#> # ℹ 25 more variables: AllVsInD <lgl>, VsNotInD <list>, AllDefVsInMInD <lgl>,
#> # DefVsInMNotInD <list>, AllSetMInSetV <lgl>, SetMsNotInSetV <list>,
#> # All_MInSetD_InSetV <lgl>, setMInDNotInV <list>, All_VNotInM_NotInD <lgl>,
#> # setVNotInM_NotInD <chr>, NsetD <int>, NsetM <int>, NsetV <int>,
#> # NsetDAndSetV <int>, NsetMAndSetV <int>, NsetDAndSetM <int>, setV <list>,
#> # setD <list>, setM <list>, setDnotM <list>, setVnotM <list>, …
#>
results <- value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999))
#> $Message
#> [1] "Flag: at least one check flagged."
#>
#> $Information
#> # A tibble: 7 × 4
#> check.name check.description check.status details
#> <chr> <chr> <chr> <named >
#> 1 Check A: In M, Not in D "All missing value codes… Flag <tibble>
#> 2 Check B: In V, Not in D "All value codes are in … Flag <tibble>
#> 3 Check C: In M, Not in V "All missing value codes… Flag <tibble>
#> 4 Check D: In M & in D, not in V "All missing value codes… Flag <tibble>
#> 5 Check E: V NOT in M, NOT in D "All value codes no defi… Passed <chr>
#> 6 Awareness: NsetD vs. NsetV "Size of Set D vs size o… Info <tibble>
#> 7 Awareness: N_DnotM vs. N_VnotM "Size of Set D\\M vs siz… Info <tibble>
#>
results$report$Information$details
#> $CheckA.AllMInD
#> # A tibble: 6 × 7
#> VARNAME AllMInD NsetD NsetM NsetDAndSetM MNotInD MInD
#> <chr> <lgl> <int> <int> <int> <list> <list>
#> 1 SEX FALSE 2 1 0 <dbl [1]> <chr [1]>
#> 2 LENGTH_SMOKING_YEARS FALSE 12 1 0 <dbl [1]> <chr [1]>
#> 3 HEART_RATE FALSE 44 1 0 <dbl [1]> <chr [1]>
#> 4 SOCIAL_SUPPORT FALSE 5 1 0 <dbl [1]> <chr [1]>
#> 5 PERCEIVED_CONFLICT FALSE 24 1 0 <dbl [1]> <chr [1]>
#> 6 PERCEIVED_HEALTH FALSE 10 1 0 <dbl [1]> <chr [1]>
#>
#> $CheckB.AllVsInD
#> # A tibble: 2 × 6
#> VARNAME AllVsInD NsetD NsetV NsetDAndSetV VsNotInD
#> <chr> <lgl> <int> <int> <int> <list>
#> 1 LENGTH_SMOKING_YEARS FALSE 12 2 1 <chr [1]>
#> 2 HEART_RATE FALSE 44 1 0 <chr [1]>
#>
#> $CheckC.AllSetMInSetV
#> # A tibble: 5 × 6
#> VARNAME AllSetMInSetV NsetV NsetM NsetMAndSetV SetMsNotInSetV
#> <chr> <lgl> <int> <int> <int> <list>
#> 1 SEX FALSE 2 1 0 <dbl [1]>
#> 2 CUFFSIZE FALSE 4 1 0 <dbl [1]>
#> 3 SOCIAL_SUPPORT FALSE 5 1 0 <dbl [1]>
#> 4 PERCEIVED_CONFLICT FALSE 2 1 0 <dbl [1]>
#> 5 PERCEIVED_HEALTH FALSE 2 1 0 <dbl [1]>
#>
#> $CheckD.All_MInSetD_InSetV
#> # A tibble: 1 × 3
#> VARNAME All_MInSetD_InSetV setMInDNotInV
#> <chr> <lgl> <list>
#> 1 CUFFSIZE FALSE <dbl [1]>
#>
#> $CheckE.All_VNotInM_NotInD
#> [1] "Passed"
#>
#> $countTable.DvsV
#> # A tibble: 18 × 5
#> VARNAME NsetD NsetV NsetDAndSetV Ndiff
#> <chr> <int> <int> <int> <int>
#> 1 CUFFSIZE 5 4 4 1
#> 2 PERCEIVED_HEALTH 10 2 2 8
#> 3 LENGTH_SMOKING_YEARS 12 2 1 10
#> 4 BP_DIASTOLIC 15 1 1 14
#> 5 PHYSICAL_ACTIVITY 22 1 1 21
#> 6 PERCEIVED_CONFLICT 24 2 2 22
#> 7 SUP_SKF 24 1 1 23
#> 8 REACT 25 1 1 24
#> 9 BP_SYSTOLIC 26 1 1 25
#> 10 ABD_SKF 29 1 1 28
#> 11 RESIST 36 1 1 35
#> 12 HEART_RATE 44 1 0 43
#> 13 HIP_CIRC 67 1 1 66
#> 14 ABD_CIRC 70 1 1 69
#> 15 WEIGHT 77 1 1 76
#> 16 SAMPLE_ID 85 1 1 84
#> 17 HEIGHT 96 1 1 95
#> 18 BMI 98 1 1 97
#>
#> $countTable.DnotMvsVnotM
#> # A tibble: 17 × 6
#> VARNAME DnotM_sub_VnotM DnotM_eq_VnotM N_DnotM N_VnotM Ndiff
#> <chr> <lgl> <lgl> <int> <int> <int>
#> 1 PERCEIVED_HEALTH FALSE FALSE 10 2 8
#> 2 LENGTH_SMOKING_YEARS FALSE FALSE 12 1 11
#> 3 BP_DIASTOLIC FALSE FALSE 14 0 14
#> 4 PHYSICAL_ACTIVITY FALSE FALSE 21 0 21
#> 5 SUP_SKF FALSE FALSE 23 0 23
#> 6 PERCEIVED_CONFLICT FALSE FALSE 24 2 22
#> 7 REACT FALSE FALSE 24 0 24
#> 8 BP_SYSTOLIC FALSE FALSE 25 0 25
#> 9 ABD_SKF FALSE FALSE 28 0 28
#> 10 RESIST FALSE FALSE 35 0 35
#> 11 HEART_RATE FALSE FALSE 44 0 44
#> 12 HIP_CIRC FALSE FALSE 66 0 66
#> 13 ABD_CIRC FALSE FALSE 69 0 69
#> 14 WEIGHT FALSE FALSE 76 0 76
#> 15 SAMPLE_ID FALSE FALSE 84 0 84
#> 16 HEIGHT FALSE FALSE 95 0 95
#> 17 BMI FALSE FALSE 97 0 97
#>