This function checks for consistent usage of encoded values and missing value codes between the data dictionary and the data itself.

Usage

value_missing_table(DD.dict, DS.data, non.NA.missing.codes = NA)

Arguments

DD.dict

Data dictionary.

DS.data

Data set.

non.NA.missing.codes

A user-defined vector of numerical missing value codes (e.g., -9999).

Value

A list, returned invisibly, with two components:

  • "report": Tibble containing (1) Name (name of the function) and (2) Information (details of all potentially flagged variables).

  • "tb": Tibble with the detailed information used to construct the Information.

Details

For each variable, there are three sets of possible values: the set D of all unique values observed in the data, the set V of all values explicitly encoded in the VALUES columns of the data dictionary, and the set M of missing value codes defined by the user via the non.NA.missing.codes argument. This function examines several intersections and differences of these three sets and reports them as awareness checks on possible issues of concern. Ideally, every value defined in set V is also observed in the data (i.e., appears in set D), but it is not necessarily an error if one is not. This function checks for:

(A) In Set M and Not in Set D: If the user defines a missing value code that is not present in the data.

(B) In Set V and Not in Set D: If a VALUES entry defines an encoded code value, but that code value is not present in the data.

(C) In Set M and Not in Set V: If the user defines a missing value code that is not defined in a VALUES entry.

(D) In Set M and in Set D, but Not in Set V: If a defined global missing value code is present in the data for a given variable, but that variable does not have a corresponding VALUES entry.

(E) In Set V, Not in Set M, and Not in Set D: If a VALUES entry is not defined as a missing value code AND is not detected in the data. Equivalently, this is the set (V not in M) not in D.
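The five checks above can be read as plain set differences. The following is a minimal sketch in base R for a single variable, not the package's implementation: the toy vectors set_D, set_V, and set_M and the names in the checks list are hypothetical, and it assumes that data values and dictionary codes have already been brought to the same type.

```r
# Hypothetical toy inputs for one variable (not taken from ExampleB).
set_D <- c(0, 1, -9999)       # unique values observed in the data
set_V <- c(0, 1, -9999)       # codes listed in the VALUES columns
set_M <- c(-9999, -8888)      # user-defined missing value codes

# Each element holds the offending values; an empty vector means "Passed".
checks <- list(
  A_M_not_in_D      = setdiff(set_M, set_D),
  B_V_not_in_D      = setdiff(set_V, set_D),
  C_M_not_in_V      = setdiff(set_M, set_V),
  D_M_in_D_not_in_V = setdiff(intersect(set_M, set_D), set_V),
  E_VnotM_not_in_D  = setdiff(setdiff(set_V, set_M), set_D)
)

# TRUE where a check would be flagged for this variable.
flagged <- vapply(checks, function(x) length(x) > 0, logical(1))
print(flagged)
```

With these toy inputs, checks A and C are flagged because -8888 appears neither in the data nor in the dictionary, while checks B, D, and E pass.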

Examples

data(ExampleB)
value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999))
#> $Message
#> [1] "Flag: at least one check flagged."
#> 
#> $Information
#> # A tibble: 7 × 4
#>   check.name                     check.description         check.status details 
#>   <chr>                          <chr>                     <chr>        <named >
#> 1 Check A: In M, Not in D        "All missing value codes… Flag         <tibble>
#> 2 Check B: In V, Not in D        "All value codes are in … Flag         <tibble>
#> 3 Check C: In M, Not in V        "All missing value codes… Flag         <tibble>
#> 4 Check D: In M & in D, not in V "All missing value codes… Flag         <tibble>
#> 5 Check E: V NOT in M, NOT in D  "All value codes no defi… Passed       <chr>   
#> 6 Awareness: NsetD vs. NsetV     "Size of Set D vs size o… Info         <tibble>
#> 7 Awareness: N_DnotM vs. N_VnotM "Size of Set D\\M vs siz… Info         <tibble>
#> 
print(value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999)))
#> $Message
#> [1] "Flag: at least one check flagged."
#> 
#> $Information
#> # A tibble: 7 × 4
#>   check.name                     check.description         check.status details 
#>   <chr>                          <chr>                     <chr>        <named >
#> 1 Check A: In M, Not in D        "All missing value codes… Flag         <tibble>
#> 2 Check B: In V, Not in D        "All value codes are in … Flag         <tibble>
#> 3 Check C: In M, Not in V        "All missing value codes… Flag         <tibble>
#> 4 Check D: In M & in D, not in V "All missing value codes… Flag         <tibble>
#> 5 Check E: V NOT in M, NOT in D  "All value codes no defi… Passed       <chr>   
#> 6 Awareness: NsetD vs. NsetV     "Size of Set D vs size o… Info         <tibble>
#> 7 Awareness: N_DnotM vs. N_VnotM "Size of Set D\\M vs siz… Info         <tibble>
#> 
#> $report
#> # A tibble: 7 × 2
#>   Function            Information$check.name    $check.description $check.status
#>   <chr>               <chr>                     <chr>              <chr>        
#> 1 value_missing_table Check A: In M, Not in D   "All missing valu… Flag         
#> 2 value_missing_table Check B: In V, Not in D   "All value codes … Flag         
#> 3 value_missing_table Check C: In M, Not in V   "All missing valu… Flag         
#> 4 value_missing_table Check D: In M & in D, no… "All missing valu… Flag         
#> 5 value_missing_table Check E: V NOT in M, NOT… "All value codes … Passed       
#> 6 value_missing_table Awareness: NsetD vs. Nse… "Size of Set D vs… Info         
#> 7 value_missing_table Awareness: N_DnotM vs. N… "Size of Set D\\M… Info         
#> # ℹ 1 more variable: Information$details <named list>
#> 
#> $tb
#> # A tibble: 52 × 35
#>    VARNAME   TYPE   VALUE MEANING VInD  NumUniqDVs AllMInD AnyMInD MInD  MNotInD
#>    <chr>     <chr>  <chr> <chr>   <lgl>      <int> <lgl>   <lgl>   <lis> <list> 
#>  1 SAMPLE_ID integ… -9999 missin… TRUE          85 TRUE    TRUE    <dbl> <chr>  
#>  2 SEX       integ… 0     male    TRUE           2 FALSE   FALSE   <chr> <dbl>  
#>  3 SEX       integ… 1     female  TRUE           2 FALSE   FALSE   <chr> <dbl>  
#>  4 HEIGHT    decim… -9999 missin… TRUE          96 TRUE    TRUE    <dbl> <chr>  
#>  5 WEIGHT    decim… -9999 missin… TRUE          77 TRUE    TRUE    <dbl> <chr>  
#>  6 BMI       decim… -9999 missin… TRUE          98 TRUE    TRUE    <dbl> <chr>  
#>  7 OBESITY   integ… 0     no      TRUE           3 TRUE    TRUE    <dbl> <chr>  
#>  8 OBESITY   integ… 1     yes     TRUE           3 TRUE    TRUE    <dbl> <chr>  
#>  9 OBESITY   integ… -9999 missin… TRUE           3 TRUE    TRUE    <dbl> <chr>  
#> 10 ABD_CIRC  decim… -9999 missin… TRUE          70 TRUE    TRUE    <dbl> <chr>  
#> # ℹ 42 more rows
#> # ℹ 25 more variables: AllVsInD <lgl>, VsNotInD <list>, AllDefVsInMInD <lgl>,
#> #   DefVsInMNotInD <list>, AllSetMInSetV <lgl>, SetMsNotInSetV <list>,
#> #   All_MInSetD_InSetV <lgl>, setMInDNotInV <list>, All_VNotInM_NotInD <lgl>,
#> #   setVNotInM_NotInD <chr>, NsetD <int>, NsetM <int>, NsetV <int>,
#> #   NsetDAndSetV <int>, NsetMAndSetV <int>, NsetDAndSetM <int>, setV <list>,
#> #   setD <list>, setM <list>, setDnotM <list>, setVnotM <list>, …
#> 
results <- value_missing_table(DD.dict.B, DS.data.B, non.NA.missing.codes = c(-9999))
#> $Message
#> [1] "Flag: at least one check flagged."
#> 
#> $Information
#> # A tibble: 7 × 4
#>   check.name                     check.description         check.status details 
#>   <chr>                          <chr>                     <chr>        <named >
#> 1 Check A: In M, Not in D        "All missing value codes… Flag         <tibble>
#> 2 Check B: In V, Not in D        "All value codes are in … Flag         <tibble>
#> 3 Check C: In M, Not in V        "All missing value codes… Flag         <tibble>
#> 4 Check D: In M & in D, not in V "All missing value codes… Flag         <tibble>
#> 5 Check E: V NOT in M, NOT in D  "All value codes no defi… Passed       <chr>   
#> 6 Awareness: NsetD vs. NsetV     "Size of Set D vs size o… Info         <tibble>
#> 7 Awareness: N_DnotM vs. N_VnotM "Size of Set D\\M vs siz… Info         <tibble>
#> 
results$report$Information$details
#> $CheckA.AllMInD
#> # A tibble: 6 × 7
#>   VARNAME              AllMInD NsetD NsetM NsetDAndSetM MNotInD   MInD     
#>   <chr>                <lgl>   <int> <int>        <int> <list>    <list>   
#> 1 SEX                  FALSE       2     1            0 <dbl [1]> <chr [1]>
#> 2 LENGTH_SMOKING_YEARS FALSE      12     1            0 <dbl [1]> <chr [1]>
#> 3 HEART_RATE           FALSE      44     1            0 <dbl [1]> <chr [1]>
#> 4 SOCIAL_SUPPORT       FALSE       5     1            0 <dbl [1]> <chr [1]>
#> 5 PERCEIVED_CONFLICT   FALSE      24     1            0 <dbl [1]> <chr [1]>
#> 6 PERCEIVED_HEALTH     FALSE      10     1            0 <dbl [1]> <chr [1]>
#> 
#> $CheckB.AllVsInD
#> # A tibble: 2 × 6
#>   VARNAME              AllVsInD NsetD NsetV NsetDAndSetV VsNotInD 
#>   <chr>                <lgl>    <int> <int>        <int> <list>   
#> 1 LENGTH_SMOKING_YEARS FALSE       12     2            1 <chr [1]>
#> 2 HEART_RATE           FALSE       44     1            0 <chr [1]>
#> 
#> $CheckC.AllSetMInSetV
#> # A tibble: 5 × 6
#>   VARNAME            AllSetMInSetV NsetV NsetM NsetMAndSetV SetMsNotInSetV
#>   <chr>              <lgl>         <int> <int>        <int> <list>        
#> 1 SEX                FALSE             2     1            0 <dbl [1]>     
#> 2 CUFFSIZE           FALSE             4     1            0 <dbl [1]>     
#> 3 SOCIAL_SUPPORT     FALSE             5     1            0 <dbl [1]>     
#> 4 PERCEIVED_CONFLICT FALSE             2     1            0 <dbl [1]>     
#> 5 PERCEIVED_HEALTH   FALSE             2     1            0 <dbl [1]>     
#> 
#> $CheckD.All_MInSetD_InSetV
#> # A tibble: 1 × 3
#>   VARNAME  All_MInSetD_InSetV setMInDNotInV
#>   <chr>    <lgl>              <list>       
#> 1 CUFFSIZE FALSE              <dbl [1]>    
#> 
#> $CheckE.All_VNotInM_NotInD
#> [1] "Passed"
#> 
#> $countTable.DvsV
#> # A tibble: 18 × 5
#>    VARNAME              NsetD NsetV NsetDAndSetV Ndiff
#>    <chr>                <int> <int>        <int> <int>
#>  1 CUFFSIZE                 5     4            4     1
#>  2 PERCEIVED_HEALTH        10     2            2     8
#>  3 LENGTH_SMOKING_YEARS    12     2            1    10
#>  4 BP_DIASTOLIC            15     1            1    14
#>  5 PHYSICAL_ACTIVITY       22     1            1    21
#>  6 PERCEIVED_CONFLICT      24     2            2    22
#>  7 SUP_SKF                 24     1            1    23
#>  8 REACT                   25     1            1    24
#>  9 BP_SYSTOLIC             26     1            1    25
#> 10 ABD_SKF                 29     1            1    28
#> 11 RESIST                  36     1            1    35
#> 12 HEART_RATE              44     1            0    43
#> 13 HIP_CIRC                67     1            1    66
#> 14 ABD_CIRC                70     1            1    69
#> 15 WEIGHT                  77     1            1    76
#> 16 SAMPLE_ID               85     1            1    84
#> 17 HEIGHT                  96     1            1    95
#> 18 BMI                     98     1            1    97
#> 
#> $countTable.DnotMvsVnotM
#> # A tibble: 17 × 6
#>    VARNAME              DnotM_sub_VnotM DnotM_eq_VnotM N_DnotM N_VnotM Ndiff
#>    <chr>                <lgl>           <lgl>            <int>   <int> <int>
#>  1 PERCEIVED_HEALTH     FALSE           FALSE               10       2     8
#>  2 LENGTH_SMOKING_YEARS FALSE           FALSE               12       1    11
#>  3 BP_DIASTOLIC         FALSE           FALSE               14       0    14
#>  4 PHYSICAL_ACTIVITY    FALSE           FALSE               21       0    21
#>  5 SUP_SKF              FALSE           FALSE               23       0    23
#>  6 PERCEIVED_CONFLICT   FALSE           FALSE               24       2    22
#>  7 REACT                FALSE           FALSE               24       0    24
#>  8 BP_SYSTOLIC          FALSE           FALSE               25       0    25
#>  9 ABD_SKF              FALSE           FALSE               28       0    28
#> 10 RESIST               FALSE           FALSE               35       0    35
#> 11 HEART_RATE           FALSE           FALSE               44       0    44
#> 12 HIP_CIRC             FALSE           FALSE               66       0    66
#> 13 ABD_CIRC             FALSE           FALSE               69       0    69
#> 14 WEIGHT               FALSE           FALSE               76       0    76
#> 15 SAMPLE_ID            FALSE           FALSE               84       0    84
#> 16 HEIGHT               FALSE           FALSE               95       0    95
#> 17 BMI                  FALSE           FALSE               97       0    97
#>
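The two awareness tables in the output above report set sizes rather than pass/fail results. The sketch below shows how such counts could be derived in base R for one variable. It is an illustration under assumptions, not the package's code: the toy vectors are hypothetical, Ndiff is taken to be the first count minus the second (which matches the rows printed above), and DnotM_sub_VnotM is assumed to mean "D without M is a subset of V without M".

```r
# Hypothetical toy sets for one variable (not taken from ExampleB).
set_D <- c(0, 1, 2, -9999)    # unique values observed in the data
set_V <- c(0, 1, -9999)       # codes listed in the VALUES columns
set_M <- c(-9999)             # user-defined missing value codes

# Sets with the missing value codes removed.
D_not_M <- setdiff(set_D, set_M)
V_not_M <- setdiff(set_V, set_M)

# Size of set D versus size of set V.
count_DvsV <- data.frame(
  NsetD        = length(unique(set_D)),
  NsetV        = length(unique(set_V)),
  NsetDAndSetV = length(intersect(set_D, set_V))
)
count_DvsV$Ndiff <- count_DvsV$NsetD - count_DvsV$NsetV

# Size of D without M versus size of V without M.
count_DnotMvsVnotM <- data.frame(
  DnotM_sub_VnotM = all(D_not_M %in% V_not_M),    # assumed meaning: subset
  DnotM_eq_VnotM  = setequal(D_not_M, V_not_M),   # assumed meaning: equal sets
  N_DnotM         = length(D_not_M),
  N_VnotM         = length(V_not_M)
)
count_DnotMvsVnotM$Ndiff <- count_DnotMvsVnotM$N_DnotM - count_DnotMvsVnotM$N_VnotM

print(count_DvsV)
print(count_DnotMvsVnotM)
```

A large Ndiff is expected for continuous variables, where the dictionary typically encodes only the missing value code, so these rows are informational rather than flags.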