consistent use of % for threshold values
MaximMoinat committed Jul 7, 2024
1 parent 45e3841 commit a1f9a18
Showing 9 changed files with 20 additions and 14 deletions.
2 changes: 1 addition & 1 deletion vignettes/checks/isStandardValidConcept.Rmd
@@ -27,7 +27,7 @@ The number and percent of records that do not have a standard, valid concept in
- *Denominator*: The total number of rows in the table.
- *Related CDM Convention(s)*: All `X_concept_id` columns should contain a standard, valid concept, or 0: https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Mapping.
- *CDM Fields/Tables*: All tables with an `X_concept_id` column, and all `X_concept_id` columns in those tables.
-- *Default Threshold Value*: 0
+- *Default Threshold Value*: 0%


## User Guidance
4 changes: 3 additions & 1 deletion vignettes/checks/measurePersonCompleteness.Rmd
@@ -27,7 +27,9 @@ The number and percent of persons in the CDM that do not have at least one recor
- *Denominator*: The total number of persons in the `PERSON` table.
- *Related CDM Convention(s)*: Each Person needs to have at least one `OBSERVATION_PERIOD` record. Otherwise, CDM conventions do not dictate any rules for person completeness.
- *CDM Fields/Tables*: By default, this check runs on all tables with a foreign key to the `PERSON` table.
-- *Default Threshold Value*: Set to 95 or 100 for most tables but 0 for `OBSERVATION_PERIOD`
+- *Default Threshold Value*:
+  - 0% for `OBSERVATION_PERIOD`
+  - 95% or 100% for other tables


## User Guidance
6 changes: 4 additions & 2 deletions vignettes/checks/measureValueCompleteness.Rmd
@@ -27,7 +27,9 @@ The number and percent of records with a NULL value in the @cdmFieldName of the
- *Denominator*: The total number of rows in the table.
- *Related CDM Convention(s)*: None. This check should be used to check local expectations about completeness of a field given characteristics of the source data.
- *CDM Fields/Tables*: All fields in all event tables.
-- *Default Threshold Value*: 0 for required fields; 100 for all others
+- *Default Threshold Value*:
+  - 0% for required fields
+  - 100% for all others


## User Guidance
@@ -57,7 +59,7 @@ ETL developers have 2 main options for the use of this check on non-required fie
Unexpectedly missing values should be investigated for a potential root cause in the ETL. For expected missingness, rows that violate this check in non-required fields are acceptable but should be clearly communicated to data users so that they can know when and when not to expect data to be present in each field. To avoid confusion for users, however, thresholds should be modified to avoid check failures at expected levels.

### Data Users
-This check informs you of the level of missing data in each column of the CDM. If data is missing in a required column, see the isRequired documentation for more information.
+This check informs you of the level of missing data in each column of the CDM. If data is missing in a required column, see the `isRequired` documentation for more information.

The interpretation of a check failure on a non-required column will depend on the context. In some cases, the threshold for this check will have been very deliberately set, and any failure should be cause for concern unless justified and explained by your ETL provider. In other cases, even if the check fails it may not be worrisome if the check result is in line with your expectations given the source of the data. When in doubt, utilize the inspection query above to ensure you can explain the missing values.
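
As a rough illustration of how the percent thresholds in this vignette can be read, the sketch below computes the percentage of NULL values in a column and compares it with a threshold. It assumes that a check fails when the violating percentage is strictly greater than the threshold; that rule, the function names, and the sample column are hypothetical and are not taken from the package source.

```python
from typing import Optional, Sequence


def percent_null(values: Sequence[Optional[object]]) -> float:
    """Return the percentage (0-100) of entries that are None (NULL)."""
    if not values:
        return 0.0
    null_count = sum(1 for v in values if v is None)
    return 100.0 * null_count / len(values)


def check_fails(values: Sequence[Optional[object]], threshold_pct: float) -> bool:
    """Assumed rule: fail when the NULL percentage exceeds the threshold."""
    return percent_null(values) > threshold_pct


# Hypothetical column with 2 NULLs out of 8 rows (25% missing).
column = [1, None, 3, 4, None, 6, 7, 8]

required_fails = check_fails(column, threshold_pct=0.0)    # 0%: any NULL fails
optional_fails = check_fails(column, threshold_pct=100.0)  # 100%: cannot fail
tuned_fails = check_fails(column, threshold_pct=30.0)      # 25% is within 30%
```

Under this reading, a 100% default never produces a failure, so a non-required field only becomes informative once its threshold is lowered to the level of missingness you actually expect.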

2 changes: 1 addition & 1 deletion vignettes/checks/plausibleValueHigh.Rmd
@@ -34,7 +34,7 @@ The number and percent of records with a value in the @cdmFieldName field of the
- `DRUG_EXPOSURE.refills` (compared to 24)
- `DRUG_EXPOSURE.days_supply` (compared to 365)
- `DRUG_EXPOSURE.quantity` (compared to 1095)
-- *Default Threshold Value*: 1
+- *Default Threshold Value*: 1%


## User Guidance
2 changes: 1 addition & 1 deletion vignettes/checks/plausibleValueLow.Rmd
@@ -39,7 +39,7 @@ The number and percent of records with a value in the @cdmFieldName field of the
- `DEVICE_EXPOSURE.quantity`, `SPECIMEN.quantity`, `PROCEDURE_OCCURRENCE.quantity` (compared to 1)
- `DRUG_ERA.dose_value`, `DRUG_ERA.gap_days` (compared to 0)
- `DRUG_ERA.drug_exposure_count` (compared to 1)
-- *Default Threshold Value*: 1
+- *Default Threshold Value*: 1%


## User Guidance
4 changes: 2 additions & 2 deletions vignettes/checks/sourceConceptRecordCompleteness.Rmd
@@ -28,8 +28,8 @@ The number and percent of records with a value of 0 in the source concept field
- *Related CDM Convention(s)*: [Source concept mapping](https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Fields)
- *CDM Fields/Tables*: All source concept ID (`X_source_concept_id`) columns in all event tables.
- *Default Threshold Value*:
-  - 10 for primary source concept ID columns in condition, drug, measurement, procedure, device, and observation tables
-  - 100 for all other source concept ID columns
+  - 10% for source concept ID columns in condition, drug, measurement, procedure, device, and observation tables
+  - 100% for all other source concept ID columns


## User Guidance
4 changes: 2 additions & 2 deletions vignettes/checks/sourceValueCompleteness.Rmd
@@ -28,8 +28,8 @@ The number and percent of distinct source values in the @cdmFieldName field of t
- *Related CDM Convention(s)*: The OMOP Common Data Model specifies that codes that are present in a native database should be mapped to standard concepts using either the intrinsic mappings defined in the standard vocabularies or extrinsic mappings defined by the data owner or ETL development team. Note also that variations of this check logic are also used in the [EHDEN CDM Inspection Report](https://github.com/EHDEN/CdmInspection) package, as well as the [AresIndexer](https://github.com/OHDSI/AresIndexer) package for generating indices of unmapped codes.
- *CDM Fields/Tables*: Runs on all event tables that have `X_source_value` fields.
- *Default Threshold Value*:
-  - 10 for critical event tables' `X_source_value` fields (condition, measurement, procedure, drug, visit)
-  - 100 for all other fields - to be adjusted based on source-specific expectations
+  - 10% for `_source_value` fields in condition, measurement, procedure, drug, visit.
+  - 100% for all other fields


## User Guidance
6 changes: 3 additions & 3 deletions vignettes/checks/standardConceptRecordCompleteness.Rmd
@@ -28,9 +28,9 @@ The number and percent of records with a value of 0 in the standard concept fiel
- *Related CDM Convention(s)*: [Standard concept mapping](https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Fields)
- *CDM Fields/Tables*: All standard concept ID (`X_concept_id`) columns in all event tables.
- *Default Threshold Value*:
-  - 0 for type concept fields and standard concept fields in era tables
-  - 5 for most standard concept fields in clinical event tables
-  - 100 for fields more susceptible to specific ETL implementation context
+  - 0% for type concept fields and standard concept fields in era tables
+  - 5% for most standard concept fields in clinical event tables
+  - 100% for fields more susceptible to specific ETL implementation context


## User Guidance
4 changes: 3 additions & 1 deletion vignettes/checks/withinVisitDates.Rmd
@@ -27,7 +27,9 @@ The number and percent of records that occur one week before the corresponding `
- *Denominator*: The total number of rows in the table with a corresponding visit (linked through `visit_occurrence_id`)
- *Related CDM Convention(s)*: There is no explicit convention tied to this check. However, the CDM documentation describes the `visit_occurrence_id` foreign key in the event tables as “The visit during which the <event> occurred.” The underlying assumption is that if a record is tied to a visit, then the date of the record should fall in some reasonable time period around the visit dates. This gives a week of leeway on either side for physician notes or other activities related to a visit to be recorded.
- *CDM Fields/Tables*: This check runs on all event tables: `CONDITION_OCCURRENCE`, `PROCEDURE_OCCURRENCE`, `DRUG_EXPOSURE`, `DEVICE_EXPOSURE`, `MEASUREMENT`, `NOTE`, `OBSERVATION`, and `VISIT_DETAIL`. It will check either the `X_date` or `X_start_date` fields for alignment with corresponding `VISIT_OCCURRENCE` dates by linking on the `visit_occurrence_id`. (**Note:** For VISIT_DETAIL it will check both the visit_detail_start_date and visit_detail_end_date. The default threshold for these two checks is 1%.)
-- *Default Threshold Value*: 1% for `VISIT_DETAIL`, 5% for all other tables
+- *Default Threshold Value*:
+  - 1% for `VISIT_DETAIL`
+  - 5% for all other tables
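
The one-week leeway described above can be sketched roughly as follows. This is an illustration only: it assumes a record is a violation when its date is more than 7 days before the visit start or more than 7 days after the visit end, with dates exactly 7 days away still passing. The function names and sample records are hypothetical and do not come from the package.

```python
from datetime import date, timedelta
from typing import Sequence, Tuple

LEEWAY = timedelta(days=7)  # one week of leeway on either side of the visit

# Each record: (event date, visit start date, visit end date).
Record = Tuple[date, date, date]


def outside_visit_window(event: date, visit_start: date, visit_end: date) -> bool:
    """Assumed rule: violation if the event is more than a week outside the visit."""
    return event < visit_start - LEEWAY or event > visit_end + LEEWAY


def percent_outside(records: Sequence[Record]) -> float:
    """Percentage (0-100) of visit-linked records outside the window."""
    if not records:
        return 0.0
    violations = sum(1 for r in records if outside_visit_window(*r))
    return 100.0 * violations / len(records)


visit = (date(2024, 1, 10), date(2024, 1, 12))  # hypothetical visit dates
records = [
    (date(2024, 1, 10), *visit),  # during the visit
    (date(2024, 1, 4), *visit),   # 6 days before start: within leeway
    (date(2024, 1, 2), *visit),   # 8 days before start: violation
    (date(2024, 1, 20), *visit),  # 8 days after end: violation
]

pct = percent_outside(records)
```

Only records linked to a visit belong in `records`, which mirrors the denominator stated above; the resulting percentage is what would be compared with the 1% or 5% default.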


## User Guidance