Apply suggestions from Maxim's code review
Co-authored-by: Maxim Moinat <[email protected]>
katy-sadowski and MaximMoinat authored Jul 9, 2024
1 parent a1f9a18 commit 7da80fe
Showing 5 changed files with 7 additions and 8 deletions.
4 changes: 2 additions & 2 deletions vignettes/checks/isStandardValidConcept.Rmd
@@ -26,7 +26,7 @@ The number and percent of records that do not have a standard, valid concept in
- *Numerator*: The number of rows with an `X_concept_id` that exists in `CONCEPT.concept_id` but does not equal zero, and has `CONCEPT.standard_concept` != ‘S’ or non-NULL `CONCEPT.invalid_reason`.
- *Denominator*: The total number of rows in the table.
- *Related CDM Convention(s)*: All `X_concept_id` columns should contain a standard, valid concept, or 0: https://ohdsi.github.io/CommonDataModel/dataModelConventions.html#Mapping.
- - *CDM Fields/Tables*: All tables with an `X_concept_id` column, and all `X_concept_id` columns in those tables.
+ - *CDM Fields/Tables*: All standard concept ID (`X_concept_id`) columns in all event tables.
- *Default Threshold Value*: 0%
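To make these definitions concrete, here is a minimal sketch of the violating-rows logic, assuming the condition_occurrence table (the table choice and the @schema placeholders are illustrative, not the check's exact parameterized SQL):

```sql
-- Sketch: count rows whose non-zero concept is not a standard, valid concept.
SELECT COUNT(*) AS num_violated_rows
FROM @cdmDatabaseSchema.condition_occurrence co
JOIN @vocabDatabaseSchema.concept c
  ON co.condition_concept_id = c.concept_id
WHERE co.condition_concept_id != 0
  AND (c.standard_concept IS NULL
       OR c.standard_concept != 'S'
       OR c.invalid_reason IS NOT NULL);
```

Dividing this count by the total row count of the table yields the check's failure percentage.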


@@ -52,7 +52,7 @@ WHERE co.concept_id != 0
You may build upon this query by joining the relevant `X_concept_id` and `X_source_concept_id` columns to the concept table and inspecting their names and vocabularies. If the `X_source_concept_id` correctly represents the source code in `X_source_value`, the fix will be a matter of ensuring your ETL is correctly using the concept_relationship table to map the source concept ID to a standard concept via the ‘Maps to’ relationship. If you are not populating the `X_source_concept_id` column and/or are using an intermediate concept mapping table, you may need to inspect the mappings in your mapper table to ensure they’ve been generated correctly using the ‘Maps to’ relationship for your CDM’s vocabulary version.
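For example, a sketch of a query to see where one source concept lands via the 'Maps to' relationship (the concept ID 12345 is a hypothetical placeholder; swap in an `X_source_concept_id` from your failing rows):

```sql
-- Sketch: follow the 'Maps to' relationship for a single source concept.
SELECT target.concept_id, target.concept_name, target.standard_concept
FROM @vocabDatabaseSchema.concept_relationship cr
JOIN @vocabDatabaseSchema.concept target
  ON cr.concept_id_2 = target.concept_id
WHERE cr.concept_id_1 = 12345
  AND cr.relationship_id = 'Maps to'
  AND cr.invalid_reason IS NULL;
```

An empty result means the source concept has no valid standard mapping in your vocabulary version.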

### Data Users
- This check failure means that the failing rows will not be picked up in a standard OHDSI analysis. It is highly recommended to work with your ETL team or data provider, if possible, to resolve this issue.
+ This check failure means that the failing rows will not be picked up in a standard OHDSI analysis. Especially when participating in network research, where only standard concepts are used, this can lead to invalid results. It is highly recommended to work with your ETL team or data provider, if possible, to resolve this issue.

However, you may work around it at your own risk by determining whether or not the affected rows are relevant for your analysis. Here’s an example query you could run to inspect failing rows in the condition_occurrence table:
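A sketch of one possible version (the vignette's full query is not reproduced here; the schema placeholders are illustrative):

```sql
-- Sketch: summarize failing condition_occurrence rows by offending concept.
SELECT co.condition_concept_id,
       c.concept_name,
       c.standard_concept,
       c.invalid_reason,
       COUNT(*) AS record_count
FROM @cdmDatabaseSchema.condition_occurrence co
JOIN @vocabDatabaseSchema.concept c
  ON co.condition_concept_id = c.concept_id
WHERE co.condition_concept_id != 0
  AND (c.standard_concept IS NULL
       OR c.standard_concept != 'S'
       OR c.invalid_reason IS NOT NULL)
GROUP BY co.condition_concept_id, c.concept_name,
         c.standard_concept, c.invalid_reason
ORDER BY record_count DESC;
```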

3 changes: 1 addition & 2 deletions vignettes/checks/measurePersonCompleteness.Rmd
@@ -14,7 +14,7 @@ output:
**Context**: Validation\
**Category**: Completeness\
**Subcategory**: \
- **Severity**: CDM convention &#x26A0; Characterization &#10004; \
+ **Severity**: CDM convention &#x26A0; (for observation period), Characterization &#10004; (for all other tables) \


## Description
@@ -64,7 +64,6 @@ Action on persons missing records in other clinical event tables will depend on

If more persons than expected are missing data in a given table, run the violated rows SQL snippet to retrieve these persons’ person_ids, and inspect these persons’ other clinical event data in the CDM for trends. You may also use `person_source_value` to trace back to these persons’ source data to identify source data records potentially missed by the ETL.
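A sketch of such a person-level lookup, assuming measurement as the table of interest (the table and schema names are illustrative):

```sql
-- Sketch: persons with no measurement records, with their source identifiers.
SELECT p.person_id, p.person_source_value
FROM @cdmDatabaseSchema.person p
LEFT JOIN @cdmDatabaseSchema.measurement m
  ON p.person_id = m.person_id
WHERE m.person_id IS NULL;
```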

- Note that in some cases, the failure threshold for this check may need to be adjusted according to completeness expectations for a given data source.


### Data Users
4 changes: 2 additions & 2 deletions vignettes/checks/measureValueCompleteness.Rmd
@@ -49,11 +49,11 @@ WHERE cdmTable.@cdmFieldName IS NULL
```

### ETL Developers
- Failures of this check on fields required in the CDM specification are redundant with failures of `isRequired`. See [isRequired documentation](isRequired.html) for more information.
+ Failures of this check on required fields are redundant with failures of `isRequired`. See [isRequired documentation](isRequired.html) for more information.

ETL developers have 2 main options for the use of this check on non-required fields:

- - The check threshold may be set to 100% for non-required fields such that the check will never fail. The check result can be used simply to understand completeness for these fields
+ - The check threshold may be left at 100% for non-required fields such that the check will never fail. The check result can be used simply to understand completeness for these fields
- The check threshold may be set to an appropriate value corresponding to completeness expectations for each field given what’s available in the source data. The check may be disabled for fields known not to exist in the source data. Other fields may be set to whichever threshold is deemed worthy of investigation

Unexpectedly missing values should be investigated for a potential root cause in the ETL. For expected missingness, rows that violate this check in non-required fields are acceptable but should be clearly communicated to data users so that they can know when and when not to expect data to be present in each field. To avoid confusion for users, however, thresholds should be modified to avoid check failures at expected levels.
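As a starting point for such an investigation, a sketch that quantifies missingness for a single field (measurement.value_as_number is an illustrative choice):

```sql
-- Sketch: count and percentage of NULLs in one field.
SELECT COUNT(*) AS num_rows,
       SUM(CASE WHEN value_as_number IS NULL THEN 1 ELSE 0 END) AS num_missing,
       100.0 * SUM(CASE WHEN value_as_number IS NULL THEN 1 ELSE 0 END)
         / COUNT(*) AS pct_missing
FROM @cdmDatabaseSchema.measurement;
```

Comparing pct_missing against the configured threshold for the field shows how far a source is from its completeness expectation.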
2 changes: 1 addition & 1 deletion vignettes/checks/sourceConceptRecordCompleteness.Rmd
@@ -67,6 +67,6 @@ If source values return legitimate matches on concept_code, it’s possible that
If source values do NOT return matches on concept_code and you are NOT handling concept mapping locally for a non-OMOP source vocabulary, then you likely have a malformed source code or one that does not exist in the OMOP vocabulary. Please see the documentation in the [standardConceptRecordCompleteness](standardConceptRecordCompleteness.html) page for instructions on how to handle this scenario.
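A sketch of the concept_code lookup described above (replace the placeholder with one of your source values):

```sql
-- Sketch: check whether a source value matches a known concept_code.
SELECT concept_id, concept_name, vocabulary_id, standard_concept, invalid_reason
FROM @vocabDatabaseSchema.concept
WHERE concept_code = '<source value>';
```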

### Data Users
- Since most standard OHDSI analytic workflows rely on the standard concept field and not the source concept field, failures of this check will not necessarily impact your analysis. However, if your analysis depends on understanding source coding practices or on codes you know may not be fully mapped to OMOP standard concepts, then this will be a critical check failure to understand.
+ Since most standard OHDSI analytic workflows rely on the standard concept field and not the source concept field, failures of this check will not necessarily impact your analysis. However, having the source concept will give you a better understanding of the provenance of the code and highlight potential issues where meaning is lost due to mapping to a standard concept.

Utilize the investigation queries above to understand the scope and impact of the mapping failures on your specific analytic use case. If none of the affected codes seem to be relevant for your analysis, it may be acceptable to ignore the failure. However, since it is not always possible to understand exactly what a given source value represents, you should proceed with caution and confirm any findings with your ETL provider if possible.
2 changes: 1 addition & 1 deletion vignettes/checks/standardConceptRecordCompleteness.Rmd
@@ -87,7 +87,7 @@ WHERE concept_code = <source value>
- If no result is returned, consider whether the source value may be a malformed version of a legitimate code (for example, sometimes ICD10-CM codes do not contain a “.” in source data). If you can confirm that the code is properly formatted, then you have a source code which does not exist in the OMOP vocabulary. If you believe the code was omitted from the vocabulary in error, please report this issue to the vocabulary team. Otherwise, the short-term course of action will be to generate a mapping for the code locally and implement the mapping in your ETL. For the longer term, the vocabulary team provides a workflow to submit new vocabularies for inclusion in the OMOP vocabularies
- Note that in some cases, you will find that no standard concept exists to which to map your source code (the sketch query after this list shows one way to check). In this case, the standard concept ID field should be left as 0 in the short term; in the longer term please work with the vocabulary team to address this gap as recommended above

- - Finally, if the investigation query returns no source value, you must trace the relevant record(s) back to their source and confirm if the missing value is expected. If not, identify and fix the related issue in your ETL. If the record legitimately has no value/code in the source data, then the standard concept ID may be left as 0. However, in some cases these “code-less” records represent junk data which should be filtered out in the ETL. The proper approach will be context-dependent
+ - Finally, if the investigation query returns no source value, you must trace the relevant record(s) back to their source and confirm if the missing source value is expected. If not, identify and fix the related issue in your ETL. If the record legitimately has no value/code in the source data, then the standard concept ID may be left as 0. However, in some cases these “code-less” records represent junk data which should be filtered out in the ETL. The proper approach will be context-dependent
- Note that in the special case of unitless measurements/observations, the unit_concept_id field should NOT be coded as 0 but rather should be left NULL (the unit_concept_id fields are optional in the CDM spec)
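A sketch of one way to check whether a standard mapping exists, starting from the source code itself (both placeholders must be filled in; an empty result suggests no valid 'Maps to' target in your vocabulary version):

```sql
-- Sketch: find the standard concept(s) a source code maps to via 'Maps to'.
SELECT target.concept_id AS standard_concept_id,
       target.concept_name,
       target.domain_id
FROM @vocabDatabaseSchema.concept source
JOIN @vocabDatabaseSchema.concept_relationship cr
  ON source.concept_id = cr.concept_id_1
 AND cr.relationship_id = 'Maps to'
 AND cr.invalid_reason IS NULL
JOIN @vocabDatabaseSchema.concept target
  ON cr.concept_id_2 = target.concept_id
WHERE source.concept_code = '<source value>'
  AND source.vocabulary_id = '<source vocabulary>';
```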

It is important to note that records with a 0 standard concept ID field will be unusable in standard OHDSI analyses and thus should only be preserved if there is truly no standard concept ID for a given record. Depending on the significance of the records in question, one should consider removing them from the dataset; however, this choice will depend on a variety of context-specific factors and should be made carefully. Either way, the presence/absence of these unmappable records and an explanation for why they could not be mapped should be clearly documented in the ETL documentation.
