ct_image_reader #79

bhaskar2443053 · 2022-12-19T10:01:44Z

This PR fixes #78 .

Description of changes

This a new CT image reader implementation using pydicom lib.

Possible influences of this PR.

The sample data is a bit huge, even for one instance.

Test Conducted

the test case verifies the existence of the image path from the reader in the source folder.

codecov · 2022-12-19T12:10:00Z

Codecov Report

Merging #79 (87c6ede) into master (68cdac6) will increase coverage by 0.77%.
The diff coverage is 95.55%.

❗ Current head 87c6ede differs from pull request most recent head cf68e4d. Consider uploading reports for the commit cf68e4d to get more accurate results

@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
+ Coverage   87.89%   88.66%   +0.77%     
==========================================
  Files          14       16       +2     
  Lines         735      803      +68     
==========================================
+ Hits          646      712      +66     
- Misses         89       91       +2

Impacted Files	Coverage Δ
...ests/fortex/health/readers/CT_image_reader_test.py	`95.45% <95.45%> (ø)`
fortex/health/readers/CT_image_reader.py	`95.65% <95.65%> (ø)`
...ortex/health/processors/scispacy_processor_test.py	`100.00% <0.00%> (ø)`
fortex/health/processors/scispacy_processor.py	`98.63% <0.00%> (+0.38%)`	⬆️
ftx/medical/clinical_ontology.py	`69.91% <0.00%> (+0.49%)`	⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

bhaskar2443053 · 2022-12-20T05:45:28Z

Hi @Piyush13y and @hunterhector I am facing this new error with already merged processors.

E AttributeError: module 'numpy' has no attribute 'object'

is there something we can do ?

hunterhector · 2022-12-20T06:37:00Z

Hi @Piyush13y and @hunterhector I am facing this new error with already merged processors.

E AttributeError: module 'numpy' has no attribute 'object'

is there something we can do ?

Could you try to narrow down the error? Most likely there are some new environment upgrades that cause this. I cannot easily figure it out just looking at the error.

bhaskar2443053 · 2022-12-20T06:46:53Z

Hi @Piyush13y and @hunterhector I am facing this new error with already merged processors.
E AttributeError: module 'numpy' has no attribute 'object'
is there something we can do ?

Could you try to narrow down the error? Most likely there are some new environment upgrades that cause this. I cannot easily figure it out just looking at the error.

Run coverage run -m pytest tests/fortex/health/processors/scispacy_processor_test.py
============================= test session starts ==============================
platform linux -- Python 3.8.15, pytest-5.1.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/runner/work/ForteHealth/ForteHealth
collected 0 items / 1 errors

==================================== ERRORS ====================================
__ ERROR collecting tests/fortex/health/processors/scispacy_processor_test.py __
tests/fortex/health/processors/scispacy_processor_test.py:26: in
from fortex.health.processors.scispacy_processor import (
fortex/health/processors/scispacy_processor.py:20: in
import spacy
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/spacy/init.py:11: in
from thinc.api import prefer_gpu, require_gpu, require_cpu # noqa: F401
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/api.py:2: in
from .initializers import normal_init, uniform_init, glorot_uniform_init, zero_init
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/initializers.py:4: in
from .backends import Ops
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/backends/init.py:7: in
from .ops import Ops
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/backends/ops.py:13: in
from ..util import get_array_module, is_xp_array, to_numpy
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/util.py:48: in
import tensorflow.experimental.dlpack
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/init.py:41: in
from tensorflow.python.tools import module_util as _module_util
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/init.py:64: in
from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/framework_lib.py:25: in
from tensorflow.python.framework.ops import Graph
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:54: in
from tensorflow.python.framework import dtypes
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: in
np.object,
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/init.py:284: in getattr
raise AttributeError("module {!r} has no attribute "
E AttributeError: module 'numpy' has no attribute 'object'
=============================== warnings summary ===============================
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/torch/cuda/init.py:52
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/torch/cuda/init.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223: DeprecationWarning: np.bool8 is a deprecated alias for np.bool_. (Deprecated NumPy 1.24)
np.bool8: (False, True),

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: FutureWarning: In the future np.object will be defined as the corresponding NumPy scalar. (This may have returned Python scalars in past versions.
np.object,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 4 warnings, 1 error in 2.39s =========================
Error: Process completed with exit code 2.

bhaskar2443053 · 2022-12-20T06:47:35Z

The same error with xray processor test case

hunterhector · 2022-12-20T06:52:15Z

After a quick glance it looks like it is the spacy and numpy have some version conflicts. I would recommend recreating this issue locally and find out the right version combination.

To be honest, we haven't upgraded some library versions for a long time. We should spend a sprint on that alone.

bhaskar2443053 · 2022-12-20T06:54:25Z

After a quick glance it looks like it is the spacy and numpy have some version conflicts. I would recommend recreating this issue locally and find out the right version combination.

To be honest, we haven't upgraded some library versions for a long time. We should spend a sprint on that alone.

also got it with x-ray processor test too. it was added last month.

Run coverage run -m pytest tests/fortex/health/processors/xray_processor_test.py
============================= test session starts ==============================
platform linux -- Python 3.8.15, pytest-5.1.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/runner/work/ForteHealth/ForteHealth
collected 0 items / 1 errors

==================================== ERRORS ====================================
____ ERROR collecting tests/fortex/health/processors/xray_processor_test.py ____
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/utils/import_utils.py:857: in _get_module
return importlib.import_module("." + module_name, self.name)
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/importlib/init.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
:1014: in _gcd_import
???
:991: in _find_and_load
???
:975: in _find_and_load_unlocked
???
:671: in _load_unlocked
???
:843: in exec_module
???
:219: in _call_with_frames_removed
???
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/pipelines/init.py:33: in
from .audio_classification import AudioClassificationPipeline
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/pipelines/audio_classification.py:20: in
from .base import PIPELINE_INIT_ARGS, Pipeline
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/pipelines/base.py:33: in
from ..modelcard import ModelCard
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/modelcard.py:44: in
from .training_args import ParallelMode
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/training_args.py:26: in
from .trainer_utils import EvaluationStrategy, HubStrategy, IntervalStrategy, SchedulerType, ShardedDDPOption
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/trainer_utils.py:46: in
import tensorflow as tf
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/init.py:41: in
from tensorflow.python.tools import module_util as _module_util
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/init.py:64: in
from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/framework_lib.py:25: in
from tensorflow.python.framework.ops import Graph
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:54: in
from tensorflow.python.framework import dtypes
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: in
np.object,
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/init.py:284: in getattr
raise AttributeError("module {!r} has no attribute "
E AttributeError: module 'numpy' has no attribute 'object'

The above exception was the direct cause of the following exception:
tests/fortex/health/processors/xray_processor_test.py:25: in
from fortex.health.processors.xray_image_processor import XrayImageProcessor
fortex/health/processors/xray_image_processor.py:19: in
from transformers import pipeline
:1039: in _handle_fromlist
???
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/utils/import_utils.py:847: in getattr
module = self._get_module(self._class_to_module[name])
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/utils/import_utils.py:859: in _get_module
raise RuntimeError(
E RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
E module 'numpy' has no attribute 'object'
=============================== warnings summary ===============================
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223: DeprecationWarning: np.bool8 is a deprecated alias for np.bool_. (Deprecated NumPy 1.24)
np.bool8: (False, True),

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: FutureWarning: In the future np.object will be defined as the corresponding NumPy scalar. (This may have returned Python scalars in past versions.
np.object,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 3 warnings, 1 error in 2.61s =========================

hunterhector

See my comments inline.

An additional note on the binary files. These binary example files are very useful but they will make the master branch larger and larger. Could you add them in another PR using the following trick:

https://gist.github.com/mcroach/811c8308f4bd78570918844258841942

Basically, you can create an asset branch (you can use another name, say data branch) that holds the content, and download them when needed. This way we can keep the master branch lean.

hunterhector · 2022-12-20T06:55:20Z

.github/workflows/main.yml

          - { testfile: tests/fortex/health/readers/mimic3_note_reader_test.py }
          - { testfile: tests/fortex/health/readers/xray_image_reader_test.py }
-          - { testfile: tests/fortex/health/processors/xray_processor_test.py }


Let's locate these errors locally and try to figure them out.

resolved with numpy version downgrade

@hunterhector I don't have access to create data branch in Forte repo. Either you can create a branch or give me the necessary access.

I have created a complete empty data branch: https://github.com/asyml/forte/tree/data

hunterhector · 2022-12-20T07:01:55Z