Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ct_image_reader #79

Closed
wants to merge 0 commits into from
Closed

Conversation

bhaskar2443053
Copy link
Collaborator

This PR fixes #78 .

Description of changes

This a new CT image reader implementation using pydicom lib.

Possible influences of this PR.

The sample data is a bit huge, even for one instance.

Test Conducted

the test case verifies the existence of the image path from the reader in the source folder.

@codecov
Copy link

codecov bot commented Dec 19, 2022

Codecov Report

Merging #79 (87c6ede) into master (68cdac6) will increase coverage by 0.77%.
The diff coverage is 95.55%.

❗ Current head 87c6ede differs from pull request most recent head cf68e4d. Consider uploading reports for the commit cf68e4d to get more accurate results

@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
+ Coverage   87.89%   88.66%   +0.77%     
==========================================
  Files          14       16       +2     
  Lines         735      803      +68     
==========================================
+ Hits          646      712      +66     
- Misses         89       91       +2     
Impacted Files Coverage Δ
...ests/fortex/health/readers/CT_image_reader_test.py 95.45% <95.45%> (ø)
fortex/health/readers/CT_image_reader.py 95.65% <95.65%> (ø)
...ortex/health/processors/scispacy_processor_test.py 100.00% <0.00%> (ø)
fortex/health/processors/scispacy_processor.py 98.63% <0.00%> (+0.38%) ⬆️
ftx/medical/clinical_ontology.py 69.91% <0.00%> (+0.49%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@bhaskar2443053
Copy link
Collaborator Author

Hi @Piyush13y and @hunterhector I am facing this new error with already merged processors.

E AttributeError: module 'numpy' has no attribute 'object'

is there something we can do ?

@hunterhector
Copy link
Member

Hi @Piyush13y and @hunterhector I am facing this new error with already merged processors.

E AttributeError: module 'numpy' has no attribute 'object'

is there something we can do ?

Could you try to narrow down the error? Most likely there are some new environment upgrades that cause this. I cannot easily figure it out just looking at the error.

@bhaskar2443053
Copy link
Collaborator Author

Hi @Piyush13y and @hunterhector I am facing this new error with already merged processors.
E AttributeError: module 'numpy' has no attribute 'object'
is there something we can do ?

Could you try to narrow down the error? Most likely there are some new environment upgrades that cause this. I cannot easily figure it out just looking at the error.

Run coverage run -m pytest tests/fortex/health/processors/scispacy_processor_test.py
============================= test session starts ==============================
platform linux -- Python 3.8.15, pytest-5.1.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/runner/work/ForteHealth/ForteHealth
collected 0 items / 1 errors

==================================== ERRORS ====================================
__ ERROR collecting tests/fortex/health/processors/scispacy_processor_test.py __
tests/fortex/health/processors/scispacy_processor_test.py:26: in
from fortex.health.processors.scispacy_processor import (
fortex/health/processors/scispacy_processor.py:20: in
import spacy
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/spacy/init.py:11: in
from thinc.api import prefer_gpu, require_gpu, require_cpu # noqa: F401
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/api.py:2: in
from .initializers import normal_init, uniform_init, glorot_uniform_init, zero_init
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/initializers.py:4: in
from .backends import Ops
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/backends/init.py:7: in
from .ops import Ops
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/backends/ops.py:13: in
from ..util import get_array_module, is_xp_array, to_numpy
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/thinc/util.py:48: in
import tensorflow.experimental.dlpack
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/init.py:41: in
from tensorflow.python.tools import module_util as _module_util
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/init.py:64: in
from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/framework_lib.py:25: in
from tensorflow.python.framework.ops import Graph
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:54: in
from tensorflow.python.framework import dtypes
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: in
np.object,
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/init.py:284: in getattr
raise AttributeError("module {!r} has no attribute "
E AttributeError: module 'numpy' has no attribute 'object'
=============================== warnings summary ===============================
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/torch/cuda/init.py:52
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/torch/cuda/init.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223: DeprecationWarning: np.bool8 is a deprecated alias for np.bool_. (Deprecated NumPy 1.24)
np.bool8: (False, True),

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: FutureWarning: In the future np.object will be defined as the corresponding NumPy scalar. (This may have returned Python scalars in past versions.
np.object,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 4 warnings, 1 error in 2.39s =========================
Error: Process completed with exit code 2.

@bhaskar2443053
Copy link
Collaborator Author

The same error with xray processor test case

@hunterhector
Copy link
Member

After a quick glance it looks like it is the spacy and numpy have some version conflicts. I would recommend recreating this issue locally and find out the right version combination.

To be honest, we haven't upgraded some library versions for a long time. We should spend a sprint on that alone.

@bhaskar2443053
Copy link
Collaborator Author

bhaskar2443053 commented Dec 20, 2022

After a quick glance it looks like it is the spacy and numpy have some version conflicts. I would recommend recreating this issue locally and find out the right version combination.

To be honest, we haven't upgraded some library versions for a long time. We should spend a sprint on that alone.

also got it with x-ray processor test too. it was added last month.

Run coverage run -m pytest tests/fortex/health/processors/xray_processor_test.py
============================= test session starts ==============================
platform linux -- Python 3.8.15, pytest-5.1.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/runner/work/ForteHealth/ForteHealth
collected 0 items / 1 errors

==================================== ERRORS ====================================
____ ERROR collecting tests/fortex/health/processors/xray_processor_test.py ____
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/utils/import_utils.py:857: in _get_module
return importlib.import_module("." + module_name, self.name)
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/importlib/init.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
:1014: in _gcd_import
???
:991: in _find_and_load
???
:975: in _find_and_load_unlocked
???
:671: in _load_unlocked
???
:843: in exec_module
???
:219: in _call_with_frames_removed
???
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/pipelines/init.py:33: in
from .audio_classification import AudioClassificationPipeline
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/pipelines/audio_classification.py:20: in
from .base import PIPELINE_INIT_ARGS, Pipeline
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/pipelines/base.py:33: in
from ..modelcard import ModelCard
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/modelcard.py:44: in
from .training_args import ParallelMode
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/training_args.py:26: in
from .trainer_utils import EvaluationStrategy, HubStrategy, IntervalStrategy, SchedulerType, ShardedDDPOption
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/trainer_utils.py:46: in
import tensorflow as tf
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/init.py:41: in
from tensorflow.python.tools import module_util as _module_util
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/init.py:64: in
from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/framework_lib.py:25: in
from tensorflow.python.framework.ops import Graph
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:54: in
from tensorflow.python.framework import dtypes
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: in
np.object,
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/numpy/init.py:284: in getattr
raise AttributeError("module {!r} has no attribute "
E AttributeError: module 'numpy' has no attribute 'object'

The above exception was the direct cause of the following exception:
tests/fortex/health/processors/xray_processor_test.py:25: in
from fortex.health.processors.xray_image_processor import XrayImageProcessor
fortex/health/processors/xray_image_processor.py:19: in
from transformers import pipeline
:1039: in _handle_fromlist
???
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/utils/import_utils.py:847: in getattr
module = self._get_module(self._class_to_module[name])
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/transformers/utils/import_utils.py:859: in _get_module
raise RuntimeError(
E RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
E module 'numpy' has no attribute 'object'
=============================== warnings summary ===============================
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:223: DeprecationWarning: np.bool8 is a deprecated alias for np.bool_. (Deprecated NumPy 1.24)
np.bool8: (False, True),

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: FutureWarning: In the future np.object will be defined as the corresponding NumPy scalar. (This may have returned Python scalars in past versions.
np.object,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 3 warnings, 1 error in 2.61s =========================

Copy link
Member

@hunterhector hunterhector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments inline.

An additional note on the binary files. These binary example files are very useful but they will make the master branch larger and larger. Could you add them in another PR using the following trick:

https://gist.github.com/mcroach/811c8308f4bd78570918844258841942

Basically, you can create an asset branch (you can use another name, say data branch) that holds the content, and download them when needed. This way we can keep the master branch lean.

- { testfile: tests/fortex/health/readers/mimic3_note_reader_test.py }
- { testfile: tests/fortex/health/readers/xray_image_reader_test.py }
- { testfile: tests/fortex/health/processors/xray_processor_test.py }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's locate these errors locally and try to figure them out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved with numpy version downgrade

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hunterhector I don't have access to create data branch in Forte repo. Either you can create a branch or give me the necessary access.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a complete empty data branch: https://github.com/asyml/forte/tree/data


For more information for the dataset, visit:
https://data.mendeley.com/datasets/jctsfj2sfn/1
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additional spaces



class CTimageReader(PackReader):
r""":class:`CTimageReader` is designed to read CT image files from a given folder."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should describe this more clearly, what types of data, what's the folder structure?


def __init__(self):
super().__init__()
self.pydicom = pydicom
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a self.pydicom? Can we simply us pydicom later?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the self.pydicom and used pydicom directly.

resolved

super().__init__()
self.pydicom = pydicom

def _collect(self, image_directory) -> Iterator[Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterator type is too broad.

img = self.pydicom.dcmread(
file_path, **(self.configs.read_kwargs or {})
)
img.PhotometricInterpretation = "YBR_FULL"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should document these choices (why YBR_FULL)

img.PhotometricInterpretation = "YBR_FULL"
pixel_data = img.pixel_array
pack.add_image(image=pixel_data)
pack.pack_name = file_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pack_name is better to be a non-path string (not containing /)

setup.py Outdated
@@ -31,6 +31,7 @@
"transformers==4.18.0",
"protobuf==3.19.4",
"Pillow==8.4.0",
"pydicom==2.3.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be an entry in test, it should have its own keyword in extras_require

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

def test_reader(self):
for pack in self.pl.process_dataset(self.orig_image_pth):
self.assertTrue(
pack.pack_name.split("/")[-1] in self.expected_image_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pack_name isn't the place for storing paths. If we want to store path, we have discussed to use Payload

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CT image reader
2 participants