-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiframe Conversion #34
base: master
Are you sure you want to change the base?
Multiframe Conversion #34
Conversation
Thanks @afshinmessiah! I had a quick look and will review more carefully before we meet. In the meantime, as a first step I would suggest you look into the coding style section of the developer guide and make sure the code passes flake8 and mypy checks. |
@hackermd the PR is not complete, and is known not to pass the checks. I suggested Afshin shares this incomplete PR with you so you can (if you like) look at the overall approach before the meeting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @afshinmessiah, thanks for your PR and apologies for my delay in responding. The code overall looks great. I have a few major and a couple of minor comments:
Major: related to implementation of Modules and Functional Group Macros (see comment on Abstract_MultiframeModuleAdder
)
Minor: styling -> pythonic names and PEP 8 compliance
Please take a look at the individual comments and let me know what you think.
Before merging, we will also need to add unit tests for the legacy
package and make sure the source code passes mypy
and flake8
checks.
src/highdicom/legacy/sop.py
Outdated
@@ -89,25 +72,18 @@ def _convert_legacy_to_enhanced( | |||
modalities.add(ds.Modality) | |||
if len(series) > 1: | |||
raise ValueError( | |||
'All instances must belong to the same series.' | |||
) | |||
'All instances must belong to the same series.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave the previous style
raise ValueError(
'Long Message'
)
instead of
raise ValueError(
'Long Message')
src/highdicom/legacy/sop.py
Outdated
perframe_tags: Sequence[Tag], | ||
shared_tags: Sequence[Tag], | ||
multi_frame_output: Dataset = None): | ||
self.ExcludedFromPerFrameTags = excluded_from_perframe_tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow PEP 8 for attribute names using lower level characters and underscores instead of upper camel case: excluded_from_per_frame_tags
instead of ExcludedFromPerFrameTags
. The latter style is reserved for DICOM Attribute keywords.
src/highdicom/legacy/sop.py
Outdated
from pydicom.valuerep import DT, TM, DA | ||
if a.VR == 'DA' and type(a.value)==str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use isinstance(a.value, str)
instead of type(a.value) == str
src/highdicom/legacy/sop.py
Outdated
tg = tag_for_keyword(kw) | ||
if kw in src: | ||
a = deepcopy(src[kw]) | ||
else: | ||
a = DataElement(tg, | ||
dictionary_VR(tg), default ) | ||
a = DataElement(tg, dictionary_VR(tg), default) | ||
from pydicom.valuerep import DT, TM, DA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Place import statements at top of module
src/highdicom/legacy/sop.py
Outdated
# sf_sop_instance_uid = sf_datasets[0] | ||
# mf_sop_instance_uid = LEGACY_ENHANCED_SOP_CLASS_UID_MAP[sf_sop_instance_uid] | ||
# mf_sop_instance_uid = LEGACY_ENHANCED_SOP_CLASS_UID_MAP[ | ||
# sf_sop_instance_uid] | ||
# iod_name = _SOP_CLASS_UID_IOD_KEY_MAP[mf_sop_instance_uid] | ||
# modules = IOD_MODULE_MAP[iod_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented lines
src/highdicom/legacy/sop.py
Outdated
def _copy_attrib_if_present(self, src_ds:Dataset, dest_ds:Dataset, | ||
src_kw_or_tg:str, dest_kw_or_tg:str=None, | ||
check_not_to_be_perframe=True, check_not_to_be_empty=False): | ||
def _copy_attrib_if_present(self, src_ds: Dataset, dest_ds: Dataset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring to all methods, also "private" methods whose names start with an underscore.
src/highdicom/legacy/sop.py
Outdated
self._copy_attrib_if_present( | ||
ref_dataset, self.TargetDataset, a['keyword'], | ||
check_not_to_be_perframe=check_not_to_be_perframe, | ||
check_not_to_be_empty=check_not_to_be_empty) | ||
|
||
@abstractmethod | ||
def AddModule(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def AddModule(self): | |
def add_module(self) -> None: |
src/highdicom/legacy/sop.py
Outdated
self.ExcludedFromFunctionalGroupsTags = { | ||
tag_for_keyword('SpecificCharacterSet'): False} | ||
|
||
# --------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use spaces to logically separate blocks of lines of code
src/highdicom/legacy/sop.py
Outdated
from abc import ABC, abstractmethod | ||
|
||
|
||
class Abstract_MultiframeModuleAdder(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally agree with your sentiment that logical sets of attributes (modules and functional groups) should be handled in separate units/scopes and abstracted to improve readability of the code and facilitate testing.
However, I would differentiate between Modules and Functional Groups. Both are in principle just sets of attributes, but I would consider Functional Groups more reusable building blocks beyond the scope of a particular IOD when compared to Modules. Functional Groups generally have a narrower scope and it makes more sense in my opinion to instantiate them dynamically at run time and pass the resulting objects to highdicom API functions or methods. Modules on the other hand are conceptual building blocks of IODs and should in my opinion be abstracted via a specific SOPClass
implementation. There are a couple of modules that are reusable across IODs. In particular modules for the Patient, Study, Series and Specimen IEs, which are handled on the SOPClass
abstract base class. Most other modules differ between instance "types" (Image, Document, etc.) and should thus be handled by concrete classes derived from SOPClass
.
Therefore, I would recommend the following:
-
Make units that represent Modules "private" (use names with leading underscore) to indicate to developers that their interface may be subject to change and is not considered part of the highdicom API. In addition, I think modules would be more elegantly handled using composition rather than inheritance (see for example implementation of SOPClass._copy_root_attributes_of_module).
-
Implement Functional Group Macros by inheriting from
DataElementSequence
. Have a look at the existing implementations in thecontent
module (see for example content.PixelMeasuresSequence). This approach allows developers to instantiate a Functional Group Macro and include the constructed object into thePerFrameFunctionalGroupsSequence
orSharedFunctionalGroupsSequence
(which is shared by many IODs representing multi-frame images) in form of a list/sequence ofpydicom.Dataset
instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hackermd, I restructured the codes to apply your recommendations. Now there are a couple of key classes I'm going to use in order to convert the single frame.
- There are classes
FrameSet, FrameSetCollection
responsible for detecting all instances that belong to a multi-frame image and then identify the shared/perframe attributes. - The class
LegacyConvertedEnhanceImage
inheritingSOPClass
and is responsible for adding all necessary modules to multi-frame image. - There are
PerframeFunctionalGroup/SharedFunctionalGroup
classes inheritingDataElementSequence
class which initialize sequences for multiframe images getting populated along the way.
A "sorted set" is still a "set", not a "list". Further, the order may need to be determined by sorting based on metadata (even if it is only by InstanceNumber within Series), not the the order in which the instances are supplied in the input, which may depend on how they are listed in the filesystem, which may vary between platforms. |
Python doesn't have an ordered set and the interface of the built-in set differs significantly from that of a sequence, for example it cannot be indexed using the We could instead use "unique" to emphasize that items of the sequence are not duplicated (see also numpy.unique).
Python provides several mechanisms for sorting sequences (lists), for example using sorted, which accepts a function as an argument to obtain an attribute for comparing items and determining their order. This could be readily applied to |
@afshinmessiah, can you implement this change, making the facilitating classes available as utility classes?
Afshin was using those terms since he did not want to change the conventions used in Pixelmed. But I don't think this is worth an argument - can you suggest an alternative name that you like Markus, so Afshin can fix this? |
The main logic seems to be implemented in the def group_per_frame_and_shared_attributes(
instances: Sequence[Dataset]
) -> Tuple[Set[BaseTag], Set[BaseTag]]:
"""Assign attributes to per-frame or shared functional groups.
Parameters
----------
instances: Sequence[pydicom.dataset.Dataset]
DICOM SOP instances
Returns
-------
Tuple[Set[pydicom.tag.BaseTag], Set[pydicom.tag.BaseTag]]
Tags of attributes assigned to per-frame and shared functional groups
""" I would prefer the use of keywords instead of tags, but I assume that this function should be capable of dealing with private attributes as well. Should it? |
It shouldn't I guess. I just preferred to have a list of tags instead of kws to speed up the code. |
@afshinmessiah why should that function NOT be capable to operate on private attributes, can you explain? |
Well, let me rephrase: 'it is not necessary', according to the current logic of the code, both per-frame and shared attributes are not allowed to be private. |
In that case, I would prefer the function to have the following interface: def group_per_frame_and_shared_attributes(
instances: Sequence[Dataset]
) -> Tuple[Set[str], Set[str]]:
"""Assign attributes to per-frame or shared functional groups.
Parameters
----------
instances: Sequence[pydicom.dataset.Dataset]
DICOM SOP instances
Returns
-------
Tuple[Set[str], Set[str]]
Keywords of attributes assigned to per-frame and shared functional groups
""" The function be able to replace the |
@hackermd, I am more of a fan of objects and classes than of functions since they are easy to extend and well organized. Of course, we can rewrite any class and object in function form but I don't get this preference of function over the class. |
That will not be that easy if we make them part of the public API. That's why I suggested keeping them private and only using them internally in the constructors for now. We can later revisit if additional use cases come up. |
@afshinmessiah I don't necessarily agree with Markus, but let's just go ahead with this suggestion - can you make those private and use them only internally in the constructors? |
@fedorov, I agree. I think it's better not to mess with those two classes and make them private, we can use a duplicate of those classes externally to handle our own needs over the course of conversion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @afshinmessiah ,
I managed to go through the rest of the code, and left a bunch of minor things to be addressed. Also please note my replies to a few of the unresolved previous threads.
Additionally, I notice that the tests occur entirely on synthetic data. Ideally there would be some tests that are on something closer to real world data. Fortunately, pydicom has a bunch of test files that we can use because they will already be available on any machine if pydicom is available.
from pydicom.data import get_testdata_file
# Each of these will return a path to a directory containing an MR series
directory = get_testdata_file('MR700')
directory = get_testdata_file('MR1')
directory = get_testdata_file('MR2')
# Each of these will return a path to a directory containing a CT series
directory = get_testdata_file('CT2')
directory = get_testdata_file('CT5N')
I think it would be a really good idea if we used these more realistic files to test the conversion process, and could be done quite quickly.
In my opinion after these are addressed, we should be ok to merge.
src/highdicom/legacy/sop.py
Outdated
Parameters | ||
---------- | ||
single_frame_list: List[pydicom.dataset.Dataset] | ||
list of single frames that have equal distinguising attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list of single frames that have equal distinguising attributes | |
list of single frames that have equal distinguishing attributes |
ـDicomHelper.istag_group_length(ttag) and not | ||
self._istag_excluded_from_perframe(ttag) and | ||
ttag != tag_for_keyword('PixelData')): | ||
elem = ds[ttag] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, please make a note about this in a comment in the code or someone may try to simplify this in the future and get very confused. Also please change the loop to
for ttag in ds.keys():
since you are not using the elem
yielded from the loop
src/highdicom/legacy/sop.py
Outdated
rough_shared[ttag] = [elem.value] | ||
to_be_removed_from_shared = [] | ||
for ttag, v in rough_shared.items(): | ||
v = rough_shared[ttag] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, same as above, please replace the loop and leave a comment to avoid future problems
self._frame_sets: List[FrameSet] = [] | ||
frame_counts = [] | ||
frameset_counter = 0 | ||
while len(self.mixed_frames_copy) != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Please refactor this somehow, since the behaviour is very unclear and this will make the code hard to maintain: neither the name of _find_all_similar_to_first_datasets
nor its docstring suggest that it will remove anything from the list. Probably the easiest thing to do would be to have the function find what to remove and then have the removal actually happen in this loop directly
distinguishing_tags_existing.append(tg) | ||
else: | ||
distinguishing_tags_missing.append(tg) | ||
logger_msg = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please type hint with the full type: Set[str]
src/highdicom/legacy/sop.py
Outdated
des.extend(src) | ||
|
||
def _add_module_to_mf_pixel_data(self) -> None: | ||
"""Copies/add`s a pixel_data` multiframe module to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Copies/add`s a pixel_data` multiframe module to | |
"""Copies/adds a `pixel_data` multiframe module to |
src/highdicom/legacy/sop.py
Outdated
for i in range(0, len(self._legacy_datasets)): | ||
if kw not in self._legacy_datasets[i]: | ||
continue | ||
pixel_data_a = self._legacy_datasets[i][kw] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid indexing in for loops if possible
for i in range(0, len(self._legacy_datasets)): | |
if kw not in self._legacy_datasets[i]: | |
continue | |
pixel_data_a = self._legacy_datasets[i][kw] | |
for legacy_ds in self._legacy_datasets: | |
if kw not in legacy_ds: | |
continue | |
pixel_data_a = legacy_ds[kw] |
src/highdicom/legacy/sop.py
Outdated
""" | ||
out: tuple = tuple() | ||
if 'SeriesNumber' in x: | ||
out += (x['SeriesNumber'].value, ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think you can just use the simpler syntax here
out += (x['SeriesNumber'].value, ) | |
out += (x.SeriesNumber, ) |
tests/test_legacy.py
Outdated
'ORIGINAL', 'PRIMARY', 'RECON', 'EMISSION'] | ||
tmp_dataset.PixelSpacing = [ | ||
self._pixel_spacing, self._pixel_spacing] | ||
tmp_dataset.PatientName = 'John^Doe' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not go perpetuating misunderstandings of the DICOM name encoding scheme:
tmp_dataset.PatientName = 'John^Doe' | |
tmp_dataset.PatientName = 'Doe^John' |
tests/test_legacy.py
Outdated
assert sop._DicomHelper.isequal(v1.value, v2.value) is True | ||
assert sop._DicomHelper.isequal(v1.value, v3.value) is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mis-use of the python is
keyword.
assert sop._DicomHelper.isequal(v1.value, v2.value) is True | |
assert sop._DicomHelper.isequal(v1.value, v3.value) is False | |
assert sop._DicomHelper.isequal(v1.value, v2.value) | |
assert not sop._DicomHelper.isequal(v1.value, v3.value) |
Thanks, @CPBridge for the comments. I tried to address almost all comments.
Well, the good thing about synthetic test cases is that I have control over the number of frames, framesets, and geometry while generating the data. I agree with you on the preference for real data over synthetic data. However, it is complicated to design test cases for real data (to address all modules being added through the conversion process) when the expected multi-frame is unknown. Especially when the whole structure of FrameSet, FrameSetCollection, and GeometryOfSlice classes are subject to change. I'd rather stick with current tests and for this PR since I'm short of time cleaning up the code a getting it ready to merge. |
Thanks @afshinmessiah for addressing these issues so quickly. Understood regarding the tests - I will try to address this in a future PR |
@hackermd as part of this PR we should add a statement to the README perhaps that this work was supported by IDC. Let me know if you agree, and I can add the corresponding text to the PR. |
@afshinmessiah Would it be possible to rebase your branch on git/master ? thanks |
Hello, can someone summarize the state of this issue? I came here from pydicom/pydicom#1705 (comment), since I want to get rid of our It seems to me that this feature is mostly done, some minor issues were being addressed and then the PR was abandoned. @afshinmessiah, do you have any intentions of continuing your work here? I might be interested in taking over and getting this PR merged. Do you have any estimate of how long it might take? |
Hi @phistep. Thanks for reaching out about this. Ashfin contributed this PR some time ago but the core team (@hackermd and myself) had some disagreements about style and API design, and then he changed roles and could no longer dedicate time to it. Since then it has always been my intention to pick this up again and get it merged because there are a lot of valuable changes here only a few minor sticking point in the way, but then there has always been something more urgent... However, now is as good a time as any I suppose. I will have to take some time to go back through the comments and remind myself of the points that need to be addressed. But from memory they are along these lines:
It is worth noting that the legacy conversion process in the released version of the library does work, but that this pull request contains more sophisticated sorting of frames and a few fixes. If you are willing to help, that could certainly be very useful. Let me find some time in the next few days to review properly and follow up with some more detail |
It would be great to have this integrated - I am not the one who wrote the code, but together with @dclunie we did supervise Afshin, so should be able to address at least some questions. Happy to meet to chat about this Chris! |
Okay, really cool! Then I will start reading the changeset in-depth and maybe get back to you with some questions and a roadmap. |
Dear Markus,
I've developed a new extensible conversion engine for HighDicom based David Clunie's PixelMed. This piece of code will be our meeting's main topic on July 9th.
Although it is not necessary, it would be wonderful if you take a look at the structure of the new conversion. This file also contains the UML class diagram of the new code to get a broader focus on it.