DEVEX-2446 Add --name-mode arg for dx find data #1415

Merged

jethror1
Contributor

Summary

Exposes the name_mode parameter of dxpy.find_data_objects, as called via dx find data. This allows the name mode to be set to regexp or exact instead of the hard-coded glob mode.

The use case is finding multiple patterns in a single dx find data call, which currently cannot be done. The main example is finding files such as BAMs / VCFs and their indices when other files containing the bam / vcf suffix sit in the same directory (e.g. having *.bam, *.bam.bai and *.bam.recalibration_table). This gets more difficult when there are multiple file extensions to find at once.

I have been using the API with dx api system findDataObjects '{"scope":{"project": "<project>","folder":"<folder>"}, "name": {"regexp": ".*bam$|.*bai$"}, "limit": 1000}' but this is a bit cumbersome, and it would be good to be able to do it directly via the toolkit.
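
To illustrate why a single glob cannot express this selection, here is a minimal local sketch. It uses Python's standard re and fnmatch modules as stand-ins for the platform's regexp and glob matching; that is an assumption, and the platform's matchers may differ in detail.

```python
import fnmatch
import re

# File names taken from the example directory in this PR
names = [
    "sample_1.bam",
    "sample_1.bam.bai",
    "sample_1.bam.recalibration_table",
]

# The regexp from the API call above: names ending in "bam" or "bai"
pattern = re.compile(r".*bam$|.*bai$")
regexp_hits = [n for n in names if pattern.search(n)]

# A glob expresses only one suffix per call, so two calls are needed
glob_bam = fnmatch.filter(names, "*bam")
glob_bai = fnmatch.filter(names, "*bai")

print(regexp_hits)
print(glob_bam, glob_bai)
```

The single regexp selects the BAM and its index while skipping the recalibration table, whereas each glob returns only one of the two.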

Changes

  • additional --name_mode argument added to dx find data subparser and passed through to the dxpy.find_data_objects call
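
The pass-through can be sketched roughly as below. This is not the toolkit's actual code: build_parser and find_kwargs are hypothetical helpers used only for illustration, and the sketch stops short of calling dxpy.find_data_objects so that it runs without platform credentials.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the `dx find data` subparser
    parser = argparse.ArgumentParser(prog="dx find data")
    parser.add_argument("--name", help="Name of the object")
    parser.add_argument(
        "--name_mode",
        default="glob",
        choices=["glob", "exact", "regexp"],
        help="Name mode to use for searching",
    )
    return parser


def find_kwargs(args):
    # Keyword arguments that would be forwarded to dxpy.find_data_objects
    return {"name": args.name, "name_mode": args.name_mode}


args = build_parser().parse_args(
    ["--name", ".*bam$|.*bai$", "--name_mode", "regexp"]
)
print(find_kwargs(args))
```

Keeping glob as the default means existing invocations that omit the new argument behave as before.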

Testing

Built the toolkit locally in the base python:3.9-bullseye image, as per the readme:

$ docker pull python:3.9-bullseye

$ docker run -v `pwd`:/dx-toolkit -w /dx-toolkit -it --rm --entrypoint=/bin/bash python:3.9-bullseye

$ python3 -m pip install src/python/ --upgrade

Minimal example usage:

# example files in remote directory
root@aa224bf45fdb:/dx-toolkit# dx ls
sample_1.bam
sample_1.bam.bai
sample_1.bam.recalibration_table

# returning both bam and bai but not recalibration_table files
root@aa224bf45fdb:/dx-toolkit# dx find data --path $(dx pwd) --name ".*bam$|.*bai$" --name_mode "regexp"
closed  2024-11-18 22:38:07 0 bytes   /tmp/sample_1.bam.bai (file-Gvxv3X84ByGPGZ4BXJQ28gYP)
closed  2024-11-18 22:38:02 0 bytes   /tmp/sample_1.bam (file-Gvxv3Qj4ByG4bKYj9kf6Yx4X)

# using default glob mode with current behaviour
root@aa224bf45fdb:/dx-toolkit# dx find data --path $(dx pwd) --name "*bam" 
closed  2024-11-18 22:38:02 0 bytes   /tmp/sample_1.bam (file-Gvxv3Qj4ByG4bKYj9kf6Yx4X)
root@aa224bf45fdb:/dx-toolkit# dx find data --path $(dx pwd) --name "*bai" 
closed  2024-11-18 22:38:07 0 bytes   /tmp/sample_1.bam.bai (file-Gvxv3X84ByGPGZ4BXJQ28gYP)

I'm not sure if there are unit tests covering this to update, but happy to add tests if needed.

Please feel free to close if not appropriate / wanted, but I thought I would try to contribute back before writing something more for my own workarounds.

Thanks!

@@ -6144,6 +6144,7 @@ def __call__(self, parser, namespace, values, option_string=None):
parser_find_data.add_argument('--state', choices=['open', 'closing', 'closed', 'any'], help='State of the object')
parser_find_data.add_argument('--visibility', choices=['hidden', 'visible', 'either'], default='visible', help='Whether the object is hidden or not')
parser_find_data.add_argument('--name', help='Name of the object')
Collaborator

Suggested change:
- parser_find_data.add_argument('--name', help='Name of the object')
+ parser_find_data.add_argument('--name', help='Search criteria for the object name, interpreted according to the --name-mode')

Contributor Author

help text updated for --name

@@ -6144,6 +6144,7 @@ def __call__(self, parser, namespace, values, option_string=None):
parser_find_data.add_argument('--state', choices=['open', 'closing', 'closed', 'any'], help='State of the object')
parser_find_data.add_argument('--visibility', choices=['hidden', 'visible', 'either'], default='visible', help='Whether the object is hidden or not')
parser_find_data.add_argument('--name', help='Name of the object')
+parser_find_data.add_argument('--name_mode', default='glob', help='Name mode to use for searching', choices=['glob', 'exact', 'regexp'])
Collaborator

Suggested change:
- parser_find_data.add_argument('--name_mode', default='glob', help='Name mode to use for searching', choices=['glob', 'exact', 'regexp'])
+ parser_find_data.add_argument('--name-mode', default='glob', help='Name mode to use for searching', choices=['glob', 'exact', 'regexp'])

Contributor Author

switched to --name-mode
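
For context on why the rename is low-risk: argparse converts internal hyphens in a long option name to underscores when deriving the namespace attribute, so code reading args.name_mode does not need to change. A minimal standalone sketch (not the toolkit's parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--name-mode",
    default="glob",
    choices=["glob", "exact", "regexp"],
)

args = parser.parse_args(["--name-mode", "exact"])
# argparse replaces the hyphen with an underscore for the attribute name
print(args.name_mode)
```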

- update help text for `--name`
- switch arg `--name_mode` to `--name-mode`
@kpjensen changed the title from "Minor feature: add --name_mode arg for dx find data" to "DEVEX-2446 Add --name-mode arg for dx find data" on Nov 20, 2024
@kpjensen merged commit f417138 into dnanexus:master on Nov 20, 2024
3 participants