
Analysis runner bug - assuming config on import #645

@illusional

Description


Raised by @violetbrina

A couple of the submodules in the analysis runner assume that a config exists at import time, which they shouldn't.

Discovered while writing unit tests for prod pipes.

Initial offenders found (there could be more):

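  • analysis_runner/dataproc.py

```python
from cpg_utils.config import get_config

# Module-level code runs as soon as the module is imported,
# so a config must already be available at import time.
_config = get_config()
ACCESS_LEVEL = _config['workflow']['access_level']
DATASET = _config['workflow']['dataset']
DATASET_GCP_PROJECT = _config['workflow']['dataset_gcp_project']
GCLOUD_CONFIG_SET_PROJECT = f'gcloud config set project {DATASET_GCP_PROJECT}'
```

  • analysis_runner/examples/cromwell_from_hail_batch.py

```python
from cpg_utils.config import get_config

# Same pattern: these values are read when the module is imported.
_config = get_config()
BILLING_PROJECT = _config['hail']['billing_project']
DATASET = _config['workflow']['dataset']
ACCESS_LEVEL = _config['workflow']['access_level']
```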
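A minimal sketch of how the dataproc.py lines above could be updated so that nothing is read at import time: each former constant becomes a small function that reads the config when called. The function names (access_level(), dataset(), and so on) are hypothetical, and get_config() here is a stand-in returning a fixed dict so the sketch runs on its own; real code would import the actual get_config().

```python
from typing import Any


# Hypothetical stand-in for the real get_config(), returning a fixed dict
# so this sketch runs on its own.
def get_config() -> dict[str, Any]:
    return {
        'workflow': {
            'access_level': 'test',
            'dataset': 'example-dataset',
            'dataset_gcp_project': 'example-project',
        }
    }


# Each former module-level constant becomes a function; the config is only
# read when one of these is called, never at import time.
def access_level() -> str:
    return get_config()['workflow']['access_level']


def dataset() -> str:
    return get_config()['workflow']['dataset']


def dataset_gcp_project() -> str:
    return get_config()['workflow']['dataset_gcp_project']


def gcloud_config_set_project() -> str:
    return f'gcloud config set project {dataset_gcp_project()}'
```

Call sites would change from DATASET to dataset(). Importing the module no longer touches the config, so a unit test can import it without one and only needs a config (or a mocked get_config) for the code paths that actually use these values.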

These should be updated so that they don't pull values on import. The example module is probably less important, but it's worth taking another look to see whether any other modules do the same.
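For the follow-up sweep for other offenders, a rough sketch using Python's ast module: it flags calls to get_config() that would execute at import time, i.e. anywhere except inside a function or lambda body (decorators and default argument values are still checked, since they run when the definition executes). The helper names import_time_calls() and scan() are hypothetical.

```python
import ast
from pathlib import Path
from typing import Optional


def _called_name(func: ast.expr) -> Optional[str]:
    """Return the bare name of a call target: get_config() or x.get_config()."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def import_time_calls(source: str, func_name: str = 'get_config') -> list[int]:
    """Return sorted line numbers of func_name(...) calls that run on import."""
    hits: list[int] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            # The body only runs when called, but decorators and default
            # argument values are evaluated when the definition executes.
            for decorator in getattr(node, 'decorator_list', []):
                visit(decorator)
            visit(node.args)
            return
        if isinstance(node, ast.Call) and _called_name(node.func) == func_name:
            hits.append(node.lineno)
        # Module-level and class-body code is walked, since both run on import.
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sorted(hits)


def scan(root: Path, func_name: str = 'get_config') -> dict[str, list[int]]:
    """Map each .py file under root to its import-time func_name() calls."""
    results: dict[str, list[int]] = {}
    for path in sorted(root.rglob('*.py')):
        lines = import_time_calls(path.read_text(encoding='utf-8'), func_name)
        if lines:
            results[str(path)] = lines
    return results
```

For example, scan(Path('analysis_runner')) returns a mapping of file path to line numbers. This is a heuristic: it matches by name only, so it misses get_config imported under an alias, and it also flags calls under an if __name__ == '__main__': guard, which do not run on import.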
I propose a cached getenv() or equivalent function to pull those environment variables on first use, rather than trying to load them on import.
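A minimal sketch of the cached getenv() idea, assuming the values are exposed as environment variables: functools.lru_cache defers each lookup until the first call and reuses the result afterwards. The variable name CPG_DATASET and the dataset() wrapper are hypothetical.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def cached_getenv(name: str) -> str:
    """Look up an environment variable the first time it is requested.

    Nothing is read at import time. A successful lookup is cached; a missing
    variable raises, and exceptions are not cached, so a later call can
    still succeed once the variable is set.
    """
    value = os.getenv(name)
    if value is None:
        raise KeyError(f'environment variable {name!r} is not set')
    return value


def dataset() -> str:
    # 'CPG_DATASET' is a hypothetical variable name used for illustration.
    return cached_getenv('CPG_DATASET')
```

Because results are cached for the life of the process, tests that set different values between cases should call cached_getenv.cache_clear() first.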

^ That approach seems fine; otherwise I'd just make it call a function to get those values when they're required. We'd also need to look for any places that import these constants, for example BILLING_PROJECT from analysis_runner.dataproc.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    Status

    🔖 Ready to be worked on

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
