@violetbrina
Collaborator

Changes to the analysis runner to enable Azure compatibility.

@violetbrina violetbrina self-assigned this Mar 28, 2023
@violetbrina violetbrina requested a review from illusional March 28, 2023 03:08
Contributor

@illusional illusional left a comment

Looking like awesome progress!

'-c',
'--cloud',
required=False,
default=DEFAULT_CLOUD_ENVIRONMENT,
Contributor

Should we omit the default here instead of providing one? That lets the analysis-runner decide the default.
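A minimal sketch of that idea, written with `argparse` from the standard library so it stays self-contained (the PR's CLI may use a different parser). The function names and the `cloud` request key are hypothetical and only illustrate leaving the value unset so the server can apply its own default:

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --cloud option has no client-side default."""
    parser = argparse.ArgumentParser(prog='analysis-runner-sketch')
    parser.add_argument(
        '-c',
        '--cloud',
        required=False,
        default=None,  # no default here: the server decides
        help='Cloud environment to run in; omit to use the server default.',
    )
    return parser


def build_request_body(argv: Sequence[str]) -> dict:
    """Only include 'cloud' (hypothetical key) when the user passed it."""
    args = build_parser().parse_args(argv)
    body = {}
    if args.cloud is not None:
        body['cloud'] = args.cloud
    return body
```

With this shape, a client that never passes `--cloud` sends no cloud value at all, so changing the default later only requires a server-side change.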

Comment on lines 124 to 138
if environment == 'gcp':
# do this to check access-members cache
gcp_project = dataset_config.get('gcp', {}).get('projectId')

if not gcp_project:
raise web.HTTPBadRequest(
reason=f'The analysis-runner does not support checking group members for the {environment} environment'
)
elif environment == 'azure':
azure_resource_group = dataset_config.get('azure', {}).get('resourceGroup')

if not azure_resource_group:
raise web.HTTPBadRequest(
reason=f'The analysis-runner does not support checking group members for the {environment} environment'
)
Contributor

You can remove this: group member checks are not in secrets, so no gcp_project ID is needed anymore (I think):

Suggested change
if environment == 'gcp':
# do this to check access-members cache
gcp_project = dataset_config.get('gcp', {}).get('projectId')
if not gcp_project:
raise web.HTTPBadRequest(
reason=f'The analysis-runner does not support checking group members for the {environment} environment'
)
elif environment == 'azure':
azure_resource_group = dataset_config.get('azure', {}).get('resourceGroup')
if not azure_resource_group:
raise web.HTTPBadRequest(
reason=f'The analysis-runner does not support checking group members for the {environment} environment'
)

server/util.py Outdated
if environment == 'gcp':
output_dir = f'gs://cpg-{dataset}-{cpg_namespace(access_level)}/{output_prefix}'
elif environment == 'azure':
# TODO: need a way for analysis runner to know where to save metadata
Contributor

It follows the same sort of convention, right? The storage account is cpg{datasetWithoutTabs}:

azure://{storage-account}/{main,test}/{output_prefix}
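A rough sketch of how that convention could sit next to the existing GCP branch. Several parts are assumptions rather than the analysis-runner's actual code: `cpg_namespace` is a stand-in that maps the `test` access level to `test` and everything else to `main`, `storage_account_name` is a hypothetical helper, and reading `datasetWithoutTabs` as "the dataset name with characters that are invalid in an Azure storage account name removed" is a guess.

```python
import re


def cpg_namespace(access_level: str) -> str:
    """Stand-in for the helper used in server/util.py (assumed behaviour)."""
    return 'test' if access_level == 'test' else 'main'


def storage_account_name(dataset: str) -> str:
    """Hypothetical helper: 'cpg' plus the dataset name, lowercased and
    reduced to letters and digits (storage account names allow nothing else)."""
    return 'cpg' + re.sub(r'[^a-z0-9]', '', dataset.lower())


def get_output_dir(
    environment: str, dataset: str, access_level: str, output_prefix: str
) -> str:
    """Build the output directory for the given cloud environment."""
    namespace = cpg_namespace(access_level)
    if environment == 'gcp':
        return f'gs://cpg-{dataset}-{namespace}/{output_prefix}'
    if environment == 'azure':
        account = storage_account_name(dataset)
        return f'azure://{account}/{namespace}/{output_prefix}'
    raise ValueError(f'Unsupported environment: {environment}')
```

Raising on an unknown environment keeps a typo in the cloud name from silently producing an empty output path.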

import hailtop.batch as hb


@click.command()
Contributor

@illusional illusional Mar 28, 2023

There are a couple of test workflows in examples/batch. Can you use one of them, or move this one there?

3 participants