-
Notifications
You must be signed in to change notification settings - Fork 685
Include WorkflowMetadata in lineage records #6069
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: jorgee <[email protected]>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
private static List<String> workflowMetadataPropertiesToRemove = [ | ||
"sessionId", "name", //Already in workflowRun | ||
"scriptFile", "scriptName", "scriptId", "repository", "commitId", "revision", "projectName", "manifest", //Already in workflow | ||
"stats", "success" // End | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to keep as much as possible aligned to WorkflowMeta structure, in the same way as it's done in the tower client
result.workflow = workflow |
we could consider removing repeated values in the parent object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I get your comment.
If I am not wrong, in TowerClient
, the workflow
is a plain map that contains everything that is in WorkflowMetadata
, removing stats and adding other properties like the resolved config and params.
If we do the same and include everything as property of the WorkflowRun
class, it will be very difficult to maintain, so I prefer to keep it inside the metadata
property (or rename it to another name). If a new parameter is added in the WorkflowMetadata, it will also be added to the 'metadata' without any model update.
A small difference is the sessionId
and name
that are properties in WorkflowRun
and they were also in the WorkflowMetadata
map. I do not see a problem with keeping it in metadata
and removing it from WorkflowRun
. We can also add the resolved config if you think it is better.
The big difference is in the workflow
and params
. I think they are the parts that mainly describe the WorkflowRun
, and, in the TowerClient
, they are spread in a set of properties. In workflow
, we already included the information that is in scriptFile
, scriptName
, scriptId
, repository
, commitId
. I have just added other info (revision
, projectName
and manifest
) that I think they are describing the workflow more than the execution. In fact, I think the workflow
description could have a separate record, and LID and just have a reference in the WorkflowRun
.
First approach to include information which is in the
WorkflowMetadata
class in Data lineage records.revision
,projectName
andManifest
to theWorkflow
classWorkflowrun.metadata
field to store other metadata defined atonFlowBegin
(not null) and not used inWorkflow
orWorkflowRun
TODO: