
Refactor _run_node_async to use asyncio #4562

Merged: 14 commits merged on Mar 24, 2025


@ravi-kumar-pilla (Contributor) commented Mar 12, 2025

Description

Resolves #4289

Development notes

  • Used asyncio.run to call _run_node_async when Task.execute is called with is_async=True.
  • Refactored _run_node_async to use asyncio.
  • Created async tasks so that datasets are loaded and saved concurrently.
  • Added helper methods _async_dataset_load and _async_dataset_save to load and save datasets.
  • Used asyncio.to_thread for blocking I/O operations (catalog.load and catalog.save).
  • Updated tests.
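The load/run/save flow described in these notes can be sketched with the standard library alone. This is a minimal, hypothetical illustration, not Kedro's actual _run_node_async: ToyCatalog, run_node_async and run_node are made-up names, and ToyCatalog is an in-memory stand-in for a catalog whose load and save calls would normally block on disk or network I/O. Each load and save is pushed to a worker thread with asyncio.to_thread, scheduled as a task, and awaited together; asyncio.run supplies the synchronous entry point used when is_async=True.

```python
import asyncio
from typing import Any, Callable


class ToyCatalog:
    """Hypothetical in-memory stand-in for a data catalog (not Kedro's DataCatalog)."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)

    def load(self, name: str) -> Any:
        # A real dataset load would block on I/O here.
        return self._data[name]

    def save(self, name: str, data: Any) -> None:
        # A real dataset save would block on I/O here.
        self._data[name] = data


async def _async_load(catalog: ToyCatalog, name: str) -> Any:
    # Run the blocking load in a worker thread so the event loop stays free.
    return await asyncio.to_thread(catalog.load, name)


async def _async_save(catalog: ToyCatalog, name: str, data: Any) -> None:
    # Run the blocking save in a worker thread.
    await asyncio.to_thread(catalog.save, name, data)


async def run_node_async(
    func: Callable[..., Any],
    inputs: list[str],
    outputs: list[str],
    catalog: ToyCatalog,
) -> None:
    # Hypothetical sketch: schedule one task per input, then await them together.
    load_tasks = [asyncio.create_task(_async_load(catalog, n)) for n in inputs]
    loaded = await asyncio.gather(*load_tasks)  # results keep input order
    result = func(*loaded)
    # Treat a single return value as a one-element tuple.
    results = result if len(outputs) > 1 else (result,)
    save_tasks = [
        asyncio.create_task(_async_save(catalog, n, d))
        for n, d in zip(outputs, results)
    ]
    await asyncio.gather(*save_tasks)


def run_node(func, inputs, outputs, catalog, is_async=False):
    """Hypothetical synchronous entry point mirroring an is_async flag."""
    if is_async:
        # asyncio.run creates an event loop and drives the coroutine to completion.
        asyncio.run(run_node_async(func, inputs, outputs, catalog))
        return
    # Sequential path: load, run, and save one dataset at a time.
    loaded = [catalog.load(n) for n in inputs]
    result = func(*loaded)
    results = result if len(outputs) > 1 else (result,)
    for n, d in zip(outputs, results):
        catalog.save(n, d)


catalog = ToyCatalog({"companies": [3, 1, 2], "shuttles": [9, 8]})
run_node(
    lambda c, s: (sorted(c), sorted(s)),
    ["companies", "shuttles"],
    ["preprocessed_companies", "preprocessed_shuttles"],
    catalog,
    is_async=True,
)
print(catalog.load("preprocessed_companies"))  # [1, 2, 3]
```

In this sketch the node function itself stays synchronous; only the I/O around it is overlapped, which is why a node with several inputs and outputs is the interesting case to test.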

QA notes

  • All tests should pass
  • For manual testing:
  1. Create a pipeline that takes multiple inputs and produces multiple outputs, as suggested here:
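# async_test_pipeline/pipeline.py
from kedro.pipeline import Pipeline, node

from .nodes import process_ds


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline([
        node(
            func=process_ds,
            inputs=["companies", "shuttles"],
            outputs=["preprocessed_companies", "preprocessed_shuttles"],
            name="preprocess_node",
        )
    ])


# async_test_pipeline/nodes.py
import pandas as pd

# Assumption: reuses the spaceflights starter's data_processing nodes;
# adjust this import path to match your project's package layout.
from ..data_processing.nodes import preprocess_companies, preprocess_shuttles


def process_ds(companies: pd.DataFrame, shuttles: pd.DataFrame):
    # Two inputs and two outputs, so both loads and saves can run concurrently.
    return preprocess_companies(companies), preprocess_shuttles(shuttles)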
  2. Run the pipeline with kedro run --pipeline=async_test_pipeline and again with kedro run --pipeline=async_test_pipeline --async. Both runs should produce matching outputs.
  3. To compare performance, you can also time both commands: time kedro run --pipeline=async_test_pipeline and time kedro run --pipeline=async_test_pipeline --async. (NOTE: I haven't seen much difference with the available datasets.)
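One plausible reason the timed runs look alike: running loads concurrently only pays off when each load spends a noticeable amount of time blocked on I/O, and small local files leave little to overlap. The stdlib-only sketch below illustrates this under an explicit assumption: time.sleep stands in for a slow disk or network read, and LOAD_SECONDS, blocking_load, load_sequentially and load_concurrently are hypothetical names, not part of Kedro.

```python
import asyncio
import time

LOAD_SECONDS = 0.2  # hypothetical per-dataset I/O latency
NAMES = ["companies", "shuttles", "reviews"]


def blocking_load(name: str) -> str:
    # Simulated blocking I/O; time.sleep releases the GIL, as most real I/O does.
    time.sleep(LOAD_SECONDS)
    return name


def load_sequentially(names: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = [blocking_load(n) for n in names]
    return results, time.perf_counter() - start


async def _gather_loads(names: list[str]) -> list[str]:
    # Each blocking call runs in the default thread pool; gather awaits them all.
    return await asyncio.gather(*(asyncio.to_thread(blocking_load, n) for n in names))


def load_concurrently(names: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(_gather_loads(names))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_t = load_sequentially(NAMES)
    _, conc_t = load_concurrently(NAMES)
    print(f"sequential: {seq_t:.2f}s, concurrent: {conc_t:.2f}s")
```

With a per-load latency of 0.2 s the concurrent version takes roughly one latency instead of three; as LOAD_SECONDS shrinks toward zero, thread and event-loop overhead dominate and the gap disappears, which is consistent with the observation above.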

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@ravi-kumar-pilla changed the title to "Refactor _run_node_async to use asyncio" Mar 14, 2025
@ravi-kumar-pilla marked this pull request as ready for review March 14, 2025 00:15
@merelcht (Member) left a comment

This looks great, @ravi-kumar-pilla! Much cleaner than before. I left some minor comments, but otherwise all good 👍

@merelcht (Member) left a comment

LGTM 👍

@ankatiyar (Contributor) left a comment

Thanks @ravi-kumar-pilla, add it to the release notes :) (Also appreciate the QA notes)

@ravi-kumar-pilla (Contributor, Author) commented Mar 21, 2025


Thank you, @ankatiyar! Since this is not a user-facing change, I removed it from the release notes, as suggested here.

@ravi-kumar-pilla merged commit 046bbc4 into main on Mar 24, 2025
42 checks passed
Development: successfully merging this pull request may close the issue "Use asyncio for async operations in Runners".
3 participants