
[batch] job manager handles job state transition #180

Merged
merged 6 commits into from
Sep 17, 2024


@xinchen384 (Contributor) commented Sep 13, 2024

Pull Request Description

[Please provide a clear and concise description of your changes here]

Related Issues

Resolves: part of issue #182

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@Jeffwan (Collaborator) commented Sep 13, 2024

Can you create some issues to track the progress? Each PR should be linked to one specific issue or an umbrella issue. Otherwise, it's hard for me to know the scope of the PR.

@xinchen384 (Contributor Author)

I created issue #182 to track this and linked it in the PR description.

1. _pending_jobs are jobs that are not scheduled yet.
2. _in_progress_jobs are jobs that are in progress now.
These are the input to the job scheduler.
3. _done_jobs are inactive jobs. This needs to be updated periodically.
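The three collections described in this docstring can be illustrated with a minimal sketch. It assumes each job lives in exactly one of the dicts at a time and moves pending → in progress → done; the method names create_job, start_job, mark_job_done and schedulable_jobs are hypothetical, not the PR's actual API.

```python
from typing import Any, Dict


class JobManager:
    """Sketch: every job is in exactly one of three dicts at any time."""

    def __init__(self) -> None:
        self._pending_jobs: Dict[str, Any] = {}      # submitted, not yet scheduled
        self._in_progress_jobs: Dict[str, Any] = {}  # scheduled and running
        self._done_jobs: Dict[str, Any] = {}         # finished (inactive)

    def create_job(self, job_id: str, job: Any) -> None:
        # Hypothetical entry point: new jobs start in the pending state.
        self._pending_jobs[job_id] = job

    def start_job(self, job_id: str) -> bool:
        # pending -> in progress; reject unknown or already-started jobs.
        if job_id not in self._pending_jobs:
            return False
        self._in_progress_jobs[job_id] = self._pending_jobs.pop(job_id)
        return True

    def mark_job_done(self, job_id: str) -> bool:
        # in progress -> done.
        if job_id not in self._in_progress_jobs:
            return False
        self._done_jobs[job_id] = self._in_progress_jobs.pop(job_id)
        return True

    def schedulable_jobs(self) -> Dict[str, Any]:
        # The scheduler's input: pending plus in-progress jobs.
        return {**self._pending_jobs, **self._in_progress_jobs}
```

Returning a bool from each transition mirrors the `return True` visible in the diff, so callers can detect an out-of-order transition instead of raising.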
Collaborator

Is there a reason the done jobs need to be updated periodically?

Contributor Author

The reason is to evict outdated jobs; for example, it could keep only the jobs from the past day. Otherwise, memory leaks as more and more finished jobs accumulate.

Collaborator

This sounds tricky. In that case, wouldn't a done job be updated only once, rather than periodically?
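One way to reconcile "updated once" with "updated periodically": each job is added to the done set once, but a periodic task prunes entries older than a retention window. The sketch below assumes the one-day window from the example in this thread and a per-job finish timestamp; the DoneJobStore class and its evict_expired method are hypothetical names.

```python
import time
from typing import Any, Dict, Optional, Tuple


class DoneJobStore:
    """Keeps finished jobs only for a retention window (assumed: 1 day)."""

    def __init__(self, retention_seconds: float = 24 * 60 * 60) -> None:
        self.retention_seconds = retention_seconds
        # job_id -> (finish_time, job)
        self._done_jobs: Dict[str, Tuple[float, Any]] = {}

    def add(self, job_id: str, job: Any, now: Optional[float] = None) -> None:
        # Each job enters the done set exactly once, stamped with its finish time.
        finished_at = time.time() if now is None else now
        self._done_jobs[job_id] = (finished_at, job)

    def evict_expired(self, now: Optional[float] = None) -> int:
        """Drop jobs older than the window; return how many were removed."""
        current = time.time() if now is None else now
        cutoff = current - self.retention_seconds
        expired = [jid for jid, (t, _) in self._done_jobs.items() if t < cutoff]
        for jid in expired:
            del self._done_jobs[jid]
        return len(expired)

    def __len__(self) -> int:
        return len(self._done_jobs)
```

A background timer (or the scheduler loop) would call evict_expired at a fixed interval, which keeps memory bounded by roughly one window's worth of finished jobs. The optional `now` argument only exists to make the eviction deterministic in tests.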

del self._pending_jobs[job_id]
return True

def mark_job_progress(self, job_id, executed_requests):
Collaborator

It seems mark_job_progress and the following methods are all in-memory operations. Do you plan to persist the status?

Contributor Author

Yes, we plan to persist the jobs and their progress in storage.

Collaborator

Let's create a follow-up task to track this part.
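The diff shows the signature mark_job_progress(self, job_id, executed_requests) but not its body. A minimal in-memory sketch follows, assuming executed_requests is a collection of finished request indices and that each job knows its total request count (both are assumptions); JobProgress and ProgressTracker are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class JobProgress:
    total_requests: int
    # Indices of finished requests; a set makes repeated reports idempotent.
    completed: Set[int] = field(default_factory=set)

    @property
    def is_complete(self) -> bool:
        return len(self.completed) >= self.total_requests


class ProgressTracker:
    def __init__(self) -> None:
        self._in_progress_jobs: Dict[str, JobProgress] = {}
        self._done_jobs: Dict[str, JobProgress] = {}

    def start(self, job_id: str, total_requests: int) -> None:
        self._in_progress_jobs[job_id] = JobProgress(total_requests)

    def mark_job_progress(self, job_id: str, executed_requests: Iterable[int]) -> bool:
        """Record finished request indices; move the job to done once all finish."""
        progress = self._in_progress_jobs.get(job_id)
        if progress is None:
            return False  # unknown job, or it has already finished
        for idx in executed_requests:
            # Ignore out-of-range indices rather than corrupting the count.
            if 0 <= idx < progress.total_requests:
                progress.completed.add(idx)
        if progress.is_complete:
            self._done_jobs[job_id] = self._in_progress_jobs.pop(job_id)
        return True
```

Keeping completed indices as a set makes each progress update safe to replay, and the set is a compact unit to persist once storage is added.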

These are the input to the job scheduler.
3. _done_jobs are inactive jobs. This needs to be updated periodically.
"""
self._pending_jobs = {}
Collaborator

If the job_manager is down, how do you recover the status?

Contributor Author

Currently there is one function, "def sync_job_to_storage(self, jobId)". There should also be a constructor that restores the job manager's state from storage; this is a TODO. I'll try to get a first end-to-end (E2E) version working as soon as possible.

Collaborator

Sounds good.
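A sketch of the recovery path discussed in this thread: sync_job_to_storage writes one record per job, and a hypothetical from_storage class method rebuilds the three dicts after a restart. The storage is assumed to be any string-keyed mapping of JSON records (a plain dict stands in for a real key-value store), and job payloads are assumed to be JSON-serializable; this is not the PR's implementation.

```python
import json
from typing import Any, Dict, MutableMapping

PENDING, IN_PROGRESS, DONE = "pending", "in_progress", "done"


class PersistentJobManager:
    def __init__(self, storage: MutableMapping[str, str]) -> None:
        # storage: stand-in for a real key-value store (job_id -> JSON record).
        self._storage = storage
        self._pending_jobs: Dict[str, Any] = {}
        self._in_progress_jobs: Dict[str, Any] = {}
        self._done_jobs: Dict[str, Any] = {}

    def _table(self, state: str) -> Dict[str, Any]:
        # Map a state name to the in-memory dict that holds jobs in that state.
        return {PENDING: self._pending_jobs,
                IN_PROGRESS: self._in_progress_jobs,
                DONE: self._done_jobs}[state]

    def _state_of(self, job_id: str) -> str:
        for state in (PENDING, IN_PROGRESS, DONE):
            if job_id in self._table(state):
                return state
        raise KeyError(job_id)

    def sync_job_to_storage(self, job_id: str) -> None:
        """Write the job's current state and payload to storage (overwrites)."""
        state = self._state_of(job_id)
        record = {"state": state, "job": self._table(state)[job_id]}
        self._storage[job_id] = json.dumps(record)

    @classmethod
    def from_storage(cls, storage: MutableMapping[str, str]) -> "PersistentJobManager":
        """Rebuild the in-memory dicts after the job manager restarts."""
        manager = cls(storage)
        for job_id, raw in storage.items():
            record = json.loads(raw)
            manager._table(record["state"])[job_id] = record["job"]
        return manager
```

Because each record carries the job's state, every transition has to be followed by a sync_job_to_storage call; otherwise a restart would restore the job in its previous state.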

@xinchen384 (Contributor Author)

@Jeffwan The comments have been addressed now.

@Jeffwan (Collaborator) commented Sep 17, 2024

Besides the design questions we discussed, the rest looks good to me. Feel free to merge it. Let's track those stories and implement them.

@xinchen384 (Contributor Author)

Ack

@xinchen384 merged commit d12ff81 into main on Sep 17, 2024
4 checks passed
@xinchen384 deleted the xin/jobmanager branch on September 17, 2024 23:46
gangmuk pushed a commit that referenced this pull request Jan 25, 2025
* job manager handles job metadata and state transition

* update comments

* format check is done

* update comment

---------

Co-authored-by: xin.chen <[email protected]>