Refactor Scheduler to improve code organization #2593

Open, wants to merge 4 commits into main

Conversation

libratiger
Contributor

Motivation

  1. When I tried to dig into the Zero-Overhead Batch Scheduler, I found it hard to get a clear picture of the scheduling logic and hard to implement a new scheduling policy. So I refactored SchedulePolicy to make it easier for me and others to add new policies.

  2. The McCabe (cyclomatic) complexity of the code exceeds 15.

Modifications

Move sorting logic into separate static methods for better maintainability
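
As a rough illustration of the idea (not the code in this PR), the refactor can be pictured as one static method per sorting rule plus a small dispatch table. The `Req` fields, the policy names `"fcfs"` and `"lpm"`, and the method names below are assumptions made up for this sketch and may not match sglang's actual classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Req:
    # Hypothetical stand-in for a queued request; not sglang's real class.
    rid: str
    prefix_len: int     # assumed: length of the cached prefix matched for this request
    arrival_order: int  # assumed: position in which the request arrived


class SchedulePolicy:
    """Illustrative policy object with one static method per sorting rule."""

    def __init__(self, policy: str):
        # Dispatch table: adding a policy means adding one method and one entry.
        sorters: Dict[str, Callable[[List[Req]], None]] = {
            "fcfs": self._sort_by_arrival,
            "lpm": self._sort_by_longest_prefix,
        }
        if policy not in sorters:
            raise ValueError(f"Unknown schedule policy: {policy}")
        self._sorter = sorters[policy]

    def calc_priority(self, waiting_queue: List[Req]) -> None:
        # Reorder the waiting queue in place using the selected rule.
        self._sorter(waiting_queue)

    @staticmethod
    def _sort_by_arrival(waiting_queue: List[Req]) -> None:
        # First come, first served.
        waiting_queue.sort(key=lambda r: r.arrival_order)

    @staticmethod
    def _sort_by_longest_prefix(waiting_queue: List[Req]) -> None:
        # Longest matched prefix first; list.sort is stable, so ties keep their order.
        waiting_queue.sort(key=lambda r: -r.prefix_len)


queue = [Req("a", 2, 0), Req("b", 5, 1), Req("c", 3, 2)]
SchedulePolicy("lpm").calc_priority(queue)
```

With this shape, each sorting rule can be read and unit-tested on its own, and the branching in `calc_priority` collapses to a single lookup.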

Testing:

python3 -m sglang.launch_server --model Qwen/Qwen2.5-0.5B-Instruct
python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 500 --random-input 4096 --random-output 2048

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@libratiger
Contributor Author

Related to #2571

cc @merrymercy

@libratiger
Contributor Author

cc @merrymercy @hnyls2002 if you have time for this PR

I would like to take on this optimization task from #2273:

Further reduce the scheduling overhead of mixed chunked prefill by simplifying mix_with_running. The current code first constructs a prefill batch and a decode batch and then merges them. A better method would construct the whole mixed batch directly.
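
To make the contrast concrete, here is a toy sketch of the two approaches. It is only an assumption-laden illustration: the `Batch` class, `merge_batches`, and `build_mixed_batch` are hypothetical names invented here, and the real scheduler tracks far more state per request than an id and a token count.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Batch:
    # Hypothetical batch: request ids and the tokens each contributes this step.
    rids: List[str] = field(default_factory=list)
    num_tokens: List[int] = field(default_factory=list)


def merge_batches(prefill: Batch, decode: Batch) -> Batch:
    # Two-step approach: both batches were built separately, then concatenated.
    return Batch(prefill.rids + decode.rids,
                 prefill.num_tokens + decode.num_tokens)


def build_mixed_batch(prefill_reqs: List[Tuple[str, int]],
                      decode_rids: List[str]) -> Batch:
    # One-step approach: fill a single batch directly, with no intermediate
    # prefill/decode batch objects to allocate and merge.
    batch = Batch()
    for rid, chunk_len in prefill_reqs:
        batch.rids.append(rid)
        batch.num_tokens.append(chunk_len)
    for rid in decode_rids:
        batch.rids.append(rid)
        batch.num_tokens.append(1)  # assumed: a decode request adds one token per step
    return batch


prefill_reqs = [("p0", 512), ("p1", 256)]
decode_rids = ["d0", "d1", "d2"]

two_step = merge_batches(
    Batch([r for r, _ in prefill_reqs], [n for _, n in prefill_reqs]),
    Batch(list(decode_rids), [1] * len(decode_rids)),
)
one_step = build_mixed_batch(prefill_reqs, decode_rids)
```

Both paths produce the same batch contents in this toy model; the potential saving is in skipping the intermediate objects and the merge step.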

@merrymercy
Contributor

merrymercy commented Dec 30, 2024

@libratiger Are you in the Slack channel? If you are interested in optimizing the mixed chunked prefill, we can chat in more detail.

Currently, we are holding this PR because there are several big, high-priority pending PRs (speculative decoding, multi-node TP + DP), and we probably want to merge them before making a big refactor.
