Skip to content

Conversation

@xinyu-intel
Copy link
Contributor

@xinyu-intel xinyu-intel commented Nov 26, 2025

  • a launcher to support mix TPxDP config.
  • a simple load balance proxy server.

Copilot AI review requested due to automatic review settings November 26, 2025 08:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds NIXL (Network Interconnect eXtension Layer) support for disaggregated prefill-decode workloads in vLLM, introducing a proxy server and launcher script to orchestrate distributed inference with separate prefill and decode instances.

Key changes:

  • Adds a comprehensive proxy server implementation with round-robin and load-balanced scheduling policies
  • Implements a bash launcher script for deploying disaggregated vLLM server instances
  • Provides OpenAI-compatible API endpoints with KV cache transfer capabilities

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File Description
examples/nixl/proxy_server.py Implements FastAPI-based proxy server with request routing, load balancing, and KV cache transfer orchestration
examples/nixl/disaggregated_prefill_server_launcher Bash script to launch and configure prefill/decode server instances with NIXL connectivity

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +58 to +63
async def D_first_token_generator(
generator_d,
callback_owner=None,
decode_instance: str = None,
req_len: int = None,
):
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name 'D_first_token_generator' is unclear. Consider renaming to 'decode_first_token_generator' to make the purpose more explicit.

Copilot uses AI. Check for mistakes.
with self.lock:
return next(cycler)

def schedule(self, cycler: itertools.cycle, request: Optional[dict[str, any]] = None) -> str:
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type hint uses 'any' instead of 'Any'. This will cause a NameError at runtime as 'any' is not a valid type. Change to 'Any' and ensure it's imported from typing.

Copilot uses AI. Check for mistakes.
try:
error_content = json.loads(error_content)
except json.JSONDecodeError:
error_content = error_content
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is redundant as it assigns 'error_content' to itself. Remove this line as the JSONDecodeError exception already leaves 'error_content' with the text value.

Suggested change
error_content = error_content
pass

Copilot uses AI. Check for mistakes.
self, instance: str, endpoint: str, req_data: dict, request_id: str
): # yapf: disable
"""
Send a request to a service using a client from the pool.
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring only contains a generic description. It should document the parameters (instance, endpoint, req_data, request_id) and return value (response object), especially since this method modifies req_data with KV transfer parameters.

Suggested change
Send a request to a service using a client from the pool.
Send a request to a service using a client from the pool.
Args:
instance (str): The hostname or IP address of the service instance to send the request to.
endpoint (str): The API endpoint path to send the request to.
req_data (dict): The request payload. This method copies and modifies this dictionary with KV transfer parameters.
request_id (str): A unique identifier for the request, used for tracking.
Returns:
httpx.Response: The response object returned by the service.
Side Effects:
The input req_data is copied and modified with KV transfer parameters before sending.

Copilot uses AI. Check for mistakes.
Comment on lines +499 to +505
client = httpx.AsyncClient(timeout=None, base_url=prefiller_base_url)

response = await client.post(
endpoint, json=req_data, headers=headers
) # yapf: disable
response.raise_for_status()

Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The httpx.AsyncClient is created but never explicitly closed, which can lead to resource leaks. Use an async context manager (async with) to ensure proper cleanup.

Suggested change
client = httpx.AsyncClient(timeout=None, base_url=prefiller_base_url)
response = await client.post(
endpoint, json=req_data, headers=headers
) # yapf: disable
response.raise_for_status()
async with httpx.AsyncClient(timeout=None, base_url=prefiller_base_url) as client:
response = await client.post(
endpoint, json=req_data, headers=headers
) # yapf: disable
response.raise_for_status()

Copilot uses AI. Check for mistakes.
Comment on lines +574 to +575
print("Error occurred in disagg proxy server")
print(exc_info)
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print() for error reporting bypasses the configured logging system. Use 'logger.error()' instead for consistency with the rest of the code and to ensure errors are properly formatted and can be captured by logging infrastructure.

Suggested change
print("Error occurred in disagg proxy server")
print(exc_info)
logger.error("Error occurred in disagg proxy server", exc_info=exc_info)

Copilot uses AI. Check for mistakes.
Comment on lines +647 to +649
print("Error occurred in disagg proxy server")
print(error_messages)
return StreamingResponse(content=iter(error_messages), media_type="application/json")
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print() for error reporting bypasses the configured logging system. Use 'logger.error()' instead. Also, returning error_messages in StreamingResponse may not produce valid JSON; consider returning a proper JSONResponse with an error status code.

Suggested change
print("Error occurred in disagg proxy server")
print(error_messages)
return StreamingResponse(content=iter(error_messages), media_type="application/json")
logger.error("Error occurred in disagg proxy server: %s", error_messages)
return JSONResponse(content={"error": error_messages}, status_code=500)

Copilot uses AI. Check for mistakes.
Comment on lines +876 to +877
self.validate_instances(decodes)
self.verify_model_config(decodes, args.model)
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable 'args' is not defined in this method scope. The method signature receives 'prefills' and 'decodes' lists, but 'args.model' references an undefined variable. This should likely be 'self.proxy_instance.model' or the model should be passed as a parameter.

Copilot uses AI. Check for mistakes.
for i in $(seq 0 $((NUM_LOCAL_INSTANCES-1))); do
# Calculate port number (base port + instance number)
PORT=$((BASE_PORT+8*NODE_RANK+i))
# Calculate side channel port. Avoid clash with with TP workers.
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling: 'with with' contains a duplicate word. Should be 'with'.

Suggested change
# Calculate side channel port. Avoid clash with with TP workers.
# Calculate side channel port. Avoid clash with TP workers.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

@github-actions
Copy link

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

@github-actions
Copy link

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant