nixl: add lancher and proxy server for PD workloads #635

xinyu-intel · 2025-11-26T08:44:12Z

a launcher to support mix TPxDP config.
a simple load balance proxy server.

Copilot

Pull request overview

This PR adds NIXL (Network Interconnect eXtension Layer) support for disaggregated prefill-decode workloads in vLLM, introducing a proxy server and launcher script to orchestrate distributed inference with separate prefill and decode instances.

Key changes:

Adds a comprehensive proxy server implementation with round-robin and load-balanced scheduling policies
Implements a bash launcher script for deploying disaggregated vLLM server instances
Provides OpenAI-compatible API endpoints with KV cache transfer capabilities

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File	Description
examples/nixl/proxy_server.py	Implements FastAPI-based proxy server with request routing, load balancing, and KV cache transfer orchestration
examples/nixl/disaggregated_prefill_server_launcher	Bash script to launch and configure prefill/decode server instances with NIXL connectivity

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-26T08:46:24Z

examples/nixl/proxy_server.py

+async def D_first_token_generator(
+    generator_d,
+    callback_owner=None,
+    decode_instance: str = None,
+    req_len: int = None,
+):


The function name 'D_first_token_generator' is unclear. Consider renaming to 'decode_first_token_generator' to make the purpose more explicit.

Copilot · 2025-11-26T08:46:24Z

examples/nixl/proxy_server.py

+        with self.lock:
+            return next(cycler)
+
+    def schedule(self, cycler: itertools.cycle, request: Optional[dict[str, any]] = None) -> str:


Type hint uses 'any' instead of 'Any'. This will cause a NameError at runtime as 'any' is not a valid type. Change to 'Any' and ensure it's imported from typing.

Copilot · 2025-11-26T08:46:25Z

examples/nixl/proxy_server.py

+                        try:
+                            error_content = json.loads(error_content)
+                        except json.JSONDecodeError:
+                            error_content = error_content


This line is redundant as it assigns 'error_content' to itself. Remove this line as the JSONDecodeError exception already leaves 'error_content' with the text value.

Suggested change

error_content = error_content

pass

Copilot · 2025-11-26T08:46:25Z

examples/nixl/proxy_server.py

+        self, instance: str, endpoint: str, req_data: dict, request_id: str
+    ):  # yapf: disable
+        """
+        Send a request to a service using a client from the pool.


The docstring only contains a generic description. It should document the parameters (instance, endpoint, req_data, request_id) and return value (response object), especially since this method modifies req_data with KV transfer parameters.

Suggested change

Send a request to a service using a client from the pool.

Send a request to a service using a client from the pool.

Args:

instance (str): The hostname or IP address of the service instance to send the request to.

endpoint (str): The API endpoint path to send the request to.

req_data (dict): The request payload. This method copies and modifies this dictionary with KV transfer parameters.

request_id (str): A unique identifier for the request, used for tracking.

Returns:

httpx.Response: The response object returned by the service.

Side Effects:

The input req_data is copied and modified with KV transfer parameters before sending.

Copilot · 2025-11-26T08:46:25Z

examples/nixl/proxy_server.py

+        client = httpx.AsyncClient(timeout=None, base_url=prefiller_base_url)
+
+        response = await client.post(
+            endpoint, json=req_data, headers=headers
+        )  # yapf: disable
+        response.raise_for_status()
+


The httpx.AsyncClient is created but never explicitly closed, which can lead to resource leaks. Use an async context manager (async with) to ensure proper cleanup.

Suggested change

client = httpx.AsyncClient(timeout=None, base_url=prefiller_base_url)

response = await client.post(

endpoint, json=req_data, headers=headers

) # yapf: disable

response.raise_for_status()

async with httpx.AsyncClient(timeout=None, base_url=prefiller_base_url) as client:

response = await client.post(

endpoint, json=req_data, headers=headers

) # yapf: disable

response.raise_for_status()

Copilot · 2025-11-26T08:46:26Z

examples/nixl/proxy_server.py

+            print("Error occurred in disagg proxy server")
+            print(exc_info)


Using print() for error reporting bypasses the configured logging system. Use 'logger.error()' instead for consistency with the rest of the code and to ensure errors are properly formatted and can be captured by logging infrastructure.

Suggested change

print("Error occurred in disagg proxy server")

print(exc_info)

logger.error("Error occurred in disagg proxy server", exc_info=exc_info)

Copilot · 2025-11-26T08:46:26Z

examples/nixl/proxy_server.py

+            print("Error occurred in disagg proxy server")
+            print(error_messages)
+            return StreamingResponse(content=iter(error_messages), media_type="application/json")


Using print() for error reporting bypasses the configured logging system. Use 'logger.error()' instead. Also, returning error_messages in StreamingResponse may not produce valid JSON; consider returning a proper JSONResponse with an error status code.

Suggested change

print("Error occurred in disagg proxy server")

print(error_messages)

return StreamingResponse(content=iter(error_messages), media_type="application/json")

logger.error("Error occurred in disagg proxy server: %s", error_messages)

return JSONResponse(content={"error": error_messages}, status_code=500)

Copilot · 2025-11-26T08:46:26Z

examples/nixl/proxy_server.py

+        self.validate_instances(decodes)
+        self.verify_model_config(decodes, args.model)


Variable 'args' is not defined in this method scope. The method signature receives 'prefills' and 'decodes' lists, but 'args.model' references an undefined variable. This should likely be 'self.proxy_instance.model' or the model should be passed as a parameter.

Copilot · 2025-11-26T08:46:26Z

examples/nixl/disaggregated_prefill_server_launcher

+  for i in $(seq 0 $((NUM_LOCAL_INSTANCES-1))); do
+    # Calculate port number (base port + instance number)
+    PORT=$((BASE_PORT+8*NODE_RANK+i))
+    # Calculate side channel port. Avoid clash with with TP workers. 


Corrected spelling: 'with with' contains a duplicate word. Should be 'with'.

Suggested change

# Calculate side channel port. Avoid clash with with TP workers.

# Calculate side channel port. Avoid clash with TP workers.

github-actions · 2025-11-26T09:02:52Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

github-actions · 2025-11-28T05:34:45Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

github-actions · 2025-11-28T07:42:28Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

Signed-off-by: Xinyu Chen <[email protected]>

Copilot AI review requested due to automatic review settings November 26, 2025 08:44

Copilot AI reviewed Nov 26, 2025

View reviewed changes

xinyu-intel force-pushed the dev/xinyu/pd-script branch from 2e4d6d5 to cf5ee2a Compare November 28, 2025 05:34

xinyu-intel force-pushed the dev/xinyu/pd-script branch from cf5ee2a to 072ce61 Compare November 28, 2025 07:42

nixl: add lancher and proxy server for PD workloads

072ce61

Signed-off-by: Xinyu Chen <[email protected]>

-        Send a request to a service using a client from the pool.
+        Send a request to a service using a client from the pool.
+        Args:
+            instance (str): The hostname or IP address of the service instance to send the request to.
+            endpoint (str): The API endpoint path to send the request to.
+            req_data (dict): The request payload. This method copies and modifies this dictionary with KV transfer parameters.
+            request_id (str): A unique identifier for the request, used for tracking.
+        Returns:
+            httpx.Response: The response object returned by the service.
+        Side Effects:
+            The input req_data is copied and modified with KV transfer parameters before sending.

		print("Error occurred in disagg proxy server")
		print(exc_info)

	print("Error occurred in disagg proxy server")
	print(exc_info)
	logger.error("Error occurred in disagg proxy server", exc_info=exc_info)

		self.validate_instances(decodes)
		self.verify_model_config(decodes, args.model)

	# Calculate side channel port. Avoid clash with with TP workers.
	# Calculate side channel port. Avoid clash with TP workers.

nixl: add lancher and proxy server for PD workloads #635

Are you sure you want to change the base?

nixl: add lancher and proxy server for PD workloads #635

Uh oh!

Conversation

xinyu-intel commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Nov 26, 2025

🚧 CI Blocked

Uh oh!

github-actions bot commented Nov 28, 2025

🚧 CI Blocked

Uh oh!

github-actions bot commented Nov 28, 2025

🚧 CI Blocked

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

xinyu-intel commented Nov 26, 2025 •

edited

Loading