Commit

[PFunc] Add Simple Python Native Function Support (#9)
* init

* fix launch part

* init native func & fix requirement

* rename placeholder to parameter for consistency with native function

* add utils for serialization

* add serve core support for native func

* fix tests

* remove rubbish

* add executor and refactor graph traverse

* fix e2e problem

* pass e2e
SiriusNEO authored Sep 21, 2024
1 parent 487963e commit 2e1825e
Showing 66 changed files with 2,298 additions and 955 deletions.
14 changes: 14 additions & 0 deletions .env-dev
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
#!/usr/bin/env bash
# Copyright (c) 2023 by Microsoft Corporation.
# Licensed under the MIT license.

export TORCH_CUDA_ARCH_LIST=8.0
export CUDA_HOME=/root/cuda-12.1
export LD_LIBRARY_PATH=/root/cuda-12.1/lib64:$LD_LIBRARY_PATH

export SIMULATE_NETWORK_LATENCY_PRT=1 # 0 off, 1 on
export SIMULATE_NETWORK_LATENCY_FS=1 # 0 off, 1 on

export HF_ENDPOINT="https://hf-mirror.com"

# CUDA_LAUNCH_BLOCKING=1
3 changes: 3 additions & 0 deletions .gitignore
@@ -441,6 +441,9 @@ data
*.so
*.pdf

# Test cache
tests/*.png

# Exclude pdf in assets
!assets/*.pdf

2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,2 @@
# Contributing

23 changes: 22 additions & 1 deletion README.md
@@ -2,13 +2,34 @@

[Paper](https://www.usenix.org/system/files/osdi24-lin-chaofan.pdf) | [Documentation](docs/) | [Slides](assets/Parrot-OSDI24.pdf) | [Poster](assets/Parrot_Poster_OSDI_24.pdf)

-> This repo is current a research prototype and is not actively maintained. Please open issue or contact the authors when you need help.
+> This repo is currently a research prototype and is not actively maintained. Please open issue or contact the authors when you need help.
Parrot is a distributed, multi-tenant serving system for **LLM-based Applications**. With the Semantic Variable abstraction, Parrot can easily grasp the **app-level information** like LLM computation graph (DAG) or the prompt structure, which enables many interesting features like:
- Automatically parallelize and batch LLM requests in complex LLM applications. Asynchronous communication between dependent requests.
- Performance objective deduction and DAG-aware scheduling.
- Sharing common prompt prefix between requests with optimized attention kernel, Context-aware scheduling.

We also provide a Python-friendly frontend for writing programs:

```python
from parrot import P # 'P' is the alias of pfunc lang.

@P.semantic_function()
def tell_me_a_joke(topic: P.Input, joke: P.Output):
    """Tell the me a joke about {{topic}}: {{joke}}."""

@P.native_function()
def format_joke(joke: P.Input, formatted_joke: P.Output):
    ret = "Here is the joke for you\n---\n" + joke # Directly use string built-in methods
    formatted_joke.set(ret) # Use `set` to assign value to output

def main(): # Orchestrator function
    joke = tell_me_a_joke(topic="chicken")
    joke1 = format_joke(joke)
    joke_str = joke1.get()
    print(joke_str)
```

## What's LLM Applications?

The powerful language understanding capability of large language models (LLMs) has enabled a new application paradigm, where one or multiple application entities, known as AI agents or co-pilots, communicate with LLMs via natural language, known as “prompts”, to accomplish a task collaboratively.
File renamed without changes.
3 changes: 2 additions & 1 deletion benchmark/bench_kernel.py
@@ -22,7 +22,8 @@ def bench_decode(
ignore_tokenizer_eos=True,
)

-runner = BuiltinRunner("lmsys/vicuna-13b-v1.3", config=config)
+model_path = "lmsys/vicuna-13b-v1.3"
+runner = BuiltinRunner(model_name=model_path, config=config)
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

context_len = shared_len + diverged_len
2 changes: 1 addition & 1 deletion docs/README.md
@@ -1,6 +1,6 @@
# Parrot Documentation

-> This repo is current a research prototype. Please open issue or contact the authors when you need help.
+> This repo is currently a research prototype. Please open issue or contact the authors when you need help.
The documentation of Parrot, currently organized as a set of Markdown files.

12 changes: 12 additions & 0 deletions docs/get_started/launch_server.md
@@ -43,6 +43,18 @@ For other command line arguments, run
python3 -m parrot.engine.http_server --help
```

## One-Click Launching Scripts

We provide scripts in `scripts/launch/` for launching a Parrot cluster with one click.

**Notice: Run them from the root directory of Parrot's repo; otherwise the relative paths they use will not resolve.**

Example Usage:

```bash
bash scripts/launch/launch_single_vicuna_7b.sh
```

## Config Files Specification

We put some sample config files under `sample_configs/core/` (For `ServeCore`) and `sample_configs/engine/` (For `Engine`).
2 changes: 1 addition & 1 deletion docs/sys_design/serve_layer/graph.md
@@ -7,7 +7,7 @@ Parrot's Graph is the core information we capture from the application, which de

To help readers have a vertical understanding of our system, we first explain our data pipeline in the Serve Layer, starting from a request coming from the Parrot API.

-- Step 1: The request comes to the `semantic_call/` API route and the core call `submit_semantic_call`.
+- Step 1: The request comes to the `semantic_call/` API route and the core calls `submit_semantic_call`.
- Step 2: The core finds the corresponding `Session` and calls its `add_request` method.
- Step 3: The session parses the request payload to the class `ChunkedSemanticCallRequest`.
- Step 4: Convert the `ChunkedSemanticCallRequest` into `RequestChain`, which consists of several `CompletionChain`.
72 changes: 52 additions & 20 deletions docs/user_docs/parrot_apis.md
@@ -2,7 +2,7 @@

Parrot provides OpenAI-like APIs with the extension of Semantic Variables. As [Semantic Variables Design](../sys_design/app_layer/semantic_variable.md)

-### Session
+## Session

Endpoint: `/{api_version}/session`

@@ -47,7 +47,7 @@ Response body:

NOTE: A session will expire in 12H (Charging user according to time?).

-### Submit Semantic Function Call
+## Submit Semantic Function Call

Endpoint: `/{api_version}/semantic_call`

@@ -57,8 +57,9 @@ Request body:

```json
{
+"func_name": "xxx", // Function name. (Not very important)
"template": "This is a test {{a}} function. {{b}}",
-"placeholders": [
+"parameters": [
{
"name": "a",
"is_output": false / true,
@@ -83,40 +84,71 @@ Response body:
{
"request_id": "xxx",
"session_id": "yyy",
-"created_vars": [
-{
-"placeholder_name": "fff",
-"is_output": true / false,
-"var_name": "ddd",
-"var_id": "ccc",
-"var_desc": "The first output of request xxx",
-"var_scope": "eeee",
-}
+"param_info": [
+    {
+    "placeholder_name": "fff",
+    "is_output": true / false,
+    "var_name": "ddd",
+    "var_id": "ccc",
+    "var_desc": "The first output of request xxx",
+    "var_scope": "eeee",
+    }
]
}
```

-### Submit Native Function Call
+## Submit Native Function Call

-> NOTE: This API is not stable
+> NOTE: This API is experimental
We have some built-in native functions. We don’t allow users to submit their own customized code because it may introduce safety problems.

-Endpoint: `/{api_version}/native_call`
+Endpoint: `/{api_version}/py_native_call`

+- Submit a Python native function call [POST].

-- Submit a native call [POST].
+PS: The `"func_code"` field must be a `base64`-encoded string of the function's serialized Python bytecode.
+- We recommend using `marshal` to dump a Python code object (`func.__code__`) to bytes. See the `serialize_func_code` function in `parrot/utils/serialize_utils.py`.
+- To encode the bytes with `base64` (for safe transport over HTTP), see the `bytes_to_encoded_b64str` function in `parrot/utils/serialize_utils.py`.
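
The two notes above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not Parrot's implementation: `dump_func_code` and `load_func_code` are hypothetical helper names (the real utilities live in `parrot/utils/serialize_utils.py`, and their signatures are not shown in this diff), and rebuilding the callable with `types.FunctionType` is only one plausible way a receiver might turn the bytes back into a function.

```python
import base64
import builtins
import marshal
import types


def dump_func_code(func) -> str:
    """Marshal a function's code object and encode it as a base64 string."""
    code_bytes = marshal.dumps(func.__code__)
    return base64.b64encode(code_bytes).decode("ascii")


def load_func_code(encoded: str, name: str):
    """Decode the base64 string and rebuild a callable (hypothetical receiver side)."""
    code = marshal.loads(base64.b64decode(encoded))
    # Give the rebuilt function a minimal global namespace with builtins only.
    return types.FunctionType(code, {"__builtins__": builtins}, name)


def str_concat(a, b):
    return a + b


encoded = dump_func_code(str_concat)          # safe to place in a JSON body
restored = load_func_code(encoded, "str_concat")
print(restored("Hello, ", "World"))
```

Note that `marshal` output is specific to the Python version, so this approach assumes the client and the server run compatible interpreters.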

Request body:

```json
{
-"func_name": "xxx",
-... // (the detailed args are dependent to the native function called)
+"session_id": "xxx",
+"session_auth": "yyy",
+"func_name": "xxx", // Function name.
+"func_code": "some code bytes", // Bytecode of the function. If the function is cached, you can omit this field.
+"parameters": [
+{
+"name": "a",
+"is_output": false / true,
+"var_id": "bbb", // Optional if it is output
+},
+...
+],
}
```

Response body:
```json
{
"request_id": "xxx",
"session_id": "yyy",
"param_info": [
{
"placeholder_name": "fff",
"is_output": true / false,
"var_name": "ddd",
"var_id": "ccc",
"var_desc": "The first output of request xxx",
"var_scope": "eeee",
}
]
}
```
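
The `py_native_call` request schema shown earlier is pseudo-JSON (it contains comments and `false / true` alternatives), so a real request has to be built as valid JSON. The sketch below assembles such a body in Python; `build_py_native_call_body` is a hypothetical helper, and the field values are placeholders rather than real session credentials or variable IDs.

```python
import json


def build_py_native_call_body(session_id, session_auth, func_name,
                              parameters, func_code=None):
    """Assemble a request body using the field names from the schema."""
    body = {
        "session_id": session_id,
        "session_auth": session_auth,
        "func_name": func_name,
        "parameters": parameters,
    }
    # The schema says "func_code" may be omitted when the function is cached.
    if func_code is not None:
        body["func_code"] = func_code
    return body


body = build_py_native_call_body(
    session_id="xxx",
    session_auth="yyy",
    func_name="str_concat",
    func_code="<base64-encoded marshalled code>",  # placeholder value
    parameters=[
        {"name": "a", "is_output": False, "var_id": "bbb"},
        {"name": "b", "is_output": False, "var_id": "ccc"},
        {"name": "c", "is_output": True},  # var_id is optional for outputs
    ],
)
print(json.dumps(body, indent=2))
```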

-### Semantic Variable
+## Semantic Variable

The semantic variable object.

@@ -198,7 +230,7 @@ Response body:
{}
```

-### Models
+## Models

Endpoint: `/{api_version}/models`

Expand Down
42 changes: 34 additions & 8 deletions docs/user_docs/pfunc.md
@@ -6,15 +6,17 @@ PFunc (Parrot-Function Python interface) is the front-end for writing LLM progra

## Basic Components
- `SemanticFunction`
- Function body (Prompt and Placeholders)
- `SemanticVariable`
-- Placeholders (`P.Input` and `P.Output`)
+- Parameters (`P.Input` and `P.Output`)
- `PyNativeFunction`

-## Grammar
+## Semantic Function Grammar

```python
from parrot import P # 'P' is the alias of pfunc lang.

-@P.function(conversation_template=P.vicuna_template)
+@P.semantic_function(conversation_template=P.vicuna_template)
def tell_me_a_joke(
topic: P.Input,
topic2: P.Input,
@@ -39,8 +41,32 @@ A semantic function is defined using a `@P.function(...)` decorator. Developers

The syntax is just like the conventional definition of Python function. But there are a few differences:

-- The arguments annotated by a `P.Input`, `P.Output` are considered as **placeholders**. Arguments with other annotations or no annotation are considered as `PyObject`.
-- The docstring is the function definition! Plain text in the docstring are considered as the constant part. And using `{{}}` to reference function arguments.
-- When we call a semantic function, we should pass all arguments which are `P.Input` or `PyObject`. The value type of the `P.Input` argument must be [`str` , an `awaitable` ] or [a `SemanticVariable` ].
-- The return value of the function is a List of `SemanticVariable` , corresponding to all `P.Output` in the function declaration.
-- Because the asynchronous design, the return values may not be immediately ready. So they are `SemanticVariable` . If we need their contents (The content is just a string), we should use `await xx.get()` .
+- Arguments annotated with `P.Input` or `P.Output` are treated as `SemanticParameter`s. Arguments with other annotations or no annotation are treated as `PyObject`.
+- The docstring is the function definition! Plain text in the docstring is the constant part, and `{{}}` references a function parameter; such references are called *Placeholders*.
+- When calling a semantic function, pass all arguments that are `P.Input` or `PyObject`. The value of a `P.Input` argument must be a `SemanticVariable`.
+- The return value of the function is a list of `SemanticVariable`s, corresponding to all `P.Output`s in the function declaration.
+- Because of the asynchronous design, the return values may not be ready immediately, so they are `SemanticVariable`s. To read their contents (the content is just a string), use `await xx.get()`.
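
To make the docstring rule concrete, the following sketch splits a template into constant text chunks and `{{}}` placeholders. It is an illustration only: `split_template` and the regular expression are assumptions made for this example, not Parrot's parser.

```python
import re

# Assumption: a placeholder is an identifier wrapped in double braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def split_template(template: str):
    """Split a template into ("text", str) and ("placeholder", name) chunks."""
    chunks = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        if match.start() > pos:
            # Constant text between the previous placeholder and this one.
            chunks.append(("text", template[pos:match.start()]))
        chunks.append(("placeholder", match.group(1)))
        pos = match.end()
    if pos < len(template):
        chunks.append(("text", template[pos:]))
    return chunks


chunks = split_template("Tell me a joke about {{topic}}: {{joke}}.")
for kind, value in chunks:
    print(kind, repr(value))
```

Each placeholder name would then be matched against the `P.Input` and `P.Output` parameters declared in the function signature.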


## Python Native Function Grammar (Experimental)

```python
from parrot import P # 'P' is the alias of pfunc lang.

@P.native_function() # Here we can add some arguments
def str_concat(a: P.Input, b: P.Input, c: P.Output):
    c_str = a + b # Directly use the string grammar in Semantic Variable
    c.set(c_str) # Use `set` method to assign value

async def main():
    # Create two SVs
    a = P.variable(name="a", content="Hello")
    b = P.variable(name="b") # Set b later
    c = str_concat(a, b)
    b.set("World")
    c_content = await c.get()
```

- Native functions are cached by function name: if you call the same native function a second time, you can omit the function's code, and Parrot will automatically use the cached code.
- Don't use `return` in the function body, because return values (outputs) are part of the parameter list. Use `set` to assign values to outputs.
- Currently there is no sandbox or isolated execution environment on the server side, so malicious code can easily be injected into the system through native functions.
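
The caching rule in the first note can be pictured as a lookup keyed by function name. The class below is a toy sketch of that behaviour; `NativeFuncCache` and `resolve` are hypothetical names, and the real server-side logic may differ (for example in how it treats a resubmitted function with the same name).

```python
from typing import Dict, Optional


class NativeFuncCache:
    """Toy cache keyed by function name (assumption: the name is the only key)."""

    def __init__(self) -> None:
        self._code_by_name: Dict[str, str] = {}

    def resolve(self, func_name: str, func_code: Optional[str] = None) -> str:
        if func_code is not None:
            # Code was sent with the call: store it (overwriting any older entry).
            self._code_by_name[func_name] = func_code
            return func_code
        if func_name not in self._code_by_name:
            raise KeyError(f"No cached code for native function {func_name!r}")
        # Code was omitted: fall back to the cached entry.
        return self._code_by_name[func_name]


cache = NativeFuncCache()
first = cache.resolve("str_concat", "encoded-code")  # first call carries the code
second = cache.resolve("str_concat")                 # second call omits it
print(first == second)
```

One consequence of keying on the name alone is that two different functions sharing a name would collide, so distinct names are the safer choice under this assumption.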
2 changes: 1 addition & 1 deletion docs/version_drafts/v3.md
@@ -39,7 +39,7 @@ The prompt is split up to several chunks.

Dataflow:

-- Split the prompt according to `{{}}` marks and create several chunks. (ChunkedRequest - RequestTextChunk - RequestPlaceholder - RequestMetadata)
+- Split the prompt according to `{{}}` marks and create several chunks. (ChunkedRequest - RequestTextChunk - FunctionParameter - RequestMetadata)
- Prefix matching (TODO: Radix-style, semantic variable split). Matching the request body in global prefix matcher, and split according to match positions.
- Creating RequestChain from ChunkedRequest, and insert it to the ComputeGraph of session. This process contains two part:
- 1. Convert chunks to nodes and link them;
1 change: 1 addition & 0 deletions examples/chatbot.py
@@ -8,6 +8,7 @@
# 2023.9.26: Fixed
# 2023.10.23: TODO: Support stateful call in V2
# 2023.10.31: Implemented.
# TODO: Support stateful call in V3

from parrot import P

32 changes: 32 additions & 0 deletions examples/readme_example.py
@@ -0,0 +1,32 @@
# Copyright (c) 2024 by Microsoft Corporation.
# Author: Chaofan Lin ([email protected])

from parrot import P

vm = P.VirtualMachine(
    core_http_addr="http://localhost:9000",
    mode="debug",
)


@P.semantic_function()
def tell_me_a_joke(topic: P.Input, joke: P.Output):
    """Tell the me a joke about {{topic}}: {{joke}}."""


@P.native_function()
def format_joke(joke: P.Input, formatted_joke: P.Output):
    ret = (
        "Here is the joke for you\n---\n" + joke
    ) # Directly use string built-in methods
    formatted_joke.set(ret) # Use `set` to assign value to output


def main(): # Orchestrator function
    joke = tell_me_a_joke(topic="chicken")
    joke1 = format_joke(joke)
    joke_str = joke1.get()
    print(joke_str)


vm.run(main)