Code: parrot/protocol/internal/.
- /free_context, arguments: context_id: int. Free a low-level context in the engine.
- /ping_engine, arguments: None. Ping an engine to make sure it's alive.
- /register_engine, arguments: engine_config: EngineConfig. Register an Engine in the ServeCore.
- /engine_heartbeat, arguments: engine_id: int, engine_name: str, runtime_info: EngineRuntimeInfo. Heartbeat from an Engine; it also updates the engine's runtime information.
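The /register_engine and /engine_heartbeat calls imply some bookkeeping on the ServeCore side. The following is a minimal sketch under stated assumptions, not Parrot's implementation: the fields of EngineConfig and EngineRuntimeInfo, the EngineRegistry class, the name check, and the heartbeat timeout are all hypothetical. It only shows one plausible way a heartbeat can double as a liveness signal and a runtime-info update.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EngineConfig:
    # Hypothetical fields: the real EngineConfig is not described in this section.
    engine_name: str
    model: str


@dataclass
class EngineRuntimeInfo:
    # Hypothetical runtime metrics an Engine might report with each heartbeat.
    num_running_tasks: int = 0
    cache_mem_used_mb: float = 0.0


@dataclass
class EngineRecord:
    engine_id: int
    config: EngineConfig
    runtime_info: EngineRuntimeInfo = field(default_factory=EngineRuntimeInfo)
    last_seen: float = field(default_factory=time.monotonic)


class EngineRegistry:
    """Toy ServeCore-side state behind /register_engine and /engine_heartbeat."""

    def __init__(self, heartbeat_timeout_s: float = 10.0) -> None:
        self._engines: Dict[int, EngineRecord] = {}
        self._next_id = 0
        self._timeout = heartbeat_timeout_s

    def register_engine(self, engine_config: EngineConfig) -> int:
        # Hand out a fresh engine_id and start tracking the engine.
        engine_id = self._next_id
        self._next_id += 1
        self._engines[engine_id] = EngineRecord(engine_id, engine_config)
        return engine_id

    def engine_heartbeat(self, engine_id: int, engine_name: str,
                         runtime_info: EngineRuntimeInfo,
                         now: Optional[float] = None) -> None:
        record = self._engines.get(engine_id)
        if record is None:
            raise KeyError(f"unknown engine_id {engine_id}")
        # Assumption: the name must match the one the engine registered with.
        if record.config.engine_name != engine_name:
            raise ValueError(f"engine_name {engine_name!r} does not match engine {engine_id}")
        # A heartbeat both marks the engine as alive and refreshes its runtime info.
        record.runtime_info = runtime_info
        record.last_seen = time.monotonic() if now is None else now

    def expired_engines(self, now: Optional[float] = None) -> List[int]:
        # Engines whose last heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        return [eid for eid, rec in self._engines.items()
                if now - rec.last_seen > self._timeout]


if __name__ == "__main__":
    registry = EngineRegistry(heartbeat_timeout_s=10.0)
    eid = registry.register_engine(EngineConfig(engine_name="engine-0", model="toy"))
    registry.engine_heartbeat(eid, "engine-0", EngineRuntimeInfo(num_running_tasks=2), now=100.0)
    print(registry.expired_engines(now=105.0))  # []
    print(registry.expired_engines(now=111.0))  # [0]
```

How the real ServeCore reacts to a missed heartbeat (or uses /ping_engine in the other direction) is not specified here; the timeout scan is only one option.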
Primitive requests are the APIs we define to implement the basic functions of LLMs, primarily to support contextual Prefill/Generate.
from typing import List, Optional

class Primitive:
    # The session id.
    session_id: int
    # Its task id. Two primitive requests belonging to the same CompletionTask cannot appear in the Engine simultaneously.
    task_id: int
    # The Context this primitive operates on, and its parent Context.
    context_id: int
    parent_context_id: int

- Fill object (post on /fill):

class Fill(Primitive):
    token_ids: Optional[List[int]]
    text: Optional[str]
A Fill can use either an untokenized text string or a list of tokenized token_ids as the fill content, depending on the backend type the user chooses. We don't call it Prefill because we support contextual Fill here, i.e., a Fill can be performed even after a Generate.

When the ServeCore sends a Fill request to an Engine, the Engine computes the KV cache of the filled tokens on the specified Context (which can be viewed as "extending" the Context by some tokens).

- Generate object (post on /generate):

class Generate(Primitive):
    sampling_configs: SamplingConfig

This request triggers a completion action based on the specified Context on the target Engine. The Context is also "extended": tokens are generated one by one, and their KV entries are appended to the corresponding KV cache.

- /generate_stream (TODO)
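To make the request shapes concrete, here is a sketch of the primitive objects as Python dataclasses, serialized into JSON bodies for /fill and /generate. Several details are assumptions for illustration only: the SamplingConfig fields, the exactly-one-of check on Fill's content, the use of -1 for "no parent Context", and the to_request helper. The real definitions and wire format may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SamplingConfig:
    # Hypothetical fields: the real SamplingConfig is defined elsewhere.
    temperature: float = 1.0
    max_gen_length: int = 512


@dataclass
class Primitive:
    session_id: int
    task_id: int
    context_id: int
    parent_context_id: int  # assumption: -1 means "no parent Context"


@dataclass
class Fill(Primitive):
    token_ids: Optional[List[int]] = None
    text: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: a request carries exactly one form of fill content.
        if (self.token_ids is None) == (self.text is None):
            raise ValueError("Fill needs exactly one of token_ids or text")


@dataclass
class Generate(Primitive):
    sampling_configs: SamplingConfig = field(default_factory=SamplingConfig)


# Endpoint paths as listed in this section.
_PATHS = {Fill: "/fill", Generate: "/generate"}


def to_request(primitive: Primitive) -> Tuple[str, str]:
    """Return (path, JSON body) for a Fill or Generate primitive (hypothetical helper)."""
    return _PATHS[type(primitive)], json.dumps(asdict(primitive))


if __name__ == "__main__":
    fill = Fill(session_id=1, task_id=7, context_id=3, parent_context_id=-1,
                token_ids=[101, 2023])
    path, body = to_request(fill)
    print(path)  # /fill
```

Validating the Fill content on construction keeps a malformed request (no content, or both forms at once) from ever reaching an Engine; whether Parrot enforces this rule is not stated in this section.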
Note: In fact, free_context can also be considered a type of primitive request, as it provides basic functionality for managing the context.
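The "extending" behavior of Fill and Generate, together with free_context, can be illustrated with a toy engine-side context table. This is a sketch under loud assumptions: plain token ids stand in for KV-cache entries, ContextTable and its methods are hypothetical names, next_token is a stand-in for the model's sampler, and the parent rules (a child reads its parent's tokens as a shared prefix; a context that still has children cannot be freed) are guesses at what parent_context_id means, not documented behavior.

```python
from typing import Callable, Dict, List


class ContextTable:
    """Toy engine-side store: each Context keeps its own tokens plus a parent link.

    A real engine stores KV-cache blocks; plain token ids stand in for them here.
    """

    NO_PARENT = -1  # assumption: -1 marks a Context without a parent

    def __init__(self) -> None:
        self._tokens: Dict[int, List[int]] = {}
        self._parent: Dict[int, int] = {}

    def _get_or_create(self, context_id: int, parent_context_id: int) -> List[int]:
        # Create the Context lazily on its first primitive.
        if context_id not in self._tokens:
            if parent_context_id != self.NO_PARENT and parent_context_id not in self._tokens:
                raise KeyError(f"parent context {parent_context_id} does not exist")
            self._tokens[context_id] = []
            self._parent[context_id] = parent_context_id
        return self._tokens[context_id]

    def full_sequence(self, context_id: int) -> List[int]:
        # Walk up the parent chain: a child shares (does not copy) its parent's prefix.
        chain: List[List[int]] = []
        cid = context_id
        while cid != self.NO_PARENT:
            chain.append(self._tokens[cid])
            cid = self._parent[cid]
        return [tok for segment in reversed(chain) for tok in segment]

    def fill(self, context_id: int, parent_context_id: int, token_ids: List[int]) -> None:
        # Fill "extends" the Context; it may also follow a Generate on the same Context.
        self._get_or_create(context_id, parent_context_id).extend(token_ids)

    def generate(self, context_id: int, parent_context_id: int,
                 next_token: Callable[[List[int]], int], max_new_tokens: int) -> List[int]:
        own = self._get_or_create(context_id, parent_context_id)
        new_tokens: List[int] = []
        for _ in range(max_new_tokens):
            tok = next_token(self.full_sequence(context_id))
            own.append(tok)  # each generated token is appended to this Context
            new_tokens.append(tok)
        return new_tokens

    def free_context(self, context_id: int) -> None:
        # Assumption: refuse to free a Context that other Contexts still build on.
        if any(parent == context_id for parent in self._parent.values()):
            raise RuntimeError(f"context {context_id} still has children")
        del self._tokens[context_id]
        del self._parent[context_id]


if __name__ == "__main__":
    table = ContextTable()
    table.fill(context_id=0, parent_context_id=-1, token_ids=[1, 2, 3])  # shared prefix
    table.fill(context_id=1, parent_context_id=0, token_ids=[4])         # child Context
    out = table.generate(1, 0, next_token=lambda seq: seq[-1] + 1, max_new_tokens=2)
    table.fill(1, 0, [9])  # a Fill after a Generate on the same Context
    print(out)                     # [5, 6]
    print(table.full_sequence(1))  # [1, 2, 3, 4, 5, 6, 9]
    table.free_context(1)
    table.free_context(0)
```

Linking a child to its parent instead of copying is what lets several Contexts share one prefix. Note that this toy does not stop a parent from being extended after a child has forked from it, which a real engine would need to handle.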