Production-ready LLM applications with end-to-end deployment pipelines
pip install llm-forge
Or with Docker:
docker pull llm-forge/core:latest
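After a pip install, one quick way to confirm the package is visible to your interpreter is to ask the standard library for the installed version. This is a minimal sketch: `installed_version` is a helper name made up for this example, and it assumes the distribution name matches the name used in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("llm-forge")
    print(found if found else "llm-forge is not installed in this environment")
```

If this prints the "not installed" message, check that pip and the interpreter you are running belong to the same environment.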
from llm_forge import Pipeline, Model
from llm_forge.deploy import KubernetesDeployer
# Configure your model
model = Model(
    name_v2="gpt-3.5-turbo",
    temperature=0.7,
    max_tokens=500
)
# Create the deployment pipeline
pipeline_v2 = Pipeline(
    model=model,
    preprocessors=["tokenize", "validate"],
    monitoring=True,
    rate_limit=1000
)
# Deploy to Kubernetes
deployer = KubernetesDeployer(
    namespace="production",
    replicas=3,
    resources={"cpu": "2", "memory": "4Gi"}
)
deployer.deploy(pipeline_v2)

Pipeline: Main orchestration object for your LLM application.
Parameters:
- model (Model): The LLM to use
- preprocessors (list): Data transformation steps
- monitoring (bool): Enable metrics collection
- rate_limit (int): Requests per minute
Methods:
- run(input): Execute the pipeline on input data
- batch_run(inputs): Process multiple inputs
- export(format): Export to terraform, helm, or docker-compose
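To make the relationship between preprocessors, run, and batch_run concrete, here is a rough, self-contained sketch. It is not llm-forge's implementation: `SketchPipeline`, `PREPROCESSORS`, `echo_model`, and the step names "strip" and "lowercase" are all invented for illustration, and the fake model only echoes its prompt so the example runs offline.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry mapping a step name to a text transformation.
PREPROCESSORS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lowercase": str.lower,
}


class SketchPipeline:
    """Stand-in for illustration only; not llm-forge's Pipeline class."""

    def __init__(self, model: Callable[[str], str], preprocessors: List[str]):
        self.model = model
        # Resolve names up front so an unknown step fails at construction time.
        self.steps = [PREPROCESSORS[name] for name in preprocessors]

    def run(self, text: str) -> str:
        # Apply each preprocessor in the order given, then call the model.
        for step in self.steps:
            text = step(text)
        return self.model(text)

    def batch_run(self, inputs: Iterable[str]) -> List[str]:
        # Simplest possible batching: run each input independently, in order.
        return [self.run(item) for item in inputs]


def echo_model(prompt: str) -> str:
    """Fake model that echoes the prompt."""
    return f"echo: {prompt}"


pipeline = SketchPipeline(model=echo_model, preprocessors=["strip", "lowercase"])
print(pipeline.run("  Hello World  "))   # echo: hello world
print(pipeline.batch_run(["A ", " b"]))  # ['echo: a', 'echo: b']
```

Resolving step names in the constructor means a misspelled preprocessor raises a KeyError before any request is served, which is usually preferable to failing on the first call to run.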
Model: Wrapper for LLM configuration and inference.
Parameters:
- name_v2 (str): Model identifier
- temperature (float): Sampling temperature
- max_tokens (int): Maximum response length
- api_key (str): Optional API key
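For readers unsure what the temperature setting does, the sketch below shows the standard definition of temperature scaling: logits are divided by the temperature before a softmax turns them into probabilities. This is general background rather than llm-forge code; `softmax_with_temperature` and the example logits are made up for illustration, and individual model providers may apply temperature differently.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.2)
default = softmax_with_temperature(logits, 0.7)
flat = softmax_with_temperature(logits, 2.0)

# Lower temperature concentrates probability on the top-scoring token.
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in default])
print([round(p, 3) for p in flat])
```

Lower values make output more deterministic, higher values make it more varied; 0.7 sits between the two extremes shown here.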
KubernetesDeployer: Deploy to k8s clusters with auto-scaling and health checks
AWSDeployer: Deploy to ECS/EKS with CloudWatch integration
DockerDeployer: Local docker or docker-compose deployment
All deployers support:
- Rolling updates
- Health monitoring
- Log aggregation
- Metric export to Prometheus
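As an illustration of what metric export to Prometheus involves, the sketch below renders a single counter in the Prometheus text exposition format that a scrape endpoint would serve. It does not show llm-forge's exporter: `render_counter`, the metric name `llm_requests_total`, and the `deployer` label are assumptions chosen for the example, and label-value escaping is omitted for brevity.

```python
from typing import Dict


def render_counter(name: str, help_text: str, samples: Dict[str, float]) -> str:
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a value of the hypothetical `deployer` label to the
    counter's current value. Label values are assumed to need no escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for deployer, value in sorted(samples.items()):
        lines.append(f'{name}{{deployer="{deployer}"}} {value}')
    return "\n".join(lines) + "\n"


text = render_counter(
    "llm_requests_total",
    "Total requests handled.",
    {"kubernetes": 12.0},
)
print(text, end="")
```

In a real service, a maintained client library would normally produce this output and handle escaping, metric types, and concurrency.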
PRs are welcome. Please open an issue first for big changes.
Run the tests before submitting:
pytest tests/
License: MIT