llama.cpp service #23

Open · wants to merge 8 commits into main
15 changes: 15 additions & 0 deletions actions/llama-cpp/README.md
@@ -0,0 +1,15 @@
# Summary

Used to run the llama.cpp OpenAI-compatible server.

## Usage

```yaml
steps:
  - name: Start llama.cpp server
    uses: neuralmagic/nm-actions/actions/llama-cpp@main
    with:
      port: 8000
      model: "aminkhalafi/Phi-3-mini-4k-instruct-Q4_K_M-GGUF"
      context-size: 2048
```
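Once the server is up, later steps in the same job can exercise its OpenAI-compatible API. A minimal smoke test, assuming the defaults shown above (port 8000; llama-server serves the single loaded model, so no model field is required):

```bash
# Query the OpenAI-compatible chat endpoint served by llama-server.
# Assumes the action above started the server on localhost:8000.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16
      }'
```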
37 changes: 37 additions & 0 deletions actions/llama-cpp/action.yaml
@@ -0,0 +1,37 @@
name: "Run llama.cpp"
description: "Run llama.cpp OpenAI compatible web server"

inputs:
port:
description: "The port of running service"
required: false
default: 8080
model:
description: "The Hugging Face model"
required: false
default: "aminkhalafi/Phi-3-mini-4k-instruct-Q4_K_M-GGUF"
context-size:
description: "The size of input context size (tokens)"
required: false
default: 2048

runs:
using: "composite"
steps:
- name: Install llama.cpp
id: install
shell: bash
run: |
brew install llama.cpp

- name: Start llama.cpp web server
id: start
shell: bash
run: |
llama-server --hf-repo "${{inputs.port}}" -ctx-size "${{inputs.context-size}}" --port "${{inputs.port}}" &

- name: Wait llama server to be started
id: wait
shell: bash
run: |
sleep 10
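A fixed `sleep 10` can race against a slow model download on a cold cache. A sketch of a readiness poll that could replace the body of the wait step, assuming llama-server's `/health` endpoint (it returns 200 once the model is loaded):

```bash
# Poll the server's /health endpoint instead of sleeping a fixed interval.
for attempt in $(seq 1 60); do
  if curl -sf "http://localhost:${{ inputs.port }}/health" > /dev/null; then
    echo "llama-server is ready"
    exit 0
  fi
  sleep 5
done
echo "llama-server did not become ready in time" >&2
exit 1
```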
17 changes: 17 additions & 0 deletions actions/publish_pypi/README.md
@@ -0,0 +1,17 @@
# Summary

Used to build and publish packages to internal and public package indexes using the tox automation engine.

## Usage

```yaml
steps:
  - name: Build and publish
    uses: neuralmagic/nm-actions/actions/publish_pypi@main
    with:
      publish_pypi: false
      publish_pypi_internal: true
      built_type: "nightly"
```
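The build half of the action can be reproduced locally before wiring it into a workflow. A minimal sketch, assuming the repository defines the `build` tox environment the action invokes:

```bash
# Build the distribution the same way the action's build step does,
# then inspect the artifacts that would be renamed and published.
python3 -m pip install --user tox build
python3 -m tox -e build
ls -l dist/
```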
118 changes: 118 additions & 0 deletions actions/publish_pypi/action.yml
@@ -0,0 +1,118 @@
name: "Build and publish the distribution to the PyPI server"
description: "Build and publish PyPi wheel using tox for ML repos e.g. sparseml, compressed-tensors"

inputs:
publish_pypi_internal:
description: "Publish the distribution to the internal PyPI server"
required: true

publish_pypi:
description: "Publish the distribution to the pypi.org"
required: true

built_type:
description: "Applies a set of rules depending on the environment. Available values: (dev|release|nightly|custom)"
required: true

custom_package_name:
description: "Custom package name could be used along wtih the 'custom' build_type"
required: false

outputs:
whlname:
description: "wheel filename"
value: ${{ steps.build.outputs.whlname }}
tarname:
description: "tar.gz filename"
value: ${{ steps.build.outputs.tarname }}

runs:
  using: "composite"

  steps:
    - name: Validate input parameters
      shell: bash
      run: |
        if [ "${{ inputs.publish_pypi }}" == "false" ] && [ "${{ inputs.publish_pypi_internal }}" == "false" ]; then
          echo "Error: At least one of 'publish_pypi' or 'publish_pypi_internal' must be set to 'true'"
          exit 1
        fi
        if [ "${{ inputs.built_type }}" == "custom" ] && [ "${{ inputs.custom_package_name }}" == "" ]; then
          echo "Error: If 'built_type' is set to 'custom', the 'custom_package_name' must be specified"
          exit 1
        fi

    - name: Install tox
      shell: bash
      run: python3 -m pip install --user tox build

    - name: Build the distribution with tox
      id: build
      shell: bash
      run: |
        python3 -m tox -e build

        # Suffix dispatcher: map built_type to a filename suffix.
        SUFFIX=""
        case "${{ inputs.built_type }}" in
          "dev")
            SUFFIX="-dev-${{ github.event.pull_request.number }}"
            ;;
          "nightly")
            SUFFIX="-nightly-$(date +%Y%m%d)"
            ;;
          "staging")
            ;;
          "release")
            ;;
          "custom")
            if [[ -z "${{ inputs.custom_package_name }}" ]]; then
              echo "Error: Custom built_type requires a custom_package_name input"
              exit 1
            fi
            SUFFIX=""
            ;;
          *)
            echo "Invalid built_type: ${{ inputs.built_type }}"
            exit 1
            ;;
        esac

        # Expose the suffix to later steps.
        echo "suffix=$SUFFIX" >> "$GITHUB_OUTPUT"

    - name: Rename build artifacts
      id: rename
      shell: bash
      run: |
        # Package name from inputs, defaulting to guidellm
        PACKAGE_NAME="${{ inputs.custom_package_name || 'guidellm' }}"

        # Suffix computed by the build step
        SUFFIX="${{ steps.build.outputs.suffix }}"

        # Extract version from the wheel filename (e.g., guidellm-1.2.3-py3-none-any.whl)
        VERSION=$(basename "$(find dist -name '*.whl' | head -n 1)" | cut -d '-' -f 2)

        # Generate final file names based on built_type and package name
        NEW_NAME="${PACKAGE_NAME}${SUFFIX}-${VERSION}.whl"
        TAR_NAME="${PACKAGE_NAME}${SUFFIX}-${VERSION}.tar.gz"

        mv dist/*.whl "dist/$NEW_NAME"
        mv dist/*.tar.gz "dist/$TAR_NAME"

        # Set outputs for subsequent steps
        echo "whlname=$NEW_NAME" >> "$GITHUB_OUTPUT"
        echo "tarname=$TAR_NAME" >> "$GITHUB_OUTPUT"

    - name: Authenticate to GCP
      uses: google-github-actions/auth@v2
      with:
        project_id: ${{ secrets.GCP_PROJECT }}
        workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
        service_account: ${{ secrets.NM_PYPI_SA }}

    - name: Upload to Internal PyPI
      if: ${{ inputs.publish_pypi_internal == 'true' }}
      uses: neuralmagic/nm-actions/actions/[email protected]
      with:
        bucket_target: ${{ secrets.GCP_NM_PYPI_DIST }}
        asset: ${{ steps.rename.outputs.whlname }}

    - name: Publish to Public PyPI
      if: ${{ inputs.publish_pypi == 'true' }}
      uses: neuralmagic/nm-actions/actions/[email protected]
      with:
        username: ${{ secrets.PYPI_PUBLIC_USER }}
        password: ${{ secrets.PYPI_PUBLIC_AUTH }}
        whl: ${{ steps.rename.outputs.whlname }}
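The suffix and rename logic is easy to sanity-check outside CI. A local dry run of the naming scheme, with hypothetical values standing in for the action inputs and step outputs:

```bash
# Dry run of the action's artifact-naming scheme with hypothetical values.
PACKAGE_NAME="guidellm"                 # default when custom_package_name is unset
SUFFIX="-nightly-$(date +%Y%m%d)"       # what the build step emits for built_type=nightly
VERSION="0.1.0"                         # hypothetical version parsed from the wheel name

echo "wheel: ${PACKAGE_NAME}${SUFFIX}-${VERSION}.whl"
echo "sdist: ${PACKAGE_NAME}${SUFFIX}-${VERSION}.tar.gz"
# e.g. wheel: guidellm-nightly-20250101-0.1.0.whl
```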