
GitHub Action for LLM Prompt Evaluation

This GitHub Action uses promptfoo to produce a before/after view of edited prompts.

When you change a prompt, an eval will automatically be posted on the pull request:

[Screenshot: LLM eval results posted as a pull request comment]

The provided link opens the promptfoo web viewer, which allows you to interactively explore the before vs. after:

[Screenshot: the promptfoo web viewer]

Supported Events

This action supports multiple GitHub event types:

  • Pull Request (pull_request, pull_request_target) - Compares changes between base and head branches
  • Push (push) - Compares changes between commits
  • Manual Trigger (workflow_dispatch) - Allows manual evaluation with custom inputs
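
For example, a single workflow can subscribe to all three event types (a minimal sketch; adjust the branch and path filters to your repository):

on:
  pull_request:
    paths:
      - 'prompts/**'
  push:
    branches:
      - main
  workflow_dispatch: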

Configuration

The action can be configured using the following inputs:

| Parameter | Description | Required |
| --- | --- | --- |
| config | The path to the configuration file. This file contains settings for the action. | Yes |
| github-token | The GitHub token. Used to authenticate requests to the GitHub API. | Yes |
| cache-path | The path to the cache. This is where the action stores temporary data. | No |
| no-share | Disable sharing of evaluation results. Defaults to false (sharing enabled). See the Sharing section below for details. | No |
| promptfoo-version | The version of promptfoo to use. Defaults to latest. | No |
| working-directory | The working directory to run promptfoo in. Can be set to a location where promptfoo is already installed. | No |
| prompts | The glob patterns for the prompt files. These patterns are used to find the prompt files that the action should evaluate. | No |
| use-config-prompts | Use the prompt files defined in the config file. Defaults to false. | No |
| env-files | Comma-separated list of .env files to load (e.g. ".env,.env.test.local"). Environment variables from these files are loaded before running promptfoo. | No |
| fail-on-threshold | Fail the action if the evaluation success rate is below this percentage (0-100). Example: 80 for an 80% success rate. | No |
| max-concurrency | Maximum number of concurrent API calls. Defaults to 4. Useful for rate limiting. | No |
| no-table | Run promptfoo with the --no-table flag to keep output minimal. Defaults to false. | No |
| no-progress-bar | Run promptfoo with the --no-progress-bar flag to keep output minimal. Defaults to false. | No |
| disable-comment | Disable posting comments to the PR. Defaults to false. | No |
| force-run | Force evaluation to run even if no files changed. Defaults to false. | No |
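
As an illustration, several of the optional inputs above can be combined on a single step (a sketch; the threshold and concurrency values are arbitrary):

- name: Run promptfoo evaluation
  uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'
    fail-on-threshold: 80 # Fail the job below an 80% pass rate
    max-concurrency: 2 # Limit parallel API calls for rate limiting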

The following API key parameters are supported:

| Parameter | Description |
| --- | --- |
| openai-api-key | The API key for OpenAI. Used to authenticate requests to the OpenAI API. |
| azure-api-key | The API key for Azure OpenAI. Used to authenticate requests to the Azure OpenAI API. |
| anthropic-api-key | The API key for Anthropic. Used to authenticate requests to the Anthropic API. |
| huggingface-api-key | The API key for Hugging Face. Used to authenticate requests to the Hugging Face API. |
| aws-access-key-id | The AWS access key ID. Used to authenticate requests to AWS services. |
| aws-secret-access-key | The AWS secret access key. Used to authenticate requests to AWS services. |
| replicate-api-key | The API key for Replicate. Used to authenticate requests to the Replicate API. |
| palm-api-key | The API key for Google PaLM. Used to authenticate requests to the PaLM API. |
| vertex-api-key | The API key for Vertex AI. Used to authenticate requests to the Vertex AI API. |
| cohere-api-key | The API key for Cohere. Used to authenticate requests to the Cohere API. |
| mistral-api-key | The API key for Mistral. Used to authenticate requests to the Mistral API. |
| groq-api-key | The API key for Groq. Used to authenticate requests to the Groq API. |
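
Each key maps directly to a with: input on the action step, so multiple providers can be authenticated in one run (a sketch; assumes the corresponding repository secrets exist):

- name: Run promptfoo evaluation
  uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}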

Usage Examples

Pull Request Evaluation

Here is a generic GitHub Actions workflow configuration using promptfoo/promptfoo-action with a cache step:

name: 'Prompt Evaluation'
on:
  pull_request:
    paths:
      - 'prompts/**'

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read # Required for actions/checkout
      pull-requests: write # Ability to post comments on Pull Requests
    steps:
      # Required for promptfoo-action's git usage
      - uses: actions/checkout@v4

      # This cache is optional, but you'll save money and time by setting it up!
      # IMPORTANT: Use actions/cache@v4 or later (required after Feb 1, 2025)
      - name: Set up promptfoo cache
        uses: actions/cache@v4
        with:
          path: |
            ~/.promptfoo/cache
            .promptfoo-cache
          key: ${{ runner.os }}-promptfoo-${{ hashFiles('prompts/**') }}-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-promptfoo-${{ hashFiles('prompts/**') }}-
            ${{ runner.os }}-promptfoo-

      - name: Run promptfoo evaluation
        uses: promptfoo/promptfoo-action@main
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          config: 'promptfooconfig.yaml'
          cache-path: '.promptfoo-cache'

Manual Trigger (workflow_dispatch)

You can also trigger evaluations manually using workflow_dispatch:

name: 'Prompt Evaluation - Manual'
on:
  workflow_dispatch:
    inputs:
      files:
        description: 'Files to evaluate (leave empty to auto-detect)'
        required: false
        type: string
      base:
        description: 'Base branch/commit to compare against'
        required: false
        default: 'HEAD~1'
        type: string

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      actions: write # Required for workflow summaries
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Fetch all history for comparisons

      - name: Run promptfoo evaluation
        uses: promptfoo/promptfoo-action@main
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          config: 'promptfooconfig.yaml'

When triggered manually:

  • If files input is provided, only those files will be evaluated (one file per line)
  • If base input is provided, it will compare against that branch/commit
  • If no inputs are provided, it will compare against the previous commit (HEAD~1)
  • Results will be displayed in the workflow summary instead of a PR comment
  • Important: The actions: write permission is required for writing workflow summaries
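
If you use the GitHub CLI, the workflow above can be dispatched from a terminal (a sketch; the workflow name and input values are illustrative):

gh workflow run 'Prompt Evaluation - Manual' \
  -f files='prompts/prompt1.txt' \
  -f base='main'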

Alternative: Using Action Inputs

You can also specify files and base directly as action inputs:

- name: Run promptfoo evaluation
  uses: promptfoo/promptfoo-action@main
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'
    workflow-files: |
      prompts/prompt1.txt
      prompts/prompt2.txt
    workflow-base: 'main'

Push Event Evaluation

Evaluate prompts on every push to the main branch:

name: 'Prompt Evaluation - Push'
on:
  push:
    branches:
      - main
    paths:
      - 'prompts/**'

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      actions: write # Required for workflow summaries
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2 # Need at least 2 commits for comparison

      - name: Run promptfoo evaluation
        uses: promptfoo/promptfoo-action@main
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          config: 'promptfooconfig.yaml'

Tips

If you are using an OpenAI model, remember to create the secret in Repository Settings > Secrets and variables > Actions > New repository secret.

For more information on how to set up the promptfoo config, see the promptfoo documentation.
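
As a starting point, a minimal promptfooconfig.yaml might look like the following (a sketch; the prompt file, model, and assertion are placeholders):

# promptfooconfig.yaml (minimal sketch)
prompts:
  - file://prompts/my_prompt.txt
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      topic: 'bananas'
    assert:
      - type: icontains
        value: 'banana'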

Using .env Files

If your application uses .env files to store environment variables, you can load them before running promptfoo evaluations:

name: 'Prompt Evaluation'
on:
  pull_request:
    paths:
      - 'prompts/**'

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Run promptfoo evaluation
        uses: promptfoo/promptfoo-action@main
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          config: 'promptfooconfig.yaml'
          env-files: '.env,.env.test.local' # Load multiple .env files

This is particularly useful for Next.js applications or other frameworks that use .env files for configuration. The environment variables from these files will be available to promptfoo during evaluation.
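
For reference, each listed file is a standard dotenv file of KEY=value pairs, for example (hypothetical values):

# .env.test.local (hypothetical)
APP_ENV=test
API_BASE_URL=https://api.example.com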

Custom Provider Detection

The action automatically detects changes to custom provider files referenced in your promptfoo configuration. When you use custom providers with file:// URLs, the action will trigger evaluations when these files change.

Supported Patterns

  1. Direct file references:

    providers:
      - file://custom_provider.py
      - id: file://providers/my_provider.js
  2. Wildcard patterns:

    providers:
      - file://providers/*.py          # All Python files in providers/
      - file://lib/**/*.js            # All JS files recursively in lib/
  3. Directory watching:

    providers:
      - file://providers/             # Watch entire directory

How It Works

  • When you specify a wildcard pattern (e.g., file://providers/*.py), the action watches the entire directory
  • Changes to any file matching the pattern will trigger evaluation
  • Directory paths automatically watch all files within that directory
  • This works for providers, prompts, test data files, and assertion files

Example Configuration

# promptfooconfig.yaml
providers:
  - file://providers/**/*.py      # Watch all Python files recursively
  
prompts:
  - file://prompts/               # Watch entire prompts directory

tests:
  - vars:
      context: file://data/*.json # Watch all JSON files in data/
    assert:
      - type: javascript
        value: file://validators/ # Watch all files in validators/

Force Running Evaluations

If you need to run evaluations regardless of file changes, use the force-run option:

- name: Run promptfoo evaluation
  uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'
    force-run: true

Caching for Better Performance

promptfoo-action integrates with both GitHub Actions caching and promptfoo's internal caching to significantly reduce API costs and evaluation time.

Why Caching Matters

  • Cost Savings: Avoid redundant API calls to OpenAI, Anthropic, and other providers
  • Speed: Cached evaluations complete in seconds vs. minutes
  • Reliability: Reduce dependency on external API availability
  • Consistency: Ensure reproducible results across runs

How It Works

The action uses a multi-layer caching strategy:

  1. promptfoo Internal Cache: Caches individual API responses (default: 1 day TTL in CI)
  2. GitHub Actions Cache: Persists the cache across workflow runs
  3. Smart Invalidation: Cache keys include content hashes for automatic invalidation

Basic Setup

name: 'Prompt Evaluation with Caching'
on:
  pull_request:
    paths:
      - 'prompts/**'

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0  # Required for git diff comparisons

      # IMPORTANT: Use actions/cache@v4 or later (required after Feb 1, 2025)
      - name: Cache promptfoo evaluations
        uses: actions/cache@v4
        with:
          path: |
            ~/.promptfoo/cache
            .promptfoo-cache
          # Cache key includes content hash for automatic invalidation
          key: ${{ runner.os }}-promptfoo-${{ hashFiles('prompts/**') }}-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-promptfoo-${{ hashFiles('prompts/**') }}-
            ${{ runner.os }}-promptfoo-

      - name: Run promptfoo evaluation
        uses: promptfoo/promptfoo-action@main
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          config: 'promptfooconfig.yaml'
          cache-path: '.promptfoo-cache'  # Local cache directory

Advanced Caching with Weekly Rotation

For better cache freshness while maintaining efficiency:

- name: Get cache rotation key
  id: cache-key
  run: echo "week=$(date +%Y-W%U)" >> $GITHUB_OUTPUT

- name: Cache with weekly rotation
  uses: actions/cache@v4
  with:
    path: ~/.promptfoo/cache
    # Weekly rotation ensures fresh results
    key: promptfoo-${{ runner.os }}-${{ hashFiles('prompts/**') }}-${{ steps.cache-key.outputs.week }}
    restore-keys: |
      promptfoo-${{ runner.os }}-${{ hashFiles('prompts/**') }}-

Environment Variables for Cache Control

The action automatically configures sensible caching defaults for CI; you can override them with environment variables:

- name: Configure cache environment
  run: |
    echo "PROMPTFOO_CACHE_ENABLED=true" >> $GITHUB_ENV
    echo "PROMPTFOO_CACHE_TYPE=disk" >> $GITHUB_ENV
    echo "PROMPTFOO_CACHE_PATH=$HOME/.promptfoo/cache" >> $GITHUB_ENV
    echo "PROMPTFOO_CACHE_TTL=86400" >> $GITHUB_ENV  # 1 day for CI
    echo "PROMPTFOO_CACHE_MAX_SIZE=52428800" >> $GITHUB_ENV  # 50MB

Cache Metrics and Monitoring

The action provides cache statistics as outputs:

- name: Run evaluation
  id: eval
  uses: promptfoo/promptfoo-action@main
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'
    cache-path: '.promptfoo-cache'

- name: Display cache metrics
  run: |
    echo "Cache size: ${{ steps.eval.outputs.cache-size-mb }}MB"
    echo "Cache files: ${{ steps.eval.outputs.cache-file-count }}"

Best Practices

  1. Always use actions/cache@v4 or later (required after February 1, 2025)
  2. Include content hashes in cache keys for automatic invalidation
  3. Use restore-keys for fallback to partial cache hits
  4. Set appropriate TTL - shorter for development (1 day), longer for stable prompts
  5. Monitor cache size to avoid hitting GitHub's 10GB limit
  6. Use separate caches for different prompt sets or environments (see the sketch below)
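
For instance, separate caches per environment (point 6) can be achieved by adding an environment suffix to the cache key (a sketch; the "staging" name and path are illustrative):

- name: Cache promptfoo evaluations (staging)
  uses: actions/cache@v4
  with:
    path: ~/.promptfoo/cache
    key: promptfoo-staging-${{ runner.os }}-${{ hashFiles('prompts/staging/**') }}
    restore-keys: |
      promptfoo-staging-${{ runner.os }}-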

Troubleshooting Cache Issues

If caching isn't working as expected:

  1. Enable debug mode to see cache hits/misses:

    - uses: promptfoo/promptfoo-action@main
      with:
        debug: true
  2. Check cache statistics in the action output

  3. Verify cache paths match between save and restore

  4. Clear the cache manually if needed via the GitHub UI or the CLI commands below
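
Recent versions of the GitHub CLI can list and delete Actions caches directly (the cache key shown is illustrative):

# List caches for the current repository
gh cache list
# Delete a specific cache by key
gh cache delete 'Linux-promptfoo-stale-key'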

For a complete example with all caching features, see .github/workflows/example-cached.yml.

Sharing

Sharing is enabled by default, but if no PROMPTFOO_API_KEY is set, the share step is skipped and results appear only in the workflow logs.

To enable sharing with authentication:

- name: Run promptfoo evaluation
  uses: promptfoo/promptfoo-action@main
  env:
    PROMPTFOO_API_KEY: ${{ secrets.PROMPTFOO_API_KEY }}
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'

To explicitly disable sharing:

- name: Run promptfoo evaluation
  uses: promptfoo/promptfoo-action@main
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'
    no-share: true

Minimal Output

To reduce console output in CI, set no-table: true and no-progress-bar: true in your action configuration.
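
For example, using only inputs documented above:

- name: Run promptfoo evaluation
  uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: 'promptfooconfig.yaml'
    no-table: true
    no-progress-bar: true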
