Changes from 3 commits
12 changes: 5 additions & 7 deletions .github/workflows/ci.yml
@@ -35,23 +35,21 @@ jobs:
coverage-threshold: 80
source-directory: 'src'
test-directory: 'tests'
enable-sonarcloud: true
sonarcloud-organization: 'ByronWilliamsCPA'
sonarcloud-project-key: 'ByronWilliamsCPA_foundry_unify'
enable-codecov: true
run-integration-tests: true
run-security-tests: true
fail-on-llm-tags: false
secrets:
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

ci-gate:
name: CI Gate
runs-on: ubuntu-latest
needs: [ci]
if: always()
steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Check CI results
run: |
if [ "${{ needs.ci.result }}" != "success" ]; then

5 changes: 5 additions & 0 deletions .github/workflows/codecov.yml
@@ -38,6 +38,11 @@ jobs:
runs-on: ubuntu-latest
if: ${{ github.event.workflow_run.conclusion == 'failure' }}
steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Report status
run: |
echo "## Coverage Upload Skipped" >> $GITHUB_STEP_SUMMARY

7 changes: 6 additions & 1 deletion .github/workflows/dependency-review.yml
@@ -24,11 +24,16 @@ jobs:
name: Dependency Review
runs-on: ubuntu-latest
steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
Comment on lines 32 to 33

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Set persist-credentials: false on checkout.

This job only reads the dependency diff and never pushes, so the GITHUB_TOKEN does not need to persist in .git/config. Disabling persistence reduces the credential-exfiltration surface for the subsequently invoked third-party action.

🛡️ Proposed hardening
       - name: Checkout repository
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          persist-credentials: false

As per coding guidelines for .github/workflows/**: review for "Security best practices (minimal permissions, pinned actions)" and "Proper secret handling".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      - name: Checkout repository
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - name: Checkout repository
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          persist-credentials: false
🧰 Tools
🪛 zizmor (1.25.2)

[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/dependency-review.yml around lines 32 - 33, Update the
checkout step that uses
"actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683" to explicitly set
persist-credentials: false so the GITHUB_TOKEN is not written into .git/config;
locate the checkout step (the line with uses: actions/checkout...) in the
workflow and add the persist-credentials: false key under that step to disable
credential persistence for this read-only job.
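
The same check can be run across every workflow file. What follows is a minimal sketch, not part of this PR and not how zizmor implements its `artipacked` rule: a line-based heuristic that flags `actions/checkout` steps with no `persist-credentials: false` before the next step starts. The function name `checkout_lines_missing_hardening` is hypothetical, and the scan assumes `with:` follows `uses:` inside the step.

```python
import re

# Matches a checkout step's `uses:` line, with or without a leading "- ".
CHECKOUT = re.compile(r"^\s*(?:-\s+)?uses:\s*actions/checkout@")
PERSIST_FALSE = re.compile(r"^\s*persist-credentials:\s*false\b")
NEW_LIST_ITEM = re.compile(r"^\s*-\s")


def checkout_lines_missing_hardening(workflow_text):
    """Return 1-based line numbers of checkout steps that never set
    persist-credentials: false (heuristic, no YAML parsing)."""
    lines = workflow_text.splitlines()
    findings = []
    for index, line in enumerate(lines):
        if not CHECKOUT.match(line):
            continue
        hardened = False
        for follower in lines[index + 1:]:
            if NEW_LIST_ITEM.match(follower):
                break  # the next step (or list item) begins here
            if PERSIST_FALSE.match(follower):
                hardened = True
                break
        if not hardened:
            findings.append(index + 1)
    return findings


SAMPLE = """\
steps:
  - name: Checkout repository
    uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
  - name: Hardened checkout
    uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
    with:
      persist-credentials: false
"""

print(checkout_lines_missing_hardening(SAMPLE))
```

Only the first checkout in the sample is reported. A real audit would parse the YAML properly instead of relying on line order.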


- name: Dependency Review
-        uses: actions/dependency-review-action@v4
+        uses: actions/dependency-review-action@3b139cfc5fae8b618d3eae3675e383bb1769c019 # v4.5.0
with:
fail-on-severity: high
# Deny copyleft and restrictive licenses (deny-list approach)
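
The hunk above replaces a floating tag (`@v4`) with a full commit SHA. A short script can look for any remaining unpinned references. This is a hedged sketch only: the name `unpinned_actions` is hypothetical, the regular expression is a heuristic rather than a YAML parser, and local actions (`./path`) are skipped simply because they carry no `@ref`.

```python
import re

# Captures "owner/repo[/path]@ref" from a `uses:` line; a trailing comment is ignored.
USES = re.compile(r"^\s*(?:-\s+)?uses:\s*(?P<action>[^@\s]+)@(?P<ref>[^\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return (line_number, action, ref) for each `uses:` whose ref is not a
    40-character lowercase hex commit SHA."""
    findings = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES.match(line)
        if match and not FULL_SHA.match(match["ref"]):
            findings.append((number, match["action"], match["ref"]))
    return findings


SAMPLE = """\
- uses: actions/dependency-review-action@v4
- uses: actions/dependency-review-action@3b139cfc5fae8b618d3eae3675e383bb1769c019 # v4.5.0
- uses: sonarsource/sonarqube-quality-gate-action@master
"""

for finding in unpinned_actions(SAMPLE):
    print(finding)
```

The check says nothing about whether a SHA actually corresponds to the version named in the trailing comment. That still has to be verified against the action's repository.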

14 changes: 12 additions & 2 deletions .github/workflows/fips-compatibility.yml
@@ -54,11 +54,16 @@ jobs:
runs-on: ubuntu-latest

steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
Comment on lines 62 to 63

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Set persist-credentials: false on checkout (applies to both jobs).

Neither fips-check nor fips-runtime-test pushes to the repo, but both execute repository scripts and a github-script step after checkout. Persisting the GITHUB_TOKEN in .git/config is unnecessary and widens the exfiltration surface. The same applies to the checkout in fips-runtime-test at lines 213-214.

🛡️ Proposed hardening (apply to both checkout steps)
       - name: Checkout repository
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          persist-credentials: false

As per coding guidelines for .github/workflows/**: review for "Security best practices (minimal permissions, pinned actions)" and "Proper secret handling".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      - name: Checkout repository
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - name: Checkout repository
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          persist-credentials: false
🧰 Tools
🪛 zizmor (1.25.2)

[warning] 62-63: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/fips-compatibility.yml around lines 62 - 63, Update both
checkout steps that use the pinned actions/checkout invocation (the one shown
and the second at the checkout in the fips-runtime-test job) to disable
persisting the GITHUB_TOKEN by adding persist-credentials: false to the checkout
step inputs; specifically, in the fips-check and fips-runtime-test jobs modify
the checkout step that currently reads "uses:
actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683" to include the
persist-credentials: false key so the GITHUB_TOKEN is not written into
.git/config.


- name: Install uv
-        uses: astral-sh/setup-uv@v7
+        uses: astral-sh/setup-uv@2ddd2b9cb38ad8efd50337e8ab201519a34c9f24 # v7.1.1
with:
enable-cache: true

@@ -200,11 +205,16 @@ jobs:
needs: fips-check

steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

- name: Install uv
-        uses: astral-sh/setup-uv@v7
+        uses: astral-sh/setup-uv@2ddd2b9cb38ad8efd50337e8ab201519a34c9f24 # v7.1.1
with:
enable-cache: true


7 changes: 6 additions & 1 deletion .github/workflows/pr-validation.yml
@@ -60,7 +60,7 @@ jobs:
python-version: "3.12"

- name: Install UV
-        uses: astral-sh/setup-uv@v7
+        uses: astral-sh/setup-uv@2ddd2b9cb38ad8efd50337e8ab201519a34c9f24 # v7.1.1
with:
enable-cache: true
cache-dependency-glob: "uv.lock"
@@ -123,6 +123,11 @@ jobs:
needs: [core-validation, dead-code, link-check]
if: always()
steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Check validation results
run: |
echo "## Dependency & Standards Validation" >> $GITHUB_STEP_SUMMARY

10 changes: 10 additions & 0 deletions .github/workflows/reuse.yml
@@ -25,6 +25,11 @@ jobs:
name: Check REUSE Compliance
runs-on: ubuntu-latest
steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

@@ -48,6 +53,11 @@ jobs:
name: Validate License Files
runs-on: ubuntu-latest
steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2


1 change: 0 additions & 1 deletion .github/workflows/security-analysis.yml
@@ -41,7 +41,6 @@ jobs:
run-codeql: true
run-dependency-review: true
run-bandit: true
run-safety: true
run-osv: true

security-gate-success:

2 changes: 1 addition & 1 deletion .github/workflows/slsa-provenance.yml
@@ -56,7 +56,7 @@ jobs:
python-version: "3.12"

- name: Install UV
-        uses: astral-sh/setup-uv@v7
+        uses: astral-sh/setup-uv@2ddd2b9cb38ad8efd50337e8ab201519a34c9f24 # v7.1.1
with:
enable-cache: true


13 changes: 11 additions & 2 deletions .github/workflows/sonarcloud.yml
@@ -40,6 +40,11 @@ jobs:
outputs:
has-token: ${{ steps.check.outputs.has-token }}
steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Check for SONAR_TOKEN
id: check
run: |
@@ -63,6 +68,11 @@ jobs:
timeout-minutes: 15

steps:
- name: Harden the runner
uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7 # v2.10.1
with:
egress-policy: audit

- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
@@ -83,7 +93,6 @@ jobs:
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: '3.12'
cache: 'pip'

- name: Install UV
if: steps.check-code.outputs.has-code == 'true'
@@ -133,7 +142,7 @@ jobs:
-Dsonar.python.version=3.12

- name: Check Quality Gate
-        uses: sonarsource/sonarqube-quality-gate-action@master
+        uses: sonarsource/sonarqube-quality-gate-action@cf038b0e0cdecfa9e56c198bbb7d21d751d62c3b # v1.2.0
timeout-minutes: 5
env:
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

130 changes: 113 additions & 17 deletions README.md
@@ -34,13 +34,30 @@

## Overview

-OCR orchestration and layout analysis service for the Foundry RAG pipeline
-
-This project provides:
-- Core functionality for ocr orchestration and layout analysis service for the foundry rag pipeline
-- Production-ready code with comprehensive testing
-- Well-documented API and architecture
-- Security-first development practices
Foundry Unify is the foundation library for an OCR orchestration and layout
analysis service that will sit in front of the Foundry RAG pipeline. The OCR
orchestration logic is still on the roadmap; what currently ships in
`src/foundry_unify/` is the production scaffolding the orchestrator will be
built on top of:

- **FastAPI security middleware** (`foundry_unify.middleware.security`) —
OWASP-aligned security headers, in-memory rate limiting with burst control,
CORS, trusted-host, and an SSRF prevention middleware that blocks private
IPs, cloud metadata endpoints, and dangerous URL schemes.
- **Request correlation middleware** (`foundry_unify.middleware.correlation`) —
propagates `X-Correlation-ID` / `X-Request-ID` / `X-Trace-ID` / `X-Span-ID`
via `contextvars`, with a structlog processor to add the IDs to every log
record.
- **Structured logging** (`foundry_unify.utils.logging`) — structlog setup with
rich console output for development and JSON output for production, plus a
`log_performance` helper.
- **Pydantic Settings** (`foundry_unify.core.config`) — environment-driven
configuration loaded from `FOUNDRY_UNIFY_*` variables.
- **Centralised exception hierarchy** (`foundry_unify.core.exceptions`) —
typed errors (`ValidationError`, `AuthenticationError`, `APIError`, etc.)
with `to_dict()` for safe JSON responses.
- **Kubernetes health endpoints** (`foundry_unify.api.health`) — `/health/live`,
`/health/ready`, `/health/startup` FastAPI router ready to mount.
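
To make the SSRF bullet concrete, here is a minimal sketch of the kind of check it describes: reject dangerous schemes, known metadata hostnames, and non-public IP literals. It is not the code in `foundry_unify.middleware.security`. The names `is_url_allowed`, `ALLOWED_SCHEMES`, and `METADATA_HOSTS` are assumptions made for illustration, and the sketch uses only the standard library.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}            # assumption: everything else is rejected
METADATA_HOSTS = {"metadata.google.internal"}  # assumption: illustrative, not exhaustive


def is_url_allowed(url):
    """Return True only for http(s) URLs that do not point at a private,
    loopback, link-local, or otherwise non-global IP literal."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname
    if not host or host in METADATA_HOSTS:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than an IP literal. A production check would also
        # resolve it and validate every resolved address.
        return True
    return address.is_global


for candidate in (
    "https://example.com/doc.pdf",
    "http://169.254.169.254/latest/meta-data/",
    "http://10.0.0.5/internal",
    "file:///etc/passwd",
):
    print(candidate, is_url_allowed(candidate))
```

The link-local metadata address, the private address, and the `file:` URL are rejected, while the public hostname passes. Hostnames that resolve to internal addresses are not caught here, which is why the comment calls out DNS resolution as a required extra step.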

## Features

@@ -75,32 +92,111 @@ pipx install uv

### Installation

Foundry Unify is not yet published to PyPI. Install from source:

```bash
# Using uv (recommended)
uv add git+https://github.com/ByronWilliamsCPA/Unify.git
# or, for the FastAPI middleware stack:
uv add "foundry-unify[api] @ git+https://github.com/ByronWilliamsCPA/Unify.git"

# Using pip
pip install git+https://github.com/ByronWilliamsCPA/Unify.git
pip install "foundry-unify[api] @ git+https://github.com/ByronWilliamsCPA/Unify.git"
```

For local development, clone the repository and install all extras:

```bash
# Clone repository
git clone https://github.com/ByronWilliamsCPA/Unify.git
-cd foundry_unify
+cd Unify

# Install dependencies (includes dev tools - REQUIRED for development)
uv sync --all-extras
-# Install with ML dependencies
-uv sync --all-extras,ml

# Setup pre-commit hooks (required)
uv run pre-commit install
```

### Basic Usage

Wire the middleware, logging, and health endpoints into a FastAPI app
(requires the `[api]` extra):

```python
-# Import and use the package
-from foundry_unify import YourModule
from fastapi import FastAPI

from foundry_unify.api.health import router as health_router
from foundry_unify.core.config import settings
from foundry_unify.middleware import (
CorrelationMiddleware,
add_security_middleware,
)
from foundry_unify.utils.logging import get_logger, setup_logging

# Configure structured logging (JSON in production, rich console in dev).
setup_logging(
level=settings.log_level,
json_logs=settings.json_logs,
include_timestamp=settings.include_timestamp,
include_correlation=True,
)
logger = get_logger(__name__)

app = FastAPI(title="foundry-unify")

# Correlation must be added first so subsequent middleware can log with IDs.
app.add_middleware(CorrelationMiddleware)

# Security headers, CORS, rate limiting, SSRF prevention.
add_security_middleware(
app,
enable_https_redirect=False, # Set True behind TLS-terminating proxies.
enable_rate_limiting=True,
enable_ssrf_prevention=True,
allowed_origins=["https://example.com"],
allowed_hosts=["api.example.com"],
rate_limit_rpm=100,
)

# Kubernetes probes at /health/live, /health/ready, /health/startup.
app.include_router(health_router)


@app.get("/")
async def root() -> dict[str, str]:
logger.info("root_called")
return {"status": "ok"}
```
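
The correlation flow in the example above can be pictured with a small standard-library sketch. It is an illustration under assumptions, not the contents of `foundry_unify.middleware.correlation`: `bind_correlation_id` and `add_correlation_id` are hypothetical names, the header lookup is a plain dict lookup (real HTTP headers are case-insensitive), and the second function only follows the `(logger, method_name, event_dict)` shape that structlog processors use.

```python
import contextvars
import uuid

# One context variable per ID; only the correlation ID is shown here.
correlation_id = contextvars.ContextVar("correlation_id", default=None)


def bind_correlation_id(headers):
    """Reuse the caller's X-Correlation-ID if present, otherwise mint one,
    and store it for the current context."""
    value = headers.get("X-Correlation-ID") or uuid.uuid4().hex
    correlation_id.set(value)
    return value


def add_correlation_id(logger, method_name, event_dict):
    """Processor-shaped helper: copy the bound ID into the log event."""
    current = correlation_id.get()
    if current is not None:
        event_dict["correlation_id"] = current
    return event_dict


def handle_request(headers):
    """Stand-in for a request handler that logs one event."""
    bind_correlation_id(headers)
    return add_correlation_id(None, "info", {"event": "root_called"})


# Each request runs in its own copy of the context, so IDs do not leak between them.
request_context = contextvars.copy_context()
print(request_context.run(handle_request, {"X-Correlation-ID": "abc-123"}))
print(add_correlation_id(None, "info", {"event": "outside_request"}))
```

One thing to confirm when wiring this up: in Starlette, each `add_middleware` call wraps the layers registered before it, so the most recently added middleware sees the request first. Whether the security layers can log with correlation IDs therefore depends on how `add_security_middleware` registers them relative to `CorrelationMiddleware`.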

Raise typed exceptions from the centralised hierarchy:

-# Example: Create an instance and use it
-module = YourModule()
-result = module.process()
-print(result)
```python
from foundry_unify.core.exceptions import ValidationError

raise ValidationError(
"Invalid email format",
field="email",
value="not-an-email",
)
# ValidationError.to_dict() -> JSON-serialisable error payload.
```
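
For readers who want to see how a `to_dict()` payload might be produced, the sketch below builds a two-class hierarchy from scratch. It is not the code in `foundry_unify.core.exceptions`: the base class name `FoundryUnifyError`, the payload keys, and the choice to leave the raw `value` out of the payload are all assumptions made for illustration.

```python
import json


class FoundryUnifyError(Exception):
    """Hypothetical base class: carries a message plus structured details."""

    def __init__(self, message, **details):
        super().__init__(message)
        self.message = message
        self.details = details

    def to_dict(self):
        # Only JSON-friendly fields are exposed to the client.
        return {
            "error": type(self).__name__,
            "message": self.message,
            "details": dict(self.details),
        }


class ValidationError(FoundryUnifyError):
    def __init__(self, message, *, field=None, value=None):
        super().__init__(message, field=field)
        # Kept on the exception for logging, but not serialised (assumption).
        self.value = value


try:
    raise ValidationError("Invalid email format", field="email", value="not-an-email")
except FoundryUnifyError as error:
    body = json.dumps(error.to_dict())
    print(body)
```

Catching the base class in one place, for example in a FastAPI exception handler, is what lets every typed error share the same response shape.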

### Configuration

Settings load from environment variables with the `FOUNDRY_UNIFY_` prefix
(see `src/foundry_unify/core/config.py`). All variables are optional.

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `FOUNDRY_UNIFY_LOG_LEVEL` | `DEBUG`/`INFO`/`WARNING`/`ERROR`/`CRITICAL` | `INFO` | Application log level. |
| `FOUNDRY_UNIFY_JSON_LOGS` | bool | `false` | Emit JSON logs (production) instead of rich console output. |
| `FOUNDRY_UNIFY_INCLUDE_TIMESTAMP` | bool | `true` | Include ISO-8601 timestamps in log records. |

`Settings` is a `pydantic_settings.BaseSettings` subclass and also reads from
a `.env` file when one is present. Extra unrecognised variables are ignored
(`extra="ignore"`).
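
The prefix-based loading described above can be approximated with the standard library alone. The sketch below is a stand-in for what `pydantic_settings` does, not the project's `Settings` class: `SketchSettings` and `load_settings` are hypothetical names, `.env` handling is omitted, and the accepted boolean spellings are an assumption (pydantic accepts a broader set).

```python
import os
from dataclasses import dataclass

PREFIX = "FOUNDRY_UNIFY_"
TRUE_VALUES = {"1", "true", "yes", "on"}  # assumption: a simplified truthy set


@dataclass(frozen=True)
class SketchSettings:
    log_level: str = "INFO"
    json_logs: bool = False
    include_timestamp: bool = True


def load_settings(environ=None):
    """Build settings from FOUNDRY_UNIFY_* variables, ignoring everything else."""
    env = os.environ if environ is None else environ

    def flag(name, default):
        raw = env.get(PREFIX + name)
        return default if raw is None else raw.strip().lower() in TRUE_VALUES

    return SketchSettings(
        log_level=env.get(PREFIX + "LOG_LEVEL", "INFO").upper(),
        json_logs=flag("JSON_LOGS", False),
        include_timestamp=flag("INCLUDE_TIMESTAMP", True),
    )


print(load_settings({"FOUNDRY_UNIFY_JSON_LOGS": "true", "UNRELATED": "ignored"}))
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test. Unprefixed variables are skipped, which mirrors the `extra="ignore"` behaviour noted above.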

## Supply Chain Security

This project implements enterprise-grade supply chain security with a multi-tier package index strategy and centralized secrets management.