Skip to content

mynkpdr/verifiable-intelligence

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Verifiable Intelligence

An executive-grade verification workbench for natural-language BigQuery analytics over a simple retail transaction dataset.

The user asks one retail spend question. Three independent Codex agents propose approaches. The backend validates every SQL statement, dry-runs and executes approved BigQuery queries, runs safe Python transforms when needed, compares outputs, flags disagreements, and returns a final answer with evidence.

Why This Version Exists

This app demonstrates a workflow that is relevant to retail analytics without using private financial data:

  • One guided screen instead of a scattered dashboard.
  • Live Codex agents by default.
  • Guided check mode as a reliable fallback.
  • BigQuery remains the only source of data.
  • The browser never sees OpenAI keys or Google credentials.
  • Codex agents never receive credentials and never execute BigQuery.

Dataset

The demo uses approved tables from:

bigquery-public-data.thelook_ecommerce

Allowed tables:

  • order_items: each row is one purchased item with sale price, order status, order date, product ID, order ID, and anonymized user ID.
  • products: product category, brand, department, product name, cost, and retail price.

This is a retail spend analytics proxy: completed sales, revenue, product categories, brands, departments, and order status.

Local Setup

cd /path/to/verifiable-intelligence
cp .env.example .env
uv sync
uv run uvicorn app.main:app --reload --port 8090

Open http://localhost:8090.

The .env file should contain server-side secrets only. Do not place keys in static/index.html.

BigQuery Credentials

The app reads BigQuery credentials from .secrets/service-account.json. This file must be created locally and must not be committed.

  1. In Google Cloud Console, select the billing project you will use for BigQuery jobs.
  2. Enable the BigQuery API if it is not already enabled.
  3. Create a service account for this app.
  4. Grant it permission to run BigQuery jobs in that project, for example BigQuery Job User.
  5. Create and download a JSON key for the service account.
  6. Save that downloaded JSON key as .secrets/service-account.json:
mkdir -p .secrets
cp .secrets/service-account.sample.json .secrets/service-account.json

Replace the sample contents with the downloaded JSON key. Keep .env pointed at that local file:

GOOGLE_APPLICATION_CREDENTIALS=.secrets/service-account.json
BIGQUERY_PROJECT_ID=your-gcp-project-id

The real .secrets/service-account.json file is ignored by git. Commit only the sample file.

Default Demo Question

Use:

Which product categories generated the most completed-sales revenue?

Live mode starts three Codex subprocesses:

  • SQL Analyst: direct GoogleSQL answer.
  • Python Cross-Check: grouped SQL plus analyze(rows).
  • Stress-Test Analyst: independent alternate approach.

Guided check mode uses deterministic fixtures that show one common analytics mistake: ranking categories by item count instead of revenue.

API

  • GET /api/health
  • GET /api/schema
  • POST /api/verify
  • POST /api/verify/stream
  • GET /api/runs/{runId}

Security Model

  • BigQuery access is restricted by IAM and by config/allowlist.json.
  • SQL is parsed with sqlglot using the BigQuery dialect.
  • Only one read-only SELECT with an explicit LIMIT is accepted.
  • DDL, DML, scripts, CALL, EXECUTE IMMEDIATE, temp functions, unauthorized tables, and INFORMATION_SCHEMA are blocked.
  • Every query is dry-run before execution and runs with maximum_bytes_billed.
  • Python candidates run only after AST validation in an isolated subprocess.

Tests

uv run ruff check app tests
uv run pytest -q

The tests cover command construction, unsafe SQL blocking, Python sandboxing, result comparison, visualization fallback, and API streaming behavior.

About

An executive-grade verification workbench for natural-language BigQuery analytics over a simple retail transaction dataset.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors