# Web UI Plan: Agno Agent + agent-ui

> Created: 2026-03-21
> Status: **Approved — ready for implementation**

---

## Goal

Add a web interface to open-data-agent so non-technical users can query databases
via natural language without installing OpenCode or any IDE. The chat-based UI
should expose all capabilities currently available through the CLI + OpenCode agent:
NL-to-SQL, schema exploration, query history, memory/knowledge base, self-healing
diagnostics, and configurable LLM providers.

## Decision

- **Backend agent framework:** [Agno](https://github.com/agno-agi/agno) (BSD license)
- **Frontend:** [agent-ui](https://github.com/agno-agi/agent-ui) — Agno's open-source,
  self-hosted Next.js chat interface (MIT license)
- **No vendor lock-in.** Both components are open source and self-hosted.

### Why Agno + agent-ui

| Factor | Benefit |
|---|---|
| Self-hosted end-to-end | Backend on `:7777`, frontend on `:3000`, all on your infra |
| Zero UI development for MVP | Chat interface works out of the box with tool-call visualization |
| Customizable frontend | Fork `agent-ui` to add table rendering, charts, CSV export later |
| Production features built in | Sessions, memory, tracing, auth via AgentOS |
| LLM configurable | Agno supports OpenAI, Anthropic, Google, Ollama, etc. |
| Scalable | Start single-user on localhost, deploy multi-user later (Docker/AWS/Railway) |

### Alternatives Considered

| Approach | Verdict |
|---|---|
| Agno + hosted Control Plane (`os.agno.com`) | Rejected — vendor dependency on hosted UI |
| Chainlit | Good chat UX but community-maintained (risk), weak data visualization |
| Streamlit | Great for data apps but requires building own agent loop, weak multi-user |
| Gradio | Too demo-oriented, limited customization |
| Open WebUI | Heavy, designed for general LLM chat, hard to integrate custom tools |
| Custom FastAPI + React | Maximum control but 2-6 weeks effort, requires frontend expertise |

---

## Architecture

```
┌──────────────────────────┐     ┌──────────────────────────────────┐
│  agent-ui (Next.js)      │     │   AgentOS (FastAPI @ :7777)      │
│  localhost:3000          │────▶│                                  │
│                          │     │   Data Agent (agno.agent.Agent)  │
│  - Chat interface        │     │   - instructions: adapted from   │
│  - Tool call display     │     │     data-agent.md.tmpl           │
│  - Streaming responses   │     │   - model: configurable          │
│  - (future: data tables, │     │   - tools:                       │
│    charts via custom     │     │     ├── query_database()         │
│    components)           │     │     ├── list_schemas()           │
│                          │     │     ├── list_tables()            │
│  Fork & customize for:   │     │     ├── describe_table()         │
│  - Table rendering       │     │     ├── sample_table()           │
│  - Chart components      │     │     ├── profile_table()          │
│  - Export to CSV         │     │     ├── search_memory()          │
│                          │     │     └── list_memory()            │
└──────────────────────────┘     │                                  │
                                 │   Tools call ODA Python APIs     │
                                 │   directly (QueryEngine,         │
                                 │   SchemaInspector, etc.)         │
                                 │                                  │
                                 └──────────────────────────────────┘
```

---

## New Files & Structure

```
open-data-agent/
├── src/open_data_agent/
│   ├── (existing files unchanged)
│   └── web/                      # NEW — all web/agent code
│       ├── __init__.py
│       ├── app.py                # AgentOS entry point
│       ├── agent.py              # Agno Agent definition + configurable model
│       ├── tools.py              # ODA tools wrapped for Agno
│       ├── connection_pool.py    # Shared DB connection/adapter lifecycle
│       └── instructions.py       # Dynamic instructions builder
├── pyproject.toml                # Modified: add agno dep + oda-web script
└── agent-ui/                     # Git-ignored; cloned from agno-agi/agent-ui
```

**No existing files are modified** except `pyproject.toml` (new dependency + new script).

---

## Phase 1: Connection Pool — `web/connection_pool.py`

**Problem:** ODA's CLI creates a fresh DB connection per command and closes it.
For a long-running web server, we need a managed connection lifecycle.

**Design:**

```python
from dataclasses import dataclass
from typing import Any

# Module paths follow the "Key Technical References" table below.
from open_data_agent.config import Config
from open_data_agent.db.diagnostics import DiagnosticEngine
from open_data_agent.db.dialect import DialectAdapter  # base-class location assumed
from open_data_agent.db.query import QueryEngine
from open_data_agent.db.schema import SchemaInspector
from open_data_agent.docs_generator import DocGenerator
from open_data_agent.history import HistoryTracker
from open_data_agent.memory import MemoryManager


@dataclass
class OdaSession:
    """Holds a live DB connection, adapter, and derived services."""

    adapter: DialectAdapter
    conn: Any  # live DB connection
    config: Config
    inspector: SchemaInspector
    engine: QueryEngine
    diagnostics: DiagnosticEngine
    memory: MemoryManager
    history: HistoryTracker
    doc_generator: DocGenerator
    connection_name: str
    db_type: str
    default_schema: str

    def close(self) -> None: ...


def create_session(connection_name: str | None = None) -> OdaSession:
    """
    Creates a full OdaSession from a named connection (or the active one).
    Replicates the _get_inspector() + QueryEngine setup from cli_schema.py / cli_query.py.
    """
```

**Setup pattern** (from `cli_schema.py:_get_inspector()`):

```
ConnectionManager()
  .get_active_connection() -> name
  .get_connection(name)    -> params (db_type, host, port, database, username, password)

params["db_type"] -> dispatch to:
  SQLiteAdapter()     + sqlite3.connect(database, check_same_thread=False)
  PostgreSQLAdapter() + psycopg.connect(host, port, dbname, user, password, autocommit=True)
  MySQLAdapter()      + pymysql.connect(host, port, database, user, password)

-> adapter + conn -> SchemaInspector(adapter, conn)
                  -> QueryEngine(adapter, conn, config, history, name, db_type)
                  -> DiagnosticEngine(adapter, conn)
                  -> DocGenerator(inspector, db_type)
MemoryManager()   # standalone, project-relative ./memory/
HistoryTracker()  # standalone, ~/.config/open-data-agent/history.jsonl
```

Session is created once at AgentOS startup and shared across tool calls.

---

## Phase 2: Agno Tools — `web/tools.py`

8 tools that wrap ODA's Python API directly (no subprocess calls):

### Tool 1: `query_database(sql: str) -> str`

- Calls `session.engine.execute(sql, question=sql)` -> `QueryResult`
- On success: formats as markdown table (columns + rows, capped at ~50 rows)
- On zero rows: appends `session.diagnostics.diagnose(sql, result)`
- On error: returns error + diagnostic text
- On `SafetyError`: returns the safety violation message

### Tool 2: `list_schemas() -> str`

- Calls `session.inspector.get_schemas()`

### Tool 3: `list_tables(schema: str | None = None) -> str`

- Calls `session.inspector.get_tables(schema or session.default_schema)`

### Tool 4: `describe_table(table: str) -> str`

- Parses `schema.table` format
- Calls `session.inspector.get_columns(schema, table)`
- Returns markdown table of column definitions

### Tool 5: `sample_table(table: str, n: int = 5) -> str`

- Calls `session.inspector.get_sample(schema, table, n)`
- Formats `QueryResult` as markdown table

### Tool 6: `profile_table(table: str) -> str`

- Calls `session.inspector.get_profile(schema, table)`
- Formats profile dict as markdown table

### Tool 7: `search_memory(term: str) -> str`

- Calls `session.memory.search(term)`
- Returns formatted list of matches (title, category, content excerpt)

### Tool 8: `list_memory() -> str`

- Calls `session.memory.list_entries()`

**Session wiring (MVP):** Module-level singleton initialized at startup.

```python
from open_data_agent.web.connection_pool import OdaSession

_session: OdaSession | None = None


def init_tools(session: OdaSession) -> None:
    """Store the shared session; called once by app.py at startup."""
    global _session
    _session = session
```

`app.py` calls `init_tools(session)` before creating the agent.
Post-MVP: migrate to Agno's dependency injection for proper per-request isolation.

---

## Phase 3: Agent Instructions — `web/instructions.py`

Adapt `data-agent.md.tmpl` for tool-calling context. Replace CLI command references
(`uv run oda query "..."`) with tool names (`query_database`, `describe_table`, etc.).

```python
def build_instructions(session: OdaSession) -> str:
    """Build agent instructions from connection context + data catalog."""
```

The instructions include:

1. **Connection Context** — db_type, host, database name
2. **Data Catalog** — embed `docs/data-catalog/_index.md` content directly so the agent
   knows all available tables without a tool call
3. **Schema Exploration** — use `list_schemas`, `list_tables`, `describe_table`,
   `sample_table`, `profile_table` tools
4. **Query Execution** — use `query_database` tool. Include behavioral notes
   (safety, LIMIT injection, timeout)
5. **Memory** — use `search_memory` and `list_memory` tools. Workflow: always
   search memory before writing SQL, so known data quality issues on the columns
   involved are taken into account
6. **Self-Healing** — same diagnostic checklist from the template, referencing
   tools instead of CLI commands. `query_database` returns diagnostic info
   automatically on zero rows/errors
7. **Safety Rules** — never run write (DML/DDL) statements, never expose credentials

---

## Phase 4: Agent Definition — `web/agent.py`

```python
from agno.agent import Agent
from agno.db.sqlite import SqliteDb

from open_data_agent.web.connection_pool import OdaSession
from open_data_agent.web.instructions import build_instructions
from open_data_agent.web.tools import (
    describe_table, list_memory, list_schemas, list_tables,
    profile_table, query_database, sample_table, search_memory,
)


def create_agent(session: OdaSession, model_id: str = "gpt-4o") -> Agent:
    instructions = build_instructions(session)
    return Agent(
        name="Data Agent",
        model=_resolve_model(model_id),
        tools=[query_database, list_schemas, list_tables, describe_table,
               sample_table, profile_table, search_memory, list_memory],
        instructions=instructions,
        db=SqliteDb(db_file="oda_sessions.db"),  # chat sessions/history storage
        add_datetime_to_context=True,
        add_history_to_context=True,
        num_history_runs=5,
        markdown=True,
    )


def _resolve_model(model_id: str):
    """Map a model id to an Agno model class by prefix."""
    if model_id.startswith("gpt"):
        from agno.models.openai import OpenAIChat
        return OpenAIChat(id=model_id)
    if model_id.startswith("claude"):
        from agno.models.anthropic import Claude
        return Claude(id=model_id)
    # Extensible for other providers; fail loudly instead of
    # silently passing model=None to the Agent.
    raise ValueError(f"Unsupported model id: {model_id!r}")
```

---

## Phase 5: AgentOS App — `web/app.py`

```python
import os

from agno.os import AgentOS

from open_data_agent.web.agent import create_agent
from open_data_agent.web.connection_pool import create_session
from open_data_agent.web.tools import init_tools

# Module-level wiring, also used for direct uvicorn usage. It runs at import
# time in every process that imports this module, so main() must not repeat
# it (the original draft opened a second session inside main()).
session = create_session()  # uses active ODA connection
init_tools(session)
agent = create_agent(session, model_id=os.environ.get("ODA_MODEL", "gpt-4o"))
agent_os = AgentOS(agents=[agent])
app = agent_os.get_app()


def main() -> None:
    """Entry point for the `oda-web` script."""
    # Serve by import string so the reloader can re-import this module.
    agent_os.serve(app="open_data_agent.web.app:app", host="0.0.0.0", port=7777, reload=True)
```

---

## Phase 6: Dependencies — `pyproject.toml` Changes

Add to `dependencies`:
```toml
"agno>=2.0",
```

Add new script:
```toml
[project.scripts]
oda = "open_data_agent.cli:cli"
oda-web = "open_data_agent.web.app:main"
```

---

## Phase 7: agent-ui Setup

Operational (not a code change):

```bash
# Option A: npx (recommended)
npx create-agent-ui@latest

# Option B: clone
git clone https://github.com/agno-agi/agent-ui.git agent-ui
cd agent-ui && pnpm install && pnpm dev

# Opens http://localhost:3000
# Connects to http://localhost:7777 (AgentOS)
```

---

## Running the Full Stack

```bash
# Terminal 1: Start the agent backend
export OPENAI_API_KEY=sk-...   # or ANTHROPIC_API_KEY for Claude
export ODA_MODEL=gpt-4o        # optional, defaults to gpt-4o
uv run oda-web

# Terminal 2: Start the frontend
cd agent-ui && pnpm dev

# Open http://localhost:3000 and start asking data questions
```

Prerequisites: an active ODA connection (`uv run oda connect ` must have been
run previously).

---

## Delivery Schedule

| Phase | Files | What | Effort |
|---|---|---|---|
| 1 | `web/connection_pool.py` | OdaSession: shared connection/adapter lifecycle | 0.5 day |
| 2 | `web/tools.py` | 8 Agno tools wrapping ODA Python APIs | 1.5 days |
| 3 | `web/instructions.py` | Adapted agent instructions (from template) | 0.5 day |
| 4 | `web/agent.py` | Agno Agent definition with configurable model | 0.5 day |
| 5 | `web/app.py` | AgentOS entry point + startup wiring | 0.5 day |
| 6 | `pyproject.toml` | Add `agno` dependency + `oda-web` script | 10 min |
| 7 | (operational) | Clone/setup agent-ui, test end-to-end | 0.5 day |
| **Total MVP** | **6 new files, 1 modified** | | **~4-5 days** |

---

## Future Enhancements (Post-MVP)

| Enhancement | Effort |
|---|---|
| Fork `agent-ui`, add data table component with sorting/filtering | 1 week |
| Add chart rendering (bar, line, pie) for query results | 1 week |
| CSV/Excel export button in UI | 2 days |
| Docker Compose: `oda-web` + `agent-ui` + optional PostgreSQL | 1 day |
| Multi-connection support: switch between pre-configured DBs via chat | 2 days |
| Add `add_memory` tool (write, not just read) | 0.5 day |
| Agno's built-in memory for per-user preferences | 1 day |
| Migrate tool session wiring from module singleton to Agno dependency injection | 1 day |
| Multi-user auth via AgentOS JWT middleware | 1-2 days |

---

## Key Technical References

| Component | File | Key Class/Function |
|---|---|---|
| Query execution | `src/open_data_agent/db/query.py` | `QueryEngine.execute()`, `QueryResult` |
| Schema inspection | `src/open_data_agent/db/schema.py` | `SchemaInspector`, `NormalizedColumn` |
| DB connections | `src/open_data_agent/db/connection.py` | `ConnectionManager` |
| Dialect adapters | `src/open_data_agent/db/dialect.py` | `SQLiteAdapter`, `PostgreSQLAdapter`, `MySQLAdapter` |
| Diagnostics | `src/open_data_agent/db/diagnostics.py` | `DiagnosticEngine.diagnose()` |
| Memory | `src/open_data_agent/memory.py` | `MemoryManager` |
| History | `src/open_data_agent/history.py` | `HistoryTracker` |
| Docs status | `src/open_data_agent/docs_generator.py` | `DocGenerator.get_status()` |
| Config | `src/open_data_agent/config.py` | `Config`, `get_config()` |
| Connection setup pattern | `src/open_data_agent/cli_schema.py` | `_get_inspector()` |
| Query setup pattern | `src/open_data_agent/cli_query.py` | `query()` command |
| Instructions template | `src/open_data_agent/templates/data-agent.md.tmpl` | 6-block template |
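
The `params["db_type"]` dispatch in Phase 1's setup pattern could be collected into one helper. This is a minimal sketch. `open_connection` is a hypothetical name and not an existing ODA function. The `db_type` strings (`"sqlite"`, `"postgresql"`, `"mysql"`) are assumed. The parameter keys follow the Phase 1 setup pattern rather than a confirmed `ConnectionManager` schema. The helper uses the keyword-argument forms of `psycopg.connect` and `pymysql.connect`.

```python
import sqlite3
from typing import Any


def open_connection(params: dict[str, Any]) -> Any:
    """Open a DB-API connection from an ODA-style connection-params dict.

    Hypothetical helper: keys (db_type, host, port, database, username,
    password) and db_type values are assumptions based on the Phase 1 pattern.
    """
    db_type = params["db_type"]
    if db_type == "sqlite":
        # Disables sqlite3's same-thread check so a shared session can be used
        # from server worker threads; callers must still avoid concurrent use.
        return sqlite3.connect(params["database"], check_same_thread=False)
    if db_type == "postgresql":
        import psycopg  # imported lazily: only needed for PostgreSQL

        return psycopg.connect(
            host=params["host"],
            port=params["port"],
            dbname=params["database"],
            user=params["username"],
            password=params["password"],
            autocommit=True,
        )
    if db_type == "mysql":
        import pymysql  # imported lazily: only needed for MySQL

        return pymysql.connect(
            host=params["host"],
            port=params["port"],
            database=params["database"],
            user=params["username"],
            password=params["password"],
        )
    raise ValueError(f"Unsupported db_type: {db_type!r}")
```

The PostgreSQL and MySQL drivers are imported inside their branches, so a SQLite-only install works without `psycopg` or `pymysql` present.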
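
Several Phase 2 tools (`query_database`, `describe_table`, `sample_table`, `profile_table`) render results as markdown tables, and the table-level tools accept `schema.table` names. One possible shape for those shared helpers is sketched below. It assumes results are available as column names plus row tuples; the actual `QueryResult` fields are not specified in this plan. `split_table_name` and `format_markdown_table` are hypothetical names.

```python
from collections.abc import Sequence
from typing import Any


def split_table_name(name: str, default_schema: str) -> tuple[str, str]:
    """Split "schema.table" on the first dot; bare names use default_schema."""
    schema, sep, table = name.partition(".")
    return (schema, table) if sep else (default_schema, name)


def format_markdown_table(
    columns: Sequence[str], rows: Sequence[Sequence[Any]], max_rows: int = 50
) -> str:
    """Render rows as a markdown table, truncated to max_rows."""

    def cell(value: Any) -> str:
        # NULLs render as "NULL"; pipes and newlines would break the layout.
        text = "NULL" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(c) for c in columns) + " |",
        "|" + "---|" * len(columns),
    ]
    for row in rows[:max_rows]:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    if len(rows) > max_rows:
        # The leading newline yields a blank line, which ends the table.
        lines.append(f"\n_Showing {max_rows} of {len(rows)} rows._")
    return "\n".join(lines)
```

Escaping `|` prevents a cell value from splitting into extra columns. The truncation note is preceded by a blank line so markdown renderers do not treat it as another table row.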