ClickGraph

ClickGraph is a high-performance, stateless, read-only graph query service for ClickHouse, written in Rust. It is compatible with the Neo4j ecosystem through Cypher and Bolt Protocol 5.8 support. It now also supports an embedded mode and exporting query results to external destinations, with Go and Python bindings in addition to the native Rust interface.

Note: The ClickGraph dev release is beta quality for view-based graph analytics applications. Please raise an issue if you encounter any problems.


Motivation and Rationale

  • Viewing ClickHouse databases (including external sources) as graph data with graph analytics capability adds a level of abstraction, boosts productivity with graph tools, and enables agentic GraphRAG support.
  • Research shows that relational analytics on columnar stores with vectorized execution engines such as ClickHouse offers better analytical performance and scalability than graph-native technologies, which usually rely on explicit adjacency representations and are better suited to local-area graph traversals.
  • View-based graph analytics offers the benefits of zero-ETL, with no data migration hassle or duplicated storage cost, while delivering better performance and scalability than most native graph analytics options.
  • Neo4j Bolt protocol support gives access to the existing tools built on the Bolt protocol.

What's New Under Development

  • Embedded write API for GraphRAG - create_node(), create_edge(), upsert_node() with batch variants. AI agents can extract entities from documents, store them as graph data, and query with Cypher — all in-process. See Embedded Mode Write API.

What's New in v0.6.4-dev

  • Embedded mode - Query Parquet/Iceberg/Delta/S3 directly, with no ClickHouse server needed. Use it as a Rust library (clickgraph-embedded) or run the server with --embedded. Well suited as a local tool for agents.
  • Go and Python bindings - For embedded ClickGraph, in addition to the native Rust interface.
  • Export query results - CALL apoc.export.{csv|json|parquet}.query() exports to files, S3, GCS, Azure, and HTTP destinations. Kuzu- and DuckDB-compatible commands are also provided.
  • Denormalized & coupled schema fixes - Corrected property mapping, OPTIONAL MATCH (was silently dropped), and VLP cycle prevention for schemas where node properties are embedded in edge tables.
  • 1,591 unit tests - Up from 1,277, with comprehensive cross-schema pattern matrix tests.

See CHANGELOG.md for complete release history.


Features

Core Capabilities

  • Cypher-to-SQL Translation - Industry-standard Cypher read syntax translated to optimized ClickHouse SQL
  • Stateless Architecture - Offloads all query execution to ClickHouse; no extra datastore required
  • Embedded Mode - In-process graph queries over Parquet/Iceberg/Delta/S3 via chdb; no ClickHouse server needed (--features embedded)
  • LLM-powered schema discovery - :discover command generates YAML schema from ClickHouse table metadata using Anthropic or OpenAI.
  • Variable-Length Paths - Recursive traversals with *1..3 syntax using ClickHouse WITH RECURSIVE CTEs
  • Path Functions - length(p), nodes(p), relationships(p) for path analysis
  • Parameterized Queries - Neo4j-compatible $param syntax for SQL injection prevention
  • Query Cache - LRU caching with 10-100x speedup for repeated translations
  • ClickHouse Functions - Pass-through via ch.function_name() and chagg.aggregate() prefixes
  • GraphRAG structured output - format: "Graph" returns deduplicated nodes, edges, and stats for graph visualization and RAG pipelines.
  • Query Metrics - Phase-by-phase timing via HTTP headers and structured logging
  • ClickHouse cluster load balancing - CLICKHOUSE_CLUSTER env var auto-discovers and balances queries across cluster nodes.
  • LDBC SNB benchmark coverage - 36/37 complex queries (97%) pass, giving near-complete Social Network Benchmark coverage. See benchmark results for performance data on sf0.003 and sf10 datasets.
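
To make the *1..3 bound in the Variable-Length Paths item concrete, here is a minimal Python sketch of what such a pattern matches: every path that uses between one and three edges from a start node. It only illustrates the semantics and is not how ClickGraph evaluates the pattern (ClickGraph translates it to ClickHouse WITH RECURSIVE CTEs). The function name variable_length_paths and the sample edge list are hypothetical, and the sketch assumes that an edge is not traversed twice within one path.

```python
from collections import defaultdict


def variable_length_paths(edges, start, min_hops=1, max_hops=3):
    """Enumerate paths from `start` that use between min_hops and max_hops edges."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    results = []
    # Each stack entry is (nodes on the path so far, edges already used).
    stack = [([start], frozenset())]
    while stack:
        path, used = stack.pop()
        hops = len(path) - 1
        if hops >= min_hops:
            results.append(path)
        if hops == max_hops:
            continue  # upper bound reached, do not extend further
        for nxt in adjacency[path[-1]]:
            edge = (path[-1], nxt)
            if edge in used:
                continue  # assumption: no edge is reused within a path
            stack.append((path + [nxt], used | {edge}))
    return sorted(results)


# Hypothetical sample data: a chain of "follows" edges.
follows = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "erin")]
paths = variable_length_paths(follows, "alice")
for p in paths:
    print(" -> ".join(p))
```

With this sample chain, the path ending at erin needs four hops and is therefore excluded by the upper bound.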

Neo4j Ecosystem Compatibility

  • Bolt Protocol v5.8 - Full Neo4j driver compatibility (cypher-shell, Neo4j Browser, graph-notebook)
  • HTTP REST API - Complete query execution with parameters and aggregations
  • Multi-Schema Support - Per-request schema selection via USE clause, session parameter, or default
  • Authentication - Multiple auth schemes including basic auth

View-Based Graph Model

  • Zero Migration - Map existing tables to graph format through YAML configuration
  • Auto-Discovery - auto_discover_columns: true queries ClickHouse metadata automatically
  • Dynamic Schema Loading - Runtime schema registration via POST /schemas/load
  • Composite Node IDs - Multi-column identity (e.g., node_id: [tenant_id, user_id])

Architecture

ClickGraph runs as a lightweight stateless query translator alongside ClickHouse:

flowchart LR
    Clients["Graph Clients<br/><br/>HTTP/REST<br/>Bolt Protocol<br/>(Neo4j tools)"]

    ClickGraph["ClickGraph<br/><br/>Cypher -> SQL<br/>Translator<br/><br/>:8080 (HTTP)<br/>:7687 (Bolt)"]

    ClickHouse["ClickHouse<br/><br/>Columnar Storage<br/>Query Engine"]

    Clients -->|Cypher| ClickGraph
    ClickGraph -->|SQL| ClickHouse
    ClickHouse -->|Results| ClickGraph
    ClickGraph -->|Results| Clients

    style ClickGraph fill:#e1f5ff,stroke:#0288d1,stroke-width:3px
    style ClickHouse fill:#fff3e0,stroke:#f57c00,stroke-width:3px
    style Clients fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px

Three-tier architecture: Graph clients -> ClickGraph translator -> ClickHouse database


Quick Start

New to ClickGraph? See the Getting Started Guide for a complete walkthrough.

Option 1: Docker (Recommended)

# Pull the latest image
docker pull genezhang/clickgraph:latest

# Start ClickHouse only
docker-compose up -d clickhouse-service

# Run ClickGraph from Docker Hub image
docker run -d \
  --name clickgraph \
  --network clickgraph_default \
  -p 8080:8080 \
  -p 7687:7687 \
  -e CLICKHOUSE_URL="http://clickhouse-service:8123" \
  -e CLICKHOUSE_USER="test_user" \
  -e CLICKHOUSE_PASSWORD="test_pass" \
  -e GRAPH_CONFIG_PATH="/app/schemas/social_benchmark.yaml" \
  -v "$(pwd)/benchmarks/social_network/schemas:/app/schemas:ro" \
  genezhang/clickgraph:latest

Or use docker-compose (uses published image by default):

docker-compose up -d

Option 2: Build from Source

# Prerequisites: Rust toolchain (1.85+) and Docker for ClickHouse

# 1. Clone and start ClickHouse
git clone https://github.com/genezhang/clickgraph
cd clickgraph
docker-compose up -d clickhouse-service

# 2. Build and run
cargo build --release
export CLICKHOUSE_URL="http://localhost:8123"
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"
export GRAPH_CONFIG_PATH="./benchmarks/social_network/schemas/social_benchmark.yaml"
cargo run --bin clickgraph

GRAPH_CONFIG_PATH is required. It tells ClickGraph how to map ClickHouse tables to graph nodes and edges.

Test Your Setup

# HTTP API
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"query": "MATCH (u:User) RETURN u.full_name LIMIT 5"}'

# Bolt protocol (cypher-shell, Neo4j Browser, or any Neo4j driver)
cypher-shell -a bolt://localhost:7687 -u neo4j -p password
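
The same HTTP check can be scripted. Below is a minimal Python sketch that uses only the standard library and mirrors the curl call above (POST to /query with a JSON body containing "query"). Two things are assumptions and not confirmed by this README: the "parameters" field name used to pass $param values, and that the response body is JSON. The helper names build_query_request and run_query are hypothetical.

```python
import json
import urllib.request


def build_query_request(cypher, parameters=None, base_url="http://localhost:8080"):
    """Build a POST /query request equivalent to the curl example."""
    body = {"query": cypher}
    if parameters:
        # Assumption: parameters are sent under a "parameters" key.
        body["parameters"] = parameters
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(cypher, parameters=None):
    """Send the query to a running ClickGraph server and decode the reply."""
    request = build_query_request(cypher, parameters)
    with urllib.request.urlopen(request, timeout=10) as response:
        # Assumption: the server replies with a JSON document.
        return json.loads(response.read().decode("utf-8"))


# Building the request does not need a server; run_query() does.
request = build_query_request("MATCH (u:User) RETURN u.full_name LIMIT 5")
print(request.get_method(), request.full_url)
```

Call run_query(...) only once ClickGraph is listening on port 8080.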

Visualize with Neo4j Browser

Run the included demo for interactive graph visualization:

cd demos/neo4j-browser && bash setup.sh

Then open http://localhost:7474 and connect to bolt://localhost:7687. See demos/neo4j-browser/README.md for details.

AI Assistant Integration (MCP)

ClickGraph implements apoc.meta.schema() and Neo4j-compatible schema procedures, enabling AI assistants (Claude, etc.) to discover your graph structure via MCP servers like @anthropic-ai/mcp-server-neo4j and @neo4j/mcp-neo4j.

See the MCP Setup Guide for configuration details.

CLI Client

cargo build --release -p clickgraph-client
./target/release/clickgraph-client  # connects to http://localhost:8080

Schema Configuration

Map your tables to a graph with YAML:

views:
  - name: social_network
    nodes:
      - label: user
        table: users
        database: mydb
        node_id: user_id
        property_mappings:
          name: full_name
    edges:
      - type: follows
        table: user_follows
        database: mydb
        from_node: user
        to_node: user
        from_id: follower_id
        to_id: followed_id

Then query the mapped graph with Cypher:

MATCH (u:user)-[:follows]->(friend:user)
WHERE u.name = 'Alice'
RETURN friend.name
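
To show how the mapping above connects the Cypher query to tables, here is a toy Python sketch that resolves graph properties to columns and builds a join for a single-hop pattern. It is a simplified illustration under stated assumptions, not ClickGraph's translator, and the SQL ClickGraph actually generates may look different. The SCHEMA dictionary is a hand-written copy of the YAML, and column_for, one_hop_sql, the table aliases, and the ? placeholder are all hypothetical. The sketch also assumes an unmapped property falls back to a column of the same name.

```python
# Hand-written, simplified copy of the YAML mapping above.
SCHEMA = {
    "nodes": {
        "user": {
            "table": "mydb.users",
            "node_id": "user_id",
            "property_mappings": {"name": "full_name"},
        }
    },
    "edges": {
        "follows": {
            "table": "mydb.user_follows",
            "from_node": "user",
            "to_node": "user",
            "from_id": "follower_id",
            "to_id": "followed_id",
        }
    },
}


def column_for(label, prop):
    """Resolve a graph property to a table column (assumed default: same name)."""
    return SCHEMA["nodes"][label]["property_mappings"].get(prop, prop)


def one_hop_sql(edge_type, filter_prop, return_prop):
    """SQL for (a)-[:edge_type]->(b) WHERE a.filter_prop = ? RETURN b.return_prop."""
    edge = SCHEMA["edges"][edge_type]
    src = SCHEMA["nodes"][edge["from_node"]]
    dst = SCHEMA["nodes"][edge["to_node"]]
    return (
        f"SELECT b.{column_for(edge['to_node'], return_prop)} "
        f"FROM {src['table']} AS a "
        f"JOIN {edge['table']} AS e ON e.{edge['from_id']} = a.{src['node_id']} "
        f"JOIN {dst['table']} AS b ON b.{dst['node_id']} = e.{edge['to_id']} "
        f"WHERE a.{column_for(edge['from_node'], filter_prop)} = ?"
    )


sql = one_hop_sql("follows", filter_prop="name", return_prop="name")
print(sql)
```

The point to notice is that u.name and friend.name both resolve to the full_name column, and the follows edge becomes a join through user_follows on follower_id and followed_id.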

Documentation


Development Status

Current Version: v0.6.4-dev

Test Coverage

  • Rust Unit Tests: 1,588 passing (100%)
  • Integration Tests: 3,068 passing (108 environment-dependent)
  • LDBC SNB: 36/37 queries passing (97%)
  • Benchmarks: 14/14 passing (100%)
  • E2E Tests: Bolt 4/4, Cache 5/5 (100%)

Known Limitations

  • Read-Only Engine: Write operations not supported by design
  • Anonymous Nodes: Use named nodes for better SQL generation

See STATUS.md and KNOWN_ISSUES.md for details.

Roadmap

| Phase         | Version       | Status                                                |
|---------------|---------------|-------------------------------------------------------|
| Phase 1       | v0.4.0        | Complete - Query cache, parameters, Bolt protocol     |
| Phase 2       | v0.5.0        | Complete - Multi-tenancy, RBAC, auto-schema discovery |
| Phase 2.5-2.6 | v0.5.2-v0.5.3 | Complete - Schema variations, Cypher functions        |
| Phase 3       | v0.6.3        | Complete - WITH redesign, GraphRAG, LDBC SNB, MCP     |
| Phase 4       | v0.6.x        | Next - User-requested features, advanced optimizations |

See ROADMAP.md for detailed feature tracking.

Contributing

Contributions welcome! See DEV_QUICK_START.md to get started and DEVELOPMENT_PROCESS.md for the full workflow.

License

ClickGraph is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

This project is developed from a fork of Brahmand, adding zero-ETL view-based graph querying, Neo4j ecosystem compatibility, and enterprise deployment capabilities.