This comprehensive example demonstrates ClickGraph's power for real-world analytics scenarios! 🚀📊
When starting ClickGraph, you may see:
Warning: Failed to connect to ClickHouse, using empty schema
Error fetching remote schema: no rows returned by a query
Status:
Impact: None - all Cypher queries work correctly despite these warnings.
If using MergeTree engine, you might encounter:
filesystem error: in rename: Permission denied
Solutions:
- Quick fix: Use Memory engine for demo (data not persistent)
- Production fix: Fix Docker permissions:
sudo chown -R 101:101 ./clickhouse_data - Clean slate: Remove and recreate volume:
docker volume rm clickgraph_clickhouse_data
Ensure you're using the correct ClickHouse credentials from docker-compose.yaml:
- Username:
test_user - Password:
test_pass
- Memory engine: Fast but data lost on restart (development only)
- MergeTree engine: Persistent storage for production use
- Schema warnings: Will be resolved in future ClickGraph versions
We'll analyze an e-commerce platform with:
- Customers who purchase products
- Products in different categories
- Orders connecting customers to products
- Reviews and ratings
- Geographic and temporal patterns
-- Connect to ClickHouse and create the schema
CREATE DATABASE IF NOT EXISTS ecommerce;
USE ecommerce;
-- Customers table
CREATE TABLE customers (
customer_id UInt32,
email String,
first_name String,
last_name String,
age UInt8,
gender Enum8('M' = 1, 'F' = 2, 'O' = 3),
country String,
city String,
registration_date Date,
total_spent Decimal(10,2),
is_premium UInt8 DEFAULT 0
) ENGINE = MergeTree()
ORDER BY customer_id;
-- Products table
CREATE TABLE products (
product_id UInt32,
name String,
category String,
brand String,
price Decimal(8,2),
rating Float32,
num_reviews UInt32,
in_stock UInt8 DEFAULT 1,
created_date Date
) ENGINE = MergeTree()
ORDER BY product_id;
-- Orders table
CREATE TABLE orders (
order_id UInt32,
customer_id UInt32,
product_id UInt32,
quantity UInt16,
unit_price Decimal(8,2),
total_amount Decimal(10,2),
order_date Date,
order_time DateTime,
status Enum8('pending' = 1, 'shipped' = 2, 'delivered' = 3, 'cancelled' = 4)
) ENGINE = MergeTree()
ORDER BY (order_date, order_id);
-- Reviews table
CREATE TABLE reviews (
review_id UInt32,
customer_id UInt32,
product_id UInt32,
order_id UInt32,
rating UInt8, -- 1-5 stars
review_text String,
review_date Date,
helpful_votes UInt32 DEFAULT 0
) ENGINE = MergeTree()
ORDER BY review_date;
-- Category relationships (products can belong to subcategories)
CREATE TABLE category_hierarchy (
parent_category String,
child_category String,
level UInt8
) ENGINE = MergeTree()
ORDER BY (parent_category, child_category);-- Insert customers (representing different personas)
INSERT INTO customers VALUES
(1, 'alice.johnson@email.com', 'Alice', 'Johnson', 28, 'F', 'USA', 'New York', '2023-01-15', 1250.00, 1),
(2, 'bob.smith@email.com', 'Bob', 'Smith', 34, 'M', 'Canada', 'Toronto', '2023-02-20', 890.50, 0),
(3, 'carol.brown@email.com', 'Carol', 'Brown', 42, 'F', 'UK', 'London', '2023-01-10', 2100.75, 1),
(4, 'david.wilson@email.com', 'David', 'Wilson', 25, 'M', 'Australia', 'Sydney', '2023-03-05', 450.25, 0),
(5, 'emma.davis@email.com', 'Emma', 'Davis', 31, 'F', 'Germany', 'Berlin', '2023-02-28', 1680.00, 1),
(6, 'frank.miller@email.com', 'Frank', 'Miller', 29, 'M', 'USA', 'San Francisco', '2023-01-25', 750.30, 0),
(7, 'grace.lee@email.com', 'Grace', 'Lee', 26, 'F', 'Japan', 'Tokyo', '2023-03-10', 920.80, 0),
(8, 'henry.taylor@email.com', 'Henry', 'Taylor', 38, 'M', 'France', 'Paris', '2023-02-15', 1450.60, 1);
-- Insert products across different categories
INSERT INTO products VALUES
(101, 'iPhone 15 Pro', 'Electronics', 'Apple', 999.99, 4.5, 1250, 1, '2023-09-15'),
(102, 'Samsung Galaxy S24', 'Electronics', 'Samsung', 849.99, 4.3, 890, 1, '2024-01-20'),
(103, 'Sony WH-1000XM5', 'Electronics', 'Sony', 399.99, 4.7, 2100, 1, '2023-05-10'),
(104, 'MacBook Pro M3', 'Electronics', 'Apple', 1999.99, 4.6, 650, 1, '2023-11-01'),
(105, 'Nike Air Max', 'Fashion', 'Nike', 129.99, 4.2, 450, 1, '2023-06-15'),
(106, 'Levi\'s 501 Jeans', 'Fashion', 'Levi\'s', 69.99, 4.1, 320, 1, '2023-04-20'),
(107, 'The Great Gatsby', 'Books', 'Scribner', 12.99, 4.4, 1890, 1, '2023-01-01'),
(108, 'Instant Pot Duo', 'Home & Kitchen', 'Instant Pot', 79.99, 4.5, 3200, 1, '2023-03-15'),
(109, 'Dyson V15 Vacuum', 'Home & Kitchen', 'Dyson', 549.99, 4.4, 780, 1, '2023-07-20'),
(110, 'Fitbit Charge 6', 'Electronics', 'Fitbit', 159.99, 4.0, 1100, 1, '2023-10-05');
-- Insert orders (purchase history)
INSERT INTO orders VALUES
(1001, 1, 101, 1, 999.99, 999.99, '2024-01-20', '2024-01-20 14:30:00', 'delivered'),
(1002, 1, 103, 1, 399.99, 399.99, '2024-02-15', '2024-02-15 10:15:00', 'delivered'),
(1003, 2, 105, 2, 129.99, 259.98, '2024-01-25', '2024-01-25 16:45:00', 'delivered'),
(1004, 3, 104, 1, 1999.99, 1999.99, '2024-02-10', '2024-02-10 09:20:00', 'delivered'),
(1005, 3, 108, 1, 79.99, 79.99, '2024-02-12', '2024-02-12 11:30:00', 'delivered'),
(1006, 4, 107, 3, 12.99, 38.97, '2024-01-30', '2024-01-30 13:15:00', 'delivered'),
(1007, 5, 102, 1, 849.99, 849.99, '2024-03-05', '2024-03-05 15:45:00', 'shipped'),
(1008, 5, 106, 2, 69.99, 139.98, '2024-03-06', '2024-03-06 12:00:00', 'delivered'),
(1009, 6, 110, 1, 159.99, 159.99, '2024-02-20', '2024-02-20 17:30:00', 'delivered'),
(1010, 7, 109, 1, 549.99, 549.99, '2024-03-15', '2024-03-15 14:00:00', 'pending'),
(1011, 8, 101, 1, 999.99, 999.99, '2024-03-01', '2024-03-01 10:45:00', 'delivered'),
(1012, 1, 108, 1, 79.99, 79.99, '2024-03-20', '2024-03-20 16:20:00', 'delivered');
-- Insert reviews
INSERT INTO reviews VALUES
(2001, 1, 101, 1001, 5, 'Amazing phone! Camera quality is outstanding.', '2024-01-25', 15),
(2002, 1, 103, 1002, 5, 'Best noise cancellation I\'ve ever experienced.', '2024-02-20', 23),
(2003, 2, 105, 1003, 4, 'Comfortable and stylish. Great for running.', '2024-02-01', 8),
(2004, 3, 104, 1004, 5, 'MacBook Pro is incredibly fast. Worth every penny.', '2024-02-15', 31),
(2005, 3, 108, 1005, 4, 'Makes cooking so much easier. Highly recommended.', '2024-02-17', 12),
(2006, 4, 107, 1006, 5, 'Classic literature at its finest. Timeless story.', '2024-02-05', 6),
(2007, 5, 102, 1007, 4, 'Great Android phone. Battery life could be better.', '2024-03-10', 9),
(2008, 6, 110, 1009, 3, 'Good fitness tracker but app needs improvement.', '2024-02-25', 4),
(2009, 8, 101, 1011, 5, 'Upgraded from iPhone 12. Significant improvement!', '2024-03-05', 18);
-- Insert category hierarchy
INSERT INTO category_hierarchy VALUES
('Technology', 'Electronics', 1),
('Technology', 'Computers', 1),
('Lifestyle', 'Fashion', 1),
('Lifestyle', 'Home & Kitchen', 1),
('Education', 'Books', 1),
('Electronics', 'Smartphones', 2),
('Electronics', 'Audio', 2),
('Fashion', 'Footwear', 2),
('Fashion', 'Clothing', 2);Create ecommerce_graph.yaml:
name: ecommerce_analytics
version: "1.0"
description: "E-commerce platform graph analysis"
graph_schema:
nodes:
- label: Customer
database: ecommerce
table: customers
node_id: customer_id
property_mappings:
email: email
name: "concat(first_name, ' ', last_name)"
first_name: first_name
last_name: last_name
age: age
gender: gender
country: country
city: city
registration_date: registration_date
total_spent: total_spent
is_premium: is_premium
filters:
- "customer_id > 0"
- label: Product
database: ecommerce
table: products
node_id: product_id
property_mappings:
name: name
category: category
brand: brand
price: price
rating: rating
num_reviews: num_reviews
in_stock: in_stock
created_date: created_date
filters:
- "in_stock = 1"
- label: Order
database: ecommerce
table: orders
node_id: order_id
property_mappings:
quantity: quantity
unit_price: unit_price
total_amount: total_amount
order_date: order_date
order_time: order_time
status: status
filters:
- "status != 'cancelled'"
- label: Review
database: ecommerce
table: reviews
node_id: review_id
property_mappings:
rating: rating
review_text: review_text
review_date: review_date
helpful_votes: helpful_votes
- label: Category
database: ecommerce
table: category_hierarchy
node_id: child_category
property_mappings:
name: child_category
parent: parent_category
level: level
edges:
- type: PURCHASED
database: ecommerce
table: orders
from_node: Customer
to_node: Product
from_id: customer_id
to_id: product_id
property_mappings:
quantity: quantity
amount: total_amount
date: order_date
status: status
filters:
- "status IN ('shipped', 'delivered')"
- type: PLACED_ORDER
database: ecommerce
table: orders
from_node: Customer
to_node: Order
from_id: customer_id
to_id: order_id
property_mappings:
date: order_date
- type: ORDER_CONTAINS
database: ecommerce
table: orders
from_node: Order
to_node: Product
from_id: order_id
to_id: product_id
property_mappings:
quantity: quantity
unit_price: unit_price
- type: REVIEWED
database: ecommerce
table: reviews
from_node: Customer
to_node: Product
from_id: customer_id
to_id: product_id
property_mappings:
rating: rating
review_text: review_text
date: review_date
helpful_votes: helpful_votes
- type: BELONGS_TO
database: ecommerce
table: products
from_node: Product
to_node: Category
from_id: product_id
to_id: category
property_mappings: {}# Set environment variables
export CLICKHOUSE_URL="http://localhost:8123"
export CLICKHOUSE_USER="default"
export CLICKHOUSE_PASSWORD=""
export CLICKHOUSE_DATABASE="ecommerce"
export CLICKGRAPH_CONFIG_PATH="ecommerce_graph.yaml"
# Start ClickGraph
cargo run --bin clickgraphMATCH (c:Customer)
WHERE c.total_spent > 1000
RETURN c.name, c.total_spent, c.is_premium, c.country
ORDER BY c.total_spent DESCMATCH (c:Customer)
WHERE c.is_premium = 1
RETURN c.country, count(c) as premium_customers, avg(c.total_spent) as avg_spent
ORDER BY premium_customers DESCMATCH (target:Customer {name: 'Alice Johnson'})-[:PURCHASED]->(p:Product)
MATCH (similar:Customer)-[:PURCHASED]->(p)
WHERE similar <> target
WITH similar, count(p) as shared_products
WHERE shared_products >= 2
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT EXISTS((target)-[:PURCHASED]->(rec))
RETURN rec.name, rec.category, rec.price, rec.rating,
count(similar) as recommended_by
ORDER BY recommended_by DESC, rec.rating DESC
LIMIT 5MATCH (p1:Product)<-[:ORDER_CONTAINS]-(o:Order)-[:ORDER_CONTAINS]->(p2:Product)
WHERE p1.product_id < p2.product_id
WITH p1, p2, count(o) as co_purchases
WHERE co_purchases >= 2
RETURN p1.name, p2.name, p1.category, p2.category, co_purchases
ORDER BY co_purchases DESC
LIMIT 10MATCH (c:Customer)-[p:PURCHASED]->(prod:Product)
WITH c, prod, p
ORDER BY p.date
WITH c, collect({product: prod.name, category: prod.category,
date: p.date, amount: p.amount}) as journey
RETURN c.name, c.country, journey[0] as first_purchase,
journey[-1] as latest_purchase, size(journey) as total_purchases
ORDER BY total_purchases DESCMATCH (c:Customer)-[:PURCHASED]->(p:Product)
WITH c, collect(DISTINCT p.category) as categories
WHERE size(categories) > 1
RETURN c.name, c.age, c.gender, categories, size(categories) as category_diversity
ORDER BY category_diversity DESCMATCH (p:Product)<-[r:REVIEWED]-(c:Customer)
WITH p, avg(r.rating) as avg_rating, count(r) as review_count,
collect(r.rating) as all_ratings
RETURN p.name, p.category, p.brand,
round(avg_rating, 2) as average_rating,
review_count,
round(p.rating, 2) as listed_rating,
abs(round(avg_rating - p.rating, 2)) as rating_difference
ORDER BY rating_difference DESCMATCH (c:Customer)-[r:REVIEWED]->(p:Product)
WITH c, count(r) as review_count, avg(r.rating) as avg_rating,
sum(r.helpful_votes) as total_helpful_votes
WHERE review_count >= 2
RETURN c.name, c.age, c.country, review_count,
round(avg_rating, 2) as avg_rating,
total_helpful_votes,
round(total_helpful_votes * 1.0 / review_count, 1) as avg_helpfulness
ORDER BY total_helpful_votes DESCMATCH (c:Customer)-[:PURCHASED]->(p1:Product),
(c)-[:PURCHASED]->(p2:Product)
WHERE p1.brand <> p2.brand
WITH c, p1.brand as brand1, p2.brand as brand2, count(*) as purchases
WHERE purchases >= 1
WITH brand1, brand2, count(c) as customers, sum(purchases) as total_purchases
WHERE customers >= 2
RETURN brand1, brand2, customers, total_purchases,
round(total_purchases * 1.0 / customers, 1) as avg_purchases_per_customer
ORDER BY customers DESC, total_purchases DESCMATCH (c:Customer)-[p:PURCHASED]->(prod:Product)
WITH prod.category as category,
toInteger(formatDateTime(p.date, '%m')) as month,
count(p) as purchases, sum(p.amount) as revenue
RETURN category, month, purchases, round(revenue, 2) as monthly_revenue
ORDER BY category, monthMATCH (c:Customer)-[p:PURCHASED]->(prod:Product)
WITH c,
count(p) as total_orders,
sum(p.amount) as total_spent,
avg(p.amount) as avg_order_value,
min(p.date) as first_purchase,
max(p.date) as last_purchase
WITH c, total_orders, total_spent, avg_order_value,
duration.between(first_purchase, last_purchase).days as days_active
WHERE days_active > 0
WITH c, total_orders, total_spent, avg_order_value, days_active,
round(total_orders * 1.0 / (days_active / 30.0), 2) as monthly_frequency
RETURN c.name, c.age, c.country, c.is_premium,
total_orders, round(total_spent, 2) as total_spent,
round(avg_order_value, 2) as avg_order_value,
days_active, monthly_frequency,
round(avg_order_value * monthly_frequency * 12, 2) as predicted_annual_value
ORDER BY predicted_annual_value DESCMATCH (c:Customer)-[p:PURCHASED]->(prod:Product)
WITH
CASE
WHEN c.age < 25 THEN 'Gen Z'
WHEN c.age < 40 THEN 'Millennial'
WHEN c.age < 55 THEN 'Gen X'
ELSE 'Boomer'
END as generation,
prod.category as category,
count(p) as purchases,
sum(p.amount) as revenue,
count(DISTINCT c.customer_id) as unique_customers
RETURN generation, category, purchases,
round(revenue, 2) as total_revenue,
unique_customers,
round(revenue / unique_customers, 2) as revenue_per_customer,
round(purchases * 1.0 / unique_customers, 1) as purchases_per_customer
ORDER BY generation, total_revenue DESC# Customer recommendation analysis
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "MATCH (target:Customer {name: $customerName})-[:PURCHASED]->(p:Product) MATCH (similar:Customer)-[:PURCHASED]->(p) WHERE similar <> target WITH similar, count(p) as shared_products WHERE shared_products >= 2 MATCH (similar)-[:PURCHASED]->(rec:Product) WHERE NOT EXISTS((target)-[:PURCHASED]->(rec)) RETURN rec.name, rec.category, rec.price, count(similar) as recommended_by ORDER BY recommended_by DESC LIMIT 5",
"parameters": {"customerName": "Alice Johnson"}
}'
# Market basket analysis
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "MATCH (p1:Product)<-[:ORDER_CONTAINS]-(o:Order)-[:ORDER_CONTAINS]->(p2:Product) WHERE p1.product_id < p2.product_id WITH p1, p2, count(o) as co_purchases WHERE co_purchases >= 2 RETURN p1.name, p2.name, co_purchases ORDER BY co_purchases DESC LIMIT 10"
}'from neo4j import GraphDatabase
import json
# Connect to ClickGraph
driver = GraphDatabase.driver("bolt://localhost:7687")
def analyze_customer_segments():
"""Analyze customer purchasing patterns by demographics"""
query = """
MATCH (c:Customer)-[p:PURCHASED]->(prod:Product)
WITH
CASE
WHEN c.age < 25 THEN 'Gen Z'
WHEN c.age < 40 THEN 'Millennial'
WHEN c.age < 55 THEN 'Gen X'
ELSE 'Boomer'
END as generation,
c.country as country,
prod.category as category,
sum(p.amount) as revenue,
count(DISTINCT c.customer_id) as customers
RETURN generation, country, category, revenue, customers
ORDER BY generation, revenue DESC
"""
with driver.session() as session:
result = session.run(query)
segments = []
for record in result:
segments.append({
'generation': record['generation'],
'country': record['country'],
'category': record['category'],
'revenue': float(record['revenue']),
'customers': record['customers']
})
return segments
def get_product_recommendations(customer_name, min_shared=2):
"""Get product recommendations based on collaborative filtering"""
query = """
MATCH (target:Customer {name: $customerName})-[:PURCHASED]->(p:Product)
MATCH (similar:Customer)-[:PURCHASED]->(p)
WHERE similar <> target
WITH similar, count(p) as shared_products
WHERE shared_products >= $minShared
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT EXISTS((target)-[:PURCHASED]->(rec))
RETURN rec.name as product, rec.category, rec.price, rec.rating,
count(similar) as recommended_by
ORDER BY recommended_by DESC, rec.rating DESC
LIMIT 10
"""
with driver.session() as session:
result = session.run(query, customerName=customer_name, minShared=min_shared)
return [record.data() for record in result]
def analyze_purchase_patterns():
"""Analyze temporal purchase patterns"""
query = """
MATCH (c:Customer)-[p:PURCHASED]->(prod:Product)
WITH prod.category as category,
toInteger(formatDateTime(p.date, '%m')) as month,
count(p) as purchases,
sum(p.amount) as revenue
RETURN category, month, purchases, revenue
ORDER BY category, month
"""
with driver.session() as session:
result = session.run(query)
patterns = {}
for record in result:
category = record['category']
if category not in patterns:
patterns[category] = []
patterns[category].append({
'month': record['month'],
'purchases': record['purchases'],
'revenue': float(record['revenue'])
})
return patterns
# Run analysis
if __name__ == "__main__":
# Customer segmentation
print("=== Customer Segment Analysis ===")
segments = analyze_customer_segments()
for segment in segments[:10]: # Top 10 segments
print(f"{segment['generation']} in {segment['country']}: "
f"{segment['category']} - ${segment['revenue']:.2f} "
f"({segment['customers']} customers)")
# Product recommendations
print("\n=== Product Recommendations for Alice Johnson ===")
recommendations = get_product_recommendations("Alice Johnson")
for rec in recommendations:
print(f"{rec['product']} ({rec['category']}) - "
f"${rec['price']} - {rec['rating']}★ - "
f"Recommended by {rec['recommended_by']} similar customers")
# Purchase patterns
print("\n=== Seasonal Purchase Patterns ===")
patterns = analyze_purchase_patterns()
for category, monthly_data in patterns.items():
total_revenue = sum(month['revenue'] for month in monthly_data)
print(f"{category}: ${total_revenue:.2f} total revenue")
peak_month = max(monthly_data, key=lambda x: x['revenue'])
print(f" Peak: Month {peak_month['month']} (${peak_month['revenue']:.2f})")
driver.close()-- Create indexes for better query performance
CREATE INDEX idx_customers_country ON customers (country) TYPE minmax;
CREATE INDEX idx_customers_age ON customers (age) TYPE minmax;
CREATE INDEX idx_orders_date ON orders (order_date) TYPE minmax;
CREATE INDEX idx_products_category ON products (category) TYPE bloom_filter;
CREATE INDEX idx_products_brand ON products (brand) TYPE bloom_filter;- Use LIMIT clauses for exploratory queries
- Filter early in MATCH clauses
- Use indexes on frequently queried properties
- Batch operations for large datasets
- Monitor query performance with EXPLAIN
-- Optimized query with early filtering
MATCH (c:Customer)
WHERE c.country = 'USA' AND c.total_spent > 500
MATCH (c)-[:PURCHASED]->(p:Product)
WHERE p.category = 'Electronics'
RETURN c.name, p.name, p.price
LIMIT 50This comprehensive example demonstrates:
- Customer Segmentation: Age/geography-based purchasing patterns
- Product Recommendations: Collaborative filtering for cross-selling
- Market Basket Analysis: Products frequently bought together
- Customer Journey: Purchase progression and category exploration
- Seasonal Trends: Temporal purchase patterns by category
- Complex Joins: Multi-table relationship traversals
- Aggregations: Statistical analysis across graph structures
- Conditional Logic: Dynamic customer segmentation
- Temporal Analysis: Date-based pattern recognition
- Performance: Sub-second queries on structured e-commerce data
- Simple queries (single relationship): < 50ms
- Complex traversals (3+ hops): < 200ms
- Aggregation queries: < 500ms
- Recommendation algorithms: < 1s
This end-to-end example showcases ClickGraph's ability to transform traditional e-commerce analytics into powerful graph-based insights, combining ClickHouse's performance with Neo4j's ecosystem compatibility for production-grade analytics! 🚀📊