hyugma/datagen-jaffle-shop

Jaffle Shop Data Generator

This repository contains a Fivetran Connector designed to generate synthetic data for the "Jaffle Shop" dataset (Customers, Orders, Items, Products, Payments). It mimics an e-commerce transactional database and is built using the Fivetran Connector SDK.

Features

The connector simulates an application's transactional database by generating two kinds of tables:

  • Static Data (Dimension Tables):

    • customers: A fixed set of 10 customers with Japanese and English names.
    • products: A catalog of 20 jaffles, beverages, and desserts with pricing.
    • Note: These tables are re-imported in full on every sync, simulating how changes to master data would be picked up.
  • Incremental Data (Fact Tables):

    • orders: Transactional records of customer purchases.
    • items: Line items associated with each order (1-3 items per order).
    • payments: Payment records linked to orders.
    • Note: These tables are generated incrementally based on the previous state (last_order_id, etc.).
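
The incremental behavior described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function name plan_sync and the state layout (a dict holding only last_order_id) are assumptions; only the last_order_id key comes from the description above.

```python
from typing import Dict, List, Tuple


def plan_sync(state: Dict[str, int], limit: int = 20) -> Tuple[List[int], Dict[str, int]]:
    """Return the order IDs to generate in this run and the state to save afterwards."""
    # An empty state means this is the first sync, so numbering starts at 1.
    last_order_id = state.get("last_order_id", 0)
    new_ids = list(range(last_order_id + 1, last_order_id + limit + 1))
    # Persist the highest ID issued so the next run continues where this one stopped.
    new_state = {**state, "last_order_id": last_order_id + limit}
    return new_ids, new_state


# Two consecutive runs: the second picks up from the state saved by the first.
first_ids, state = plan_sync({}, limit=3)
second_ids, state = plan_sync(state, limit=3)
print(first_ids, second_ids, state)
```

Because each run depends only on the saved state, rerunning a sync with the same state yields the same ID range, which keeps the fact tables free of gaps and overlaps.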

Configuration

The connector behavior is controlled via configuration.json. The primary configuration parameter is:

  • limit: (Integer) The number of new orders to generate in a single sync run. Default is 20.

Example configuration.json:

{
    "limit": 20
}
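
One way to read this parameter defensively is sketched below. The helper read_limit is hypothetical and not taken from connector.py. Some versions of the Connector SDK expect configuration values to be strings, so the sketch accepts both 20 and "20" and falls back to the documented default when the key is missing.

```python
import json
from typing import Any, Dict

DEFAULT_LIMIT = 20  # default stated in this README


def read_limit(configuration: Dict[str, Any]) -> int:
    """Return the number of orders to generate, falling back to the default."""
    raw = configuration.get("limit", DEFAULT_LIMIT)
    try:
        limit = int(raw)  # accepts 20 as well as "20"
    except (TypeError, ValueError):
        raise ValueError(f"limit must be an integer, got {raw!r}") from None
    if limit < 0:
        raise ValueError(f"limit must not be negative, got {limit}")
    return limit


# Example: a configuration whose value arrives as a string.
config = json.loads('{"limit": "20"}')
print(read_limit(config))
```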

Data Generator Logic

The core logic resides in data_generator.py:

  • Deterministic Generation: The generator can be seeded for reproducibility (seed=42 is used in the connector).
  • Customers & Products: Hardcoded lists to ensure consistent master data.
  • Orders:
    • New orders start from last_order_id + 1.
    • Timestamps are incremented sequentially from the last order time, adding a random 0-10 second delay between orders to simulate natural traffic.
    • Status is randomly assigned (served or cancelled).
  • Items:
    • Each order contains 1-3 random items from the product catalog.
    • Quantity is randomized (1-2).
  • Payments:
    • The payment amount is calculated from the sum of the order's item prices.
    • Payment method is random (credit_card, cash, gift_card).
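
The rules above can be sketched as a single seeded function. This is an illustration under stated assumptions, not the actual DataGenerator class: the function name generate_orders, the column names, the two-product catalog, and the prices are all hypothetical, and weighting each price by quantity in the payment total is an assumption the description above does not confirm.

```python
import random
from datetime import datetime, timedelta

# Hypothetical two-product catalog; the real connector ships 20 products.
PRODUCTS = [
    {"id": 1, "name": "classic jaffle", "price": 500},
    {"id": 2, "name": "iced coffee", "price": 300},
]


def generate_orders(last_order_id, last_order_time, limit, customer_ids, seed=42):
    """Generate `limit` orders plus their items and payments."""
    rng = random.Random(seed)  # private RNG so the output is reproducible
    orders, items, payments = [], [], []
    ordered_at = last_order_time
    for order_id in range(last_order_id + 1, last_order_id + limit + 1):
        # Each order follows the previous one by a random 0-10 second gap.
        ordered_at += timedelta(seconds=rng.randint(0, 10))
        orders.append({
            "id": order_id,
            "customer_id": rng.choice(customer_ids),
            "ordered_at": ordered_at.isoformat(),
            "status": rng.choice(["served", "cancelled"]),
        })
        total = 0
        for _ in range(rng.randint(1, 3)):  # 1-3 line items per order
            product = rng.choice(PRODUCTS)
            quantity = rng.randint(1, 2)
            # IDs restart at 1 in this sketch; an incremental connector
            # would continue them from saved state.
            items.append({
                "id": len(items) + 1,
                "order_id": order_id,
                "product_id": product["id"],
                "quantity": quantity,
            })
            # Assumption: the payment total weights each price by quantity.
            total += product["price"] * quantity
        payments.append({
            "id": len(payments) + 1,
            "order_id": order_id,
            "amount": total,
            "method": rng.choice(["credit_card", "cash", "gift_card"]),
        })
    return orders, items, payments


orders, items, payments = generate_orders(0, datetime(2024, 1, 1), 3, [1, 2, 3])
print(len(orders), len(items), len(payments))
```

Using a dedicated random.Random instance rather than the module-level functions keeps the sequence independent of any other code that draws random numbers, so two calls with the same arguments return identical rows.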

Development Guide

Prerequisites

  • Python 3.9+
  • Fivetran Connector SDK (fivetran-connector-sdk)

Installation

Ensure you have the SDK installed:

pip install fivetran-connector-sdk

Debugging locally

You can run the connector locally to test the sync process and view the output (including state updates) in your terminal.

fivetran debug

This command simulates a sync using your configuration.json and local state.

Deploying

To deploy this connector to Fivetran:

  1. Initialize Deployment (if not already done):
    fivetran init
  2. Deploy:
    fivetran deploy --api-key <YOUR_API_KEY> --destination <DESTINATION_NAME> --connection <CONNECTION_NAME>

For detailed deployment instructions, refer to the Fivetran Connector SDK Documentation.

Project Structure

  • connector.py: The entry point for the Fivetran connector. Defines the update function and schema.
  • data_generator.py: Contains the DataGenerator class for creating synthetic data.
  • schema_utils.py: Helper for defining the Fivetran schema.
  • configuration.json: Configuration file for the connector.
  • spec.json: Specification file defining the configuration schema.
