add Weights and Biases connector example by fivetran-surabhisingh · Pull Request #422 · fivetran/fivetran_connector_sdk

fivetran-surabhisingh · 2025-10-31T09:58:54Z

Jira ticket

Closes <ADD TICKET LINK HERE, EACH PR MUST BE LINKED TO A JIRA TICKET>

Description of Change

<MENTION A SHORT DESCRIPTION OF YOUR CHANGES HERE>

Testing

<MENTION ABOUT YOUR TESTING DETAILS HERE, ATTACH SCREENSHOTS IF NEEDED (WITHOUT PII)>

Checklist

Some tips and links to help validate your PR:

Tested the connector with fivetran debug command.
Added/Updated example specific README.md file, refer here for template.
Followed Python Coding Standards, refer here

github-actions · 2025-10-31T09:59:14Z

🧹 Python Code Quality Check

⚠️ Flake8 has detected issues, please fix the issues before merging:

📎 Download full report from workflow artifacts.

📌 Only Python files changed in this PR were checked.

🔍 See how this check works

This comment is auto-updated with every commit.

Copilot

Pull Request Overview

This PR adds a new connector for Weights & Biases (W&B) that synchronizes experiment run metadata and metric time-series data. The connector implements incremental synchronization using timestamp-based watermarking, pagination for large datasets, and robust error handling with exponential backoff.

Key changes:

New W&B connector with API integration for runs and metrics endpoints
Implements incremental sync using updatedAt timestamp tracking
Includes retry logic with exponential backoff for transient errors
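
The retry behaviour summarized above can be illustrated with a minimal sketch. It is not the connector's actual implementation: `TransientError`, `with_backoff`, and all default values are hypothetical names and numbers chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (for example HTTP 429 or 5xx)."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TransientError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # Up to 10% random jitter keeps parallel clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Injecting `sleep` as a parameter is a design choice that keeps the helper testable without real waiting.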

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| connectors/weights_and_biases/connector.py | Main connector implementation with update/schema functions, API client with retry logic, and incremental sync handling |
| connectors/weights_and_biases/configuration.json | Configuration template with API key, entity, and project parameters using proper placeholder format |
| connectors/weights_and_biases/README.md | Comprehensive documentation covering connector overview, features, authentication, data handling, and table schemas |
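
The timestamp-based incremental sync mentioned in the overview could look roughly like the sketch below. The function name `select_updated_runs` is hypothetical, and the code assumes every `updatedAt` value is an ISO 8601 UTC string in one uniform format, so that string comparison matches chronological order. The PR's real filtering logic is not shown on this page.

```python
from typing import Iterable, List, Optional, Tuple


def select_updated_runs(
    runs: Iterable[dict], high_water_mark: Optional[str]
) -> Tuple[List[dict], Optional[str]]:
    """Return the runs changed after the watermark, plus the advanced watermark."""
    new_hwm = high_water_mark
    changed = []
    for run in runs:
        updated_at = run["updatedAt"]
        if high_water_mark is not None and updated_at <= high_water_mark:
            continue  # unchanged since the previous sync, skip it
        changed.append(run)
        if new_hwm is None or updated_at > new_hwm:
            new_hwm = updated_at  # track the newest timestamp seen so far
    return changed, new_hwm
```

In a connector, the returned watermark would typically be stored in the sync state (the README excerpt later in this review names the state key `runs_hwm_utc`) so the next sync starts from it.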

Copilot · 2025-10-31T18:51:03Z

connectors/weights_and_biases/connector.py

+def validate_configuration(configuration: dict):
+    """Ensure required config keys are present."""
+    required = ["api_key", "entity", "project"]
+    for key in required:
+        if key not in configuration or not configuration[key]:
+            raise ValueError(f"Missing required configuration value: {key}")


The validate_configuration() function must be removed when a configuration.json file exists. The SDK automatically validates required fields from configuration.json, making this function redundant and violating SDK v2+ best practices.

Copilot · 2025-10-31T18:51:03Z

connectors/weights_and_biases/connector.py

+# The schema function takes one parameter:
+# - configuration: a dictionary that holds the configuration settings for the connector.
+def schema(configuration: dict):
+    """Define schema tables."""


The schema function docstring must use the exact required format. Replace with: 'Define the schema function which lets you configure the schema your connector delivers. See the technical reference documentation for more details on the schema function: https://fivetran.com/docs/connectors/connector-sdk/technical-reference#schema'

Suggested change

-    """Define schema tables."""
+    """
+    Define the schema function which lets you configure the schema your connector delivers.
+    See the technical reference documentation for more details on the schema function:
+    https://fivetran.com/docs/connectors/connector-sdk/technical-reference#schema
+    Args:
+        configuration: a dictionary that holds the configuration settings for the connector.
+    """

Copilot · 2025-10-31T18:51:03Z

connectors/weights_and_biases/connector.py

+# - configuration: a dictionary that contains any secrets or payloads you configure when deploying the connector
+# - state: a dictionary that contains whatever state you have chosen to checkpoint during the prior sync.
+# The state dictionary is empty for the first sync or for any full re-sync.
+def update(configuration: dict, state: dict):


The update function is missing the required docstring. It must include: 'Define the update function which lets you configure how your connector fetches data. See the technical reference documentation for more details on the update function: https://fivetran.com/docs/connectors/connector-sdk/technical-reference#update' with Args section for configuration and state parameters.

Suggested change

-def update(configuration: dict, state: dict):
+def update(configuration: dict, state: dict):
+    """
+    Define the update function which lets you configure how your connector fetches data.
+    See the technical reference documentation for more details on the update function:
+    https://fivetran.com/docs/connectors/connector-sdk/technical-reference#update
+    Args:
+        configuration: a dictionary that holds the configuration settings for the connector.
+        state: a dictionary that holds the state of the connector.
+    """

Copilot · 2025-10-31T18:51:03Z

connectors/weights_and_biases/connector.py

+def update(configuration: dict, state: dict):
+    log.info("Starting Weights & Biases connector sync")
+
+    validate_configuration(configuration)


Remove this call to validate_configuration(). The SDK handles configuration validation automatically when configuration.json exists.

Copilot · 2025-10-31T18:51:03Z

connectors/weights_and_biases/connector.py

+    api_key = configuration["api_key"]
+    entity = configuration["entity"]
+    project = configuration["project"]
+    page_size = 100


Convert this to a module-level constant with the proper naming convention: __PAGE_SIZE = 100 (placed after imports with other constants). Remove the variable assignment inside the update function and reference the constant instead.
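
A minimal sketch of the requested layout follows. The constant name `__PAGE_SIZE` comes from the comment above, while `build_page_params` and the `limit` and `cursor` parameter names are hypothetical and may not match the W&B API.

```python
from typing import Optional

# Module-level constant, placed after the imports with the other constants.
__PAGE_SIZE = 100


def build_page_params(cursor: Optional[str] = None) -> dict:
    """Hypothetical helper: build the query parameters for one page of results."""
    params = {"limit": __PAGE_SIZE}  # reference the constant, not a local variable
    if cursor is not None:
        params["cursor"] = cursor  # continue from where the previous page ended
    return params
```

One caveat with this naming convention: Python mangles double-underscore names inside class bodies, so a method that refers to `__PAGE_SIZE` would fail with a `NameError`. Referencing it from module-level functions, as here, works as expected.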

Copilot · 2025-10-31T18:51:05Z

connectors/weights_and_biases/connector.py

+    # Open the configuration.json file and load its contents into a dictionary.
+    with open("configuration.json", "r") as f:
+        configuration = json.load(f)
+    # Adding this code to your `connector.py` allows you to test your connector by running your file directly from your IDE.


The main block comments are incorrect. Line 211 should be removed (comment is redundant), line 214 should be removed, and line 215 should be preceded by: '# Test the connector locally'

Suggested change

-    # Open the configuration.json file and load its contents into a dictionary.
-    with open("configuration.json", "r") as f:
-        configuration = json.load(f)
-    # Adding this code to your `connector.py` allows you to test your connector by running your file directly from your IDE.
+    with open("configuration.json", "r") as f:
+        configuration = json.load(f)
+    # Test the connector locally

Copilot · 2025-10-31T18:51:05Z

connectors/weights_and_biases/README.md

+| `api_key` | Required. Your W&B API key for authentication. |
+| `entity` | Required. The W&B user or organization name. |
+| `project` | Required. The project name in which your runs are logged. |
+| `page_size` | Optional. The number of records to retrieve per request (default: 100). |


The page_size parameter is documented in the README but does not exist in configuration.json. Either add it to configuration.json or remove this row from the README. Configuration in the README must exactly match configuration.json.
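
A mismatch like this can be caught automatically. The sketch below is an illustration only, not part of the repository's tooling: both function names are hypothetical, and it assumes the text passed in contains only the configuration table, with rows shaped like the README excerpt above.

```python
import json
import re


def readme_config_keys(readme_text: str) -> set:
    """Collect key names from table rows shaped like: | `key` | description |"""
    return set(re.findall(r"^\|\s*`([^`]+)`\s*\|", readme_text, flags=re.MULTILINE))


def config_mismatches(readme_text: str, configuration_json: str) -> dict:
    """Report keys documented in only one of the README and configuration.json."""
    readme_keys = readme_config_keys(readme_text)
    config_keys = set(json.loads(configuration_json))
    return {
        "only_in_readme": sorted(readme_keys - config_keys),
        "only_in_config": sorted(config_keys - readme_keys),
    }
```

Run against the table quoted above and a `configuration.json` holding only `api_key`, `entity`, and `project`, this would report `page_size` under `only_in_readme`.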

Copilot · 2025-10-31T18:51:06Z

connectors/weights_and_biases/README.md

+## Additional considerations
+- API responses may vary based on your account type and workspace access.
+- Incremental syncs help minimize API calls for large projects.
+- If your project contains thousands of runs, increase `page_size` to optimize sync performance.
+- This connector is designed for **educational and demonstration purposes** using the **Fivetran Connector SDK**.
+
+For questions or feedback, please contact **Fivetran Support**.


The Additional considerations section must contain the exact required disclaimer. Replace the entire section content with: 'The examples provided are intended to help you effectively use Fivetran's Connector SDK. While we've tested the code, Fivetran cannot be held responsible for any unexpected or negative consequences that may arise from using these examples. For inquiries, please reach out to our Support team.'

Copilot · 2025-10-31T18:51:06Z

connectors/weights_and_biases/README.md

+- API responses may vary based on your account type and workspace access.
+- Incremental syncs help minimize API calls for large projects.
+- If your project contains thousands of runs, increase `page_size` to optimize sync performance.
+- This connector is designed for **educational and demonstration purposes** using the **Fivetran Connector SDK**.


Remove bold text formatting. Bold should only be used for UI element names (tabs, menus, buttons, fields). Change to plain text: 'This connector is designed for educational and demonstration purposes using the Fivetran Connector SDK.'

Copilot · 2025-10-31T18:51:06Z

connectors/weights_and_biases/connector.py

+    return dt.datetime.utcnow().replace(tzinfo=dt.timezone.utc).isoformat()
+
+
+def make_api_request(url: str, headers: dict, params: Optional[dict] = None) -> dict:


Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
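
The body of `make_api_request` is not shown on this page, so the following is only a sketch of the pattern the warning asks for: every path through a function annotated `-> dict` either returns a dictionary or raises. `parse_payload` and its status handling are hypothetical.

```python
from typing import Optional


def parse_payload(status_code: int, payload: Optional[dict]) -> dict:
    """Hypothetical stand-in for the response handling inside make_api_request."""
    if status_code == 200 and payload is not None:
        return payload  # normal case: hand back the decoded body
    if status_code == 404:
        return {}  # explicit empty result instead of falling off the end
    # No remaining path returns None implicitly: anything else is an error.
    raise RuntimeError(f"Unexpected response status: {status_code}")
```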

fivetran-chinmayichandrasekar

@fivetran-surabhisingh Left a few suggestions. The main Readme.md file is missing.

fivetran-chinmayichandrasekar · 2025-11-04T12:18:21Z

connectors/weights_and_biases/README.md

+This connector demonstrates how to fetch experiment and metric data from [Weights & Biases (W&B)](https://wandb.ai/) and upsert it into your destination using the **Fivetran Connector SDK**.  
+It synchronizes **run metadata** and **metric time-series data** from W&B projects, supports incremental synchronization via timestamps, and includes comprehensive retry and error handling logic for large-scale ML experiment tracking.


Suggested change

-This connector demonstrates how to fetch experiment and metric data from [Weights & Biases (W&B)](https://wandb.ai/) and upsert it into your destination using the **Fivetran Connector SDK**.
-It synchronizes **run metadata** and **metric time-series data** from W&B projects, supports incremental synchronization via timestamps, and includes comprehensive retry and error handling logic for large-scale ML experiment tracking.
+This connector demonstrates how to fetch experiment and metric data from [Weights & Biases (W&B)](https://wandb.ai/) and upsert it into your destination using the Fivetran Connector SDK.
+It synchronizes run metadata and metric time-series data from W&B projects, supports incremental synchronization via timestamps, and includes comprehensive retry and error handling logic for large-scale ML experiment tracking.

fivetran-chinmayichandrasekar · 2025-11-04T12:19:41Z

connectors/weights_and_biases/README.md

+
+
+## Authentication
+The connector uses **Bearer Token authentication** to connect securely to the W&B API.


Suggested change

-The connector uses **Bearer Token authentication** to connect securely to the W&B API.
+The connector uses Bearer Token authentication to connect securely to the W&B API.

fivetran-chinmayichandrasekar · 2025-11-04T12:21:02Z

connectors/weights_and_biases/README.md

+## Data handling
+The connector performs the following operations:
+
+1. **Runs Table**
+    - Fetches experiment metadata such as:
+        - Run ID
+        - User
+        - State
+        - Creation and update timestamps
+        - Tags, config, and summary metrics
+    - Converts nested objects (`tags`, `config`, `summaryMetrics`) into JSON strings for storage.
+    - Performs incremental filtering using the `updatedAt` field to avoid re-fetching unchanged runs.
+
+2. **Metrics Table**
+    - For each run, retrieves metric history including:
+        - Step number
+        - Metric name
+        - Metric value
+        - Timestamp
+    - Each metric record is upserted to maintain a complete time-series view.
+
+3. **Incremental Sync**
+    - Tracks the latest synchronization timestamp (`runs_hwm_utc`) to fetch only new or updated runs during subsequent syncs.
+    - Uses checkpointing to store this state.
+
+4. **Checkpointing**
+    - Saves synchronization progress after each run:
+      ```python
+      op.checkpoint({"runs_hwm_utc": current_hwm})
+      ```
+
+5. **Upserts**
+    - Inserts or updates data in Fivetran using:
+      ```python
+      op.upsert("runs", run_record)


Suggested change

 ## Data handling
 The connector performs the following operations:

-1. **Runs Table**
+- **Runs Table**
     - Fetches experiment metadata such as:
         - Run ID
         - User
         - State
         - Creation and update timestamps
         - Tags, config, and summary metrics
     - Converts nested objects (`tags`, `config`, `summaryMetrics`) into JSON strings for storage.
     - Performs incremental filtering using the `updatedAt` field to avoid re-fetching unchanged runs.

-2. **Metrics Table**
+- **Metrics Table**
     - For each run, retrieves metric history including:
         - Step number
         - Metric name
         - Metric value
         - Timestamp
     - Each metric record is upserted to maintain a complete time-series view.

-3. **Incremental Sync**
+- **Incremental Sync**
     - Tracks the latest synchronization timestamp (`runs_hwm_utc`) to fetch only new or updated runs during subsequent syncs.
     - Uses checkpointing to store this state.

-4. **Checkpointing**
+- **Checkpointing**
     - Saves synchronization progress after each run:
       ```python
       op.checkpoint({"runs_hwm_utc": current_hwm})
       ```

-5. **Upserts**
+- **Upserts**
     - Inserts or updates data in Fivetran using:
       ```python
       op.upsert("runs", run_record)

fivetran-rishabhghosh

Please address the Copilot comments.

fivetran-sahilkhirwal

Please check for failing checks and existing review comments and re-request the review :)

cla-assistant · 2026-01-02T08:35:06Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Surabhi Singh seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

add Weights and Biases connector example

37a5a04

fivetran-surabhisingh self-assigned this Oct 31, 2025

fivetran-surabhisingh requested review from a team as code owners October 31, 2025 09:58

fivetran-surabhisingh added the hackathon For all the PRs related to the internal Fivetran 2025 Connector SDK Hackathon. label Oct 31, 2025

github-actions bot added the size/L PR size: Large label Oct 31, 2025

fivetran-dejantucakov requested a review from fivetran-chinmayichandrasekar October 31, 2025 14:03

fivetran-dejantucakov assigned fivetran-chinmayichandrasekar Oct 31, 2025

fivetran-sahilkhirwal requested review from Copilot, fivetran-ameysharma and fivetran-sahilkhirwal October 31, 2025 18:48

Copilot AI reviewed Oct 31, 2025

fivetran-chinmayichandrasekar reviewed Nov 4, 2025

fivetran-rishabhghosh reviewed Nov 20, 2025

fivetran-monicadholwani added the top-priority A top priority PR for review label Dec 16, 2025

fivetran-sahilkhirwal requested changes Dec 19, 2025

fivetran deleted a comment from cla-assistant bot Jan 2, 2026



5 participants