Skip to content

feat(observer)!: require roark_integration_id for call attribution#12

Closed
munkhorgil wants to merge 3 commits into
mainfrom
feat-roark-integration-id
Closed

feat(observer)!: require roark_integration_id for call attribution#12
munkhorgil wants to merge 3 commits into
mainfrom
feat-roark-integration-id

Conversation

@munkhorgil

@munkhorgil munkhorgil commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a required roark_integration_id to RoarkObserver so app-roark-analytics can attribute each call to a specific Roark Pipecat integration (self-hosted vs Pipecat Cloud).

Customers copy the id from the Roark dashboard when they create a Pipecat integration and set it as ROARK_INTEGRATION_ID (env var self-hosted, pcc secrets set on Pipecat Cloud), or pass roark_integration_id= directly.

Warning

Breaking change. RoarkObserver now raises a ValueError at construction if no integration id is resolvable (neither roark_integration_id= nor ROARK_INTEGRATION_ID). Existing deployments must set ROARK_INTEGRATION_ID before upgrading.

Changes

  • observer.py — keyword-only roark_integration_id: str | None = None, resolved from os.environ["ROARK_INTEGRATION_ID"] when not passed (constructor arg wins). Raises ValueError if the resolved value is empty. Documented in the constructor docstring (incl. a Raises: section).
  • observer.py payload buildersroarkIntegrationId is now set unconditionally on both CallStartedPayload and CallEndedPayload (guaranteed present), and logged on the existing log.info lines.
  • _types.pyroarkIntegrationId: str on both payloads (comment updated to "required").
  • examples/bot.py + .env.exampleroark_integration_id=os.environ.get("ROARK_INTEGRATION_ID"); docs note it is required.
  • README.md — Configure + running-modes sections document ROARK_INTEGRATION_ID as required, with the startup-ValueError behavior.
  • CHANGELOG.mdAdded entry for the field + a BREAKING Changed note.
  • Version bumped to 0.1.5.
  • Tests — an autouse fixture supplies the id suite-wide; explicit tests assert it is present on both payloads (via constructor arg and via env var), that the constructor arg overrides the env var, and that construction raises when neither is set.

The observer stays transport-agnostic — no transport or deployment detection; the integration id is the only new field.

Testing

  • uv run pytest -q → 33 passed
  • uv run ruff check src tests examples → clean
  • uv run mypy src → no issues

🤖 Generated with Claude Code

@munkhorgil munkhorgil requested a review from a team as a code owner June 8, 2026 06:28
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown

Claude finished @munkhorgil's task in 3m 5s —— View job


Code Review

  • Gather context and understand the request
  • Read CLAUDE.md / AGENTS.md for repo guidelines
  • Review changed files
    • src/pipecat_roark/observer.py
    • src/pipecat_roark/_types.py
    • src/pipecat_roark/_recorder.py (new file)
    • tests/test_observer.py
    • tests/test_recorder.py (new file)
    • examples/bot.py, .env.example, README.md, CHANGELOG.md
  • Analyze for code quality, security, performance, testing, documentation
  • Post comprehensive review summary

Code Review Summary

Category Score Notes
Code Quality 5/5 Clean, minimal, backward-compatible implementation
Security 5/5 No sensitive data in logs; standard env-var pattern
Performance 5/5 Trivial string field, zero overhead
Testing 5/5 All four key scenarios covered thoroughly
Documentation 4/5 One omission: roark_integration_id missing from the README config reference table
Overall 4.8/5

Key findings

🟡 README configuration reference table is incomplete

The ## Configuration reference table (README.md around line 338) lists every RoarkObserver constructor parameter except the new roark_integration_id. Anyone skimming the reference section won't see the new option. Suggest adding:

| `roark_integration_id` | `str \| None` | `None` | Roark integration this deployment belongs to. Copied from the Roark dashboard when you create a Pipecat integration. Defaults to `ROARK_INTEGRATION_ID` env var; constructor arg takes precedence. Omit to fall back to the project's self-hosted integration. |

Fix this →

🔵 "" vs None semantics (observer.py:230, inline comment posted)

roark_integration_id or os.environ.get(...) means an explicit roark_integration_id="" falls through to the env var — almost certainly the right behavior, but worth a one-line docstring note: "An empty string is treated the same as None." Not a required change; see inline comment for the strict-None-check alternative.

🔵 PR bundles two changes (recorder refactor + integration-id)

The PR description notes the InlineArmingAudioBufferProcessor recorder refactor is interwoven with the integration-id changes. The commit message only calls out the integration-id feature. This is low-risk (both target 0.1.5) but the recorder-refactor is substantial — _recorder.py is ~80 lines plus 142 lines of tests. Future bisects may need to know this. A follow-up changelog entry or squash note would help.


What's done well ✅

  • Backward-compatible by design: the field is omitted from the wire payload (not sent as null) when unset — clean.
  • Both payload sites updated: call-started and call-ended both carry roarkIntegrationId consistently; the test suite verifies both.
  • Constructor arg wins over env var: correctly implements the "explicit > implicit" precedence with a clear fallback chain.
  • Recorder refactor: InlineArmingAudioBufferProcessor is a minimal, well-documented subclass; the module docstring explaining why the wall-clock approach was broken is excellent. The test_no_wall_clock_state_is_introduced regression guard is a nice touch.
  • Test quality: monkeypatch.delenv(..., raising=False) guard in test_roark_integration_id_absent_when_unset handles the case where the var is not set in CI — good defensive testing.
  • Logging: roarkIntegrationId appears in both the call-started and call-ended log lines alongside the other call-level fields — easy to grep in production.

Comment thread src/pipecat_roark/observer.py Outdated
self._agent_name = agent_name
self._agent_prompt = agent_prompt
self._pipecat_call_id = pipecat_call_id or str(uuid.uuid4())
self._roark_integration_id = roark_integration_id or os.environ.get("ROARK_INTEGRATION_ID")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The or operator means passing roark_integration_id="" (empty string) silently falls through to the env var rather than disabling the feature. This is almost certainly the right behavior (an empty-string integration ID is meaningless), but it differs from the semantics of a strict None-check. For transparency, the docstring could add a note like: "An empty string is treated the same as None."

If you ever wanted explicit None-only semantics:

self._roark_integration_id = (
    roark_integration_id if roark_integration_id is not None
    else os.environ.get("ROARK_INTEGRATION_ID")
)

As-is, the or pattern is idiomatic and consistent with agentName/agentPrompt elsewhere — flagging for awareness only.

Adds a keyword-only `roark_integration_id` to `RoarkObserver`, defaulting from
the `ROARK_INTEGRATION_ID` env var (self-hosted) / `pcc secrets` (Pipecat
Cloud); the constructor arg wins when both are present. When set it ships as
`roarkIntegrationId` on both the `call-started` and `call-ended` webhook bodies
so app-roark-analytics can attribute each call to a specific Roark Pipecat
integration. Fully backward compatible — when unset the field is omitted and the
server falls back to the project's self-hosted integration. The observer stays
transport-agnostic; the id is the only new field.

Bumps version to 0.1.5 and documents ROARK_INTEGRATION_ID in the README setup /
running-modes sections, .env.example, and the example bot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Roark integration id is now mandatory: RoarkObserver raises a ValueError at
construction if neither roark_integration_id= nor the ROARK_INTEGRATION_ID
environment variable supplies one. roarkIntegrationId is consequently sent
unconditionally on both the call-started and call-ended webhook bodies (no
longer omitted), so every call is attributed to a specific Roark Pipecat
integration rather than falling back to the project's self-hosted integration.

BREAKING CHANGE: existing deployments must set ROARK_INTEGRATION_ID
(self-hosted env var / pcc secrets on Pipecat Cloud) before upgrading.

Docs (README, .env.example, example bot) and the CHANGELOG updated to describe
the requirement; tests now supply the id suite-wide via an autouse fixture and
assert construction raises when it is absent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@munkhorgil munkhorgil changed the title feat(observer): add optional roark_integration_id for call attribution feat(observer)!: require roark_integration_id for call attribution Jun 8, 2026
@munkhorgil

Copy link
Copy Markdown
Contributor Author

@claude Review again

@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown

Claude finished @munkhorgil's task in 2m 4s —— View job


Code Review

  • Read CLAUDE.md / AGENTS.md for repo guidelines
  • Review changed files via git diff origin/main...HEAD
    • src/pipecat_roark/observer.py
    • src/pipecat_roark/_types.py
    • tests/test_observer.py
    • examples/bot.py, .env.example, README.md, CHANGELOG.md
  • Analyze code quality, security, testing, documentation
  • Post comprehensive review

Code Review Summary

Category Score Notes
Code Quality 5/5 Clean, minimal implementation
Security 5/5 No sensitive data in logs; standard env-var pattern
Performance 5/5 Trivial string field, zero overhead
Testing 5/5 All four key scenarios covered
Documentation 3/5 Three omissions (see below)
Overall 4.6/5

Findings

🔴 README Configuration reference table still missing roark_integration_id

The previous review flagged this and it is still not fixed. The table (README.md lines 332–340) lists every RoarkObserver constructor parameter except the new one — anyone skimming the reference section after upgrading will not see it. Suggest inserting after the pipecat_call_id row:

| `roark_integration_id` | `str \| None` | `None` | **Required.** Roark integration this deployment belongs to. Copy from the Roark dashboard when you create a Pipecat integration. Defaults to `ROARK_INTEGRATION_ID` env var; constructor arg takes precedence. Raises `ValueError` at startup if neither is set. |

Fix this →

🟡 CHANGELOG [Unreleased] comparison link is stale

CHANGELOG.md line 165:

[Unreleased]: https://github.com/roarkhq/sdk-roark-analytics-python-pipecat/compare/v0.1.3...HEAD

The last released entry is ## [0.1.4], so this should be v0.1.4...HEAD. As-is, clicking the link shows both 0.1.4 and Unreleased changes combined — not just the unreleased ones. A minor documentation bug but confusing for contributors.

Fix this →

🟡 examples/bot.py module docstring omits ROARK_INTEGRATION_ID

The module docstring (lines 22–35) lists env vars with a Roark: section that only mentions ROARK_API_KEY. ROARK_INTEGRATION_ID is now equally required (the os.environ.get("ROARK_INTEGRATION_ID") call is on line 113) but absent from the docstring env var list. Someone running the example for the first time and reading the docstring will miss it.

Fix this →

🔵 "" vs None semantics (carry-over from previous review)

roark_integration_id or os.environ.get(...) (observer.py:242) means passing roark_integration_id="" silently falls through to the env var. This is the right behavior (an empty-string ID is meaningless) and consistent with agentName/agentPrompt elsewhere — the docstring note "An empty string is treated the same as None" would make this explicit. Informational only, not blocking.


What's done well ✅

  • ValueError message is production-quality — specific, actionable, tells the user exactly where to get the value.
  • Both payload sites updated consistentlycall-started and call-ended both carry roarkIntegrationId unconditionally; both log lines include it too.
  • Constructor arg wins over env var — explicit > implicit, correctly implemented.
  • Test suite is solid — autouse fixture keeps noise out of individual tests; four scenarios (constructor arg, env var, precedence, missing) all verified; raising=False guard handles clean CI environments correctly.
  • _types.py type annotationroarkIntegrationId: str (not str | None) on both payloads, accurately reflecting the runtime guarantee.
  • .env.example — clear comment, empty-value placeholder is the right pattern.
  • CHANGELOG BREAKING section — includes the migration instruction (ROARK_INTEGRATION_ID must be set before upgrading).

The implementation itself is correct and production-ready. The three documentation gaps above are the only outstanding issues; the README config table omission is the most visible one since it's the canonical reference for users upgrading.

…v docs

- README: add roark_integration_id row to the Configuration reference table.
- examples/bot.py: list ROARK_INTEGRATION_ID in the module docstring env vars
  (both the .env list and the pcc secrets example).
- CHANGELOG: fix the [Unreleased] compare link (v0.1.3 -> v0.1.4) and add the
  missing [0.1.4] link.
- observer.py: docstring note that an empty-string roark_integration_id is
  treated the same as None (falls through to the env var).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@munkhorgil

Copy link
Copy Markdown
Contributor Author

@claude Review again

@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown

Claude finished @munkhorgil's task in 2m 21s —— View job


Code Review

  • Gather context and understand the request
  • Read CLAUDE.md / AGENTS.md for repo guidelines
  • Review changed files via git diff main...HEAD
    • src/pipecat_roark/observer.py
    • src/pipecat_roark/_types.py
    • tests/test_observer.py
    • examples/bot.py, .env.example, README.md, CHANGELOG.md
  • Check if previous review findings were addressed
  • Post comprehensive review summary

Code Review Summary

Category Score Notes
Code Quality 5/5 Clean, minimal, idiomatic implementation
Security 5/5 No sensitive data in logs; standard env-var pattern
Performance 5/5 Trivial string field, zero overhead
Testing 5/5 All four key scenarios covered thoroughly
Documentation 5/5 All previous omissions now resolved
Overall 5/5 Ready to merge

Previous review findings — all resolved ✅

Finding Status
🔴 README Configuration reference table missing roark_integration_id Fixed — row added after pipecat_call_id
🟡 CHANGELOG [Unreleased] link pointed at v0.1.3 Fixed — updated to v0.1.4...HEAD; [0.1.4] link added
🟡 examples/bot.py docstring missing ROARK_INTEGRATION_ID Fixed — added under the Roark: env var section and in the pcc secrets set example
🔵 "" vs None semantics undocumented Fixed — docstring now explicitly states "an empty string is treated the same as None"

What's done well ✅

  • ValueError message is production-quality — specific, actionable, tells the user exactly where to get the value.
  • Both payload sites updated consistentlycall-started and call-ended both carry roarkIntegrationId unconditionally; both log lines include it too. Easy to grep in production.
  • Constructor arg wins over env var — explicit > implicit, correctly implemented with or fallback chain.
  • Test suite is solid — autouse fixture keeps noise out of individual tests; four scenarios (constructor arg, env var, precedence, missing) all verified; raising=False guard handles clean CI environments correctly.
  • _types.py type annotationroarkIntegrationId: str (not str | None) on both payloads, accurately reflecting the runtime guarantee.
  • .env.example — clear comment, empty-value placeholder is the right pattern.
  • CHANGELOG BREAKING section — includes the migration instruction (ROARK_INTEGRATION_ID must be set before upgrading).

Informational notes (no action needed)

  • uv.lock version jump (0.1.30.1.5): the lock file had a pre-existing version lag relative to pyproject.toml (0.1.4). The update to 0.1.5 brings them in sync — not a bug.
  • _types.py with total=False: roarkIntegrationId: str is technically optional in the TypedDict type, though the constructor guarantees it's always present at runtime. This is consistent with the existing pattern for other always-set fields — no change needed.
  • Explicit os.environ.get("ROARK_INTEGRATION_ID") in examples/bot.py: the observer would read the env var automatically anyway, but the explicit form is good documentation — makes the env-var connection visible to someone reading the example.

The implementation is correct, tests are comprehensive, and all documentation is complete. This PR is ready to merge.

@munkhorgil

Copy link
Copy Markdown
Contributor Author

We have decided to extend api key with integration id rather than pass it via package here.

@munkhorgil munkhorgil closed this Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant