Skip to content

feat(service): add DataRudder provider with safehttp-based ConnectivityProber#37

Open
dy-shimizu wants to merge 8 commits into
feature/oauth2-token-endpoint-authfrom
feature/datarudder-provider
Open

feat(service): add DataRudder provider with safehttp-based ConnectivityProber#37
dy-shimizu wants to merge 8 commits into
feature/oauth2-token-endpoint-authfrom
feature/datarudder-provider

Conversation

@dy-shimizu
Copy link
Copy Markdown
Collaborator

Summary

Adds the DataRudder provider (Delorean fraud-detection API) with:

  • datarudder.create-transaction executor (POST /api/v1/fraud/transactions/)
  • ConnectivityProber implementation using safehttp.Validate + original URL (no PinnedURL substitution), enabling correct TLS SNI against DNS-only-SAN certs
  • OAuth2 auth via the oauth2_token_endpoint provider from feat(service): add oauth2_token_endpoint outbound auth provider #36

⚠️ Depends on #36 (the parent oauth2 auth provider). Base branch is feature/oauth2-token-endpoint-auth. GitHub will auto-rebase onto develop once #36 merges.

What's added

Layer File Change
Provider pkg/executors/datarudder/provider.go NEW — wraps *base.Provider; implements Provider, InputBuilder, ConnectivityProber
Executor pkg/executors/datarudder/create_transaction.go NEW — POST /api/v1/fraud/transactions/
Input builder pkg/executors/datarudder/input_builder.go NEW — translates provider config + execution input into runner shape; hardcodes credentials_location: body per Delorean contract
Connectivity prober pkg/executors/datarudder/connectivity_prober.go NEW — list-endpoint probe (?limit=1&offset=0) with safehttp.Validate pre-flight
Test infra pkg/executors/datarudder/main_test.go NEW — mirrors Midaz pattern, flips safehttp.SetAllowPrivateForTest for the suite
Tests *_test.go NEW — schema + executor + input builder + prober (16 prober test cases including DNS-only-SAN regression)
Registration pkg/executors/register.go, pkg/executors/http/http.go Added provider catalog entry
Planning docs/pre-dev/datarudder/tasks.md NEW — implementation notes

Key design decisions

1. Probe via list endpoint, not sentinel UUID

Probes GET {base_url}/api/v1/fraud/transactions/?limit=1&offset=0 which returns 200 natural when auth + endpoint reachable. This:

  • Validates OAuth2 token has read scope (stronger signal than authenticating then 404'ing on a sentinel)
  • Avoids the 404→200 remap that would otherwise be needed to fit the shared command's outcome mapping
  • Aligns with the connectivity-prober guide's pagination + read-only pattern (?limit=1)
  • Smoke-tested against https://inference.api-stg.delorean-ai.com: 200, ~570-980 ms wall time, all 3 stages green

2. SSRF via safehttp.Validate + original URL (not libSSRF + PinnedURL)

Earlier iterations used libSSRF.ResolveAndValidate + PinnedURL + req.Host = Authority (the pattern Midaz and Tracer probers still use). That pattern silently breaks HTTPS against certs with only DNS SANs because Go derives TLS SNI from req.URL.Host — substituting a pinned IP literal there fails handshake with cannot validate certificate for <IP> because it doesn't contain any IP SANs.

This was reproduced against Delorean staging (*.delorean-ai.com cert covers DNS SANs only). Initial workaround was a withPinnedSNI() helper cloning the Transport and setting TLSClientConfig.ServerName. That helper is gone now.

Final approach uses safehttp.Validate(ctx, baseURL, nil) for pre-flight SSRF check, then dispatches the request against the original URL (hostname intact). The command-supplied httpClient is already wrapped by safehttp.NewClient at the command layer, so dial-time IP enforcement is automatic. TLS SNI works correctly without any extra code.

Datarudder is the first connectivity prober in the codebase to adopt this pattern. Tracer and Midaz still use the legacy libSSRF + PinnedURL approach (they pass tests only because httptest.NewTLSServer emits IP SANs that mask the bug). See "Findings reported separately" below.

3. credentials_location hardcoded to body (not in schema)

The Delorean /api/v1/auth/login endpoint only accepts form-body credentials. Surfacing credentials_location as a user-configurable field would be misleading. It's set internally in input_builder.go:buildDataRudderAuth, schema omits it (validated by additionalProperties:false at both schema levels).

4. applyDataRudderAuth returns Reached=false on token failure

If the OAuth2 token fetch fails (token endpoint 401, network error, etc.), the probe URL was never hit, so Reached=false. Diverges from Midaz semantics (which returns Reached=true if the token host was reachable). Documented and tested.

Test plan

  • make test-unit — PASS
  • go test -tags=unit ./pkg/executors/datarudder/... — PASS (39 tests/subtests including the 16-case prober suite)
  • make test-integration — PASS
  • make test-e2e — PASS
  • gofmt, go vet, golangci-lint — clean for pkg/executors/datarudder/
  • Real-world smoke test against https://inference.api-stg.delorean-ai.com: overallStatus=passed, ~570-980 ms wall time, stages[2].details.url ends with ?limit=1&offset=0, all 3 stages green. Re-tested after upstream codereview fixes — still green.
  • DNS-only-SAN regression test (TestSafehttp_HandshakeAgainstDNSOnlyCert) — exercises the safehttp wrap against a self-signed cert with DNS SAN only, no IP SAN. Locks in that future refactors don't reintroduce the bug.

Notes for reviewers

  • This PR demonstrates the canonical pattern for HTTPS-correct connectivity probers using pkg/safehttp (PR feat(infrastructure): defense-in-depth SSRF protection for outbound HTTP #24). If approved, the same migration can be applied to Midaz/Tracer probers in a follow-up.
  • The withPinnedSNI helper was developed, tested, then deleted — git history preserves the journey. Commit 442a0a6 adds it; commit e45e7d7 deletes it as part of the safehttp migration.
  • All test scenarios from the original pre-dev plan are still exercised (different mechanism, same coverage matrix).

Findings reported separately

During integration, several observations outside this PR's scope were communicated to @alexgarzao:

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • develop

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7ccef43-c5a2-46cc-88b1-5a93e3ad44d9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Comment @coderabbitai help to get the list of available commands and usage tips.

shimizu-lerian and others added 8 commits May 22, 2026 18:07
Captures the T-001 task with ST-001-NN subtasks for the DataRudder
provider implementation: scaffolding, BuildInput + auth translation,
ConnectivityProber with sentinel UUID probe, and tests. References
the approved plan at C:\Users\SHMZ\.claude\plans\giggly-waddling-wind.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vityProber

Adds a public DataRudder provider to the catalog that integrates with the
Delorean anti-fraud API. Mirrors the Midaz architectural pattern (struct
wrapping base.Provider with InputBuilder + ConnectivityProber methods)
adapted for OAuth2 token-endpoint auth instead of OIDC.

Components:

- provider.go: Register() + NESTED providerConfigSchema with
  additionalProperties:false to reject typos (e.g., token_uri instead
  of token_url). User-facing display name "DataRudder", description
  references the Delorean product.
- create_transaction.go: datarudder.create-transaction executor with
  a loose body schema (Delorean validates upstream).
- input_builder.go: executorRoutes map (Midaz style) + BuildInput()
  that normalises trailing slash via strings.TrimRight and delegates
  auth construction. buildDataRudderAuth extracts the 3 nested auth
  fields explicitly and hardcodes credentials_location="body" (the
  field is intentionally not exposed in the provider schema since
  Delorean only accepts body credentials).
- connectivity_prober.go: Probe() against the GET fraud transactions
  endpoint with a deterministic sentinel UUID. Maps 200/404 to
  Reached+AuthApplied, 401/403 to Reached without AuthApplied, 5xx
  to a transport error, and surfaces the upstream "detail" field in
  ProbeOutcome.Details["upstream_detail"]. SSRF pinning via
  libSSRF.ResolveAndValidate with lazy env var lookup matching the
  Midaz pattern.
- ssrf_test_helpers.go: //go:build unit helper to reset the SSRF
  cache between test scenarios, mirroring midaz/ssrf_test_helpers.go.

Path /api/v1/fraud/transactions/ is hardcoded as a security measure
(capability limiting). Only base_url is user-configurable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires datarudder.Register into RegisterDefaults so end users can
discover the DataRudder provider via GET /v1/catalog/providers and
create ProviderConfigurations against it. Bumps the catalog provider
count assertion from 2 to 3 to reflect the new registration.

Without this, the provider package compiles but is invisible to the
public catalog API — the entire point of the feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the datarudder package with 89.8% statement coverage.

- input_builder_test.go: BuildInput happy path, trailing slash
  normalization (single + multiple slashes), unknown executor ID,
  missing/empty base_url, auth block omission, body parsing fallback
  for non-JSON, nodeData passthrough.
- connectivity_prober_test.go: 13 test cases covering all outcome
  paths from the approved plan — sentinel 404 happy path, 200
  transaction-exists, 401/403 auth rejection, 5xx server error,
  token endpoint failure, network unreachable, SSRF blocked,
  missing base_url, missing auth block, missing auth field,
  upstream_detail surfacing, non-JSON body resilience, plus
  AuthRequired() = true.
- datarudder_test.go: catalog registration sanity, JSON Schema
  acceptance/rejection (extra top-level field, extra auth field,
  each required field missing), and an intra-package integration
  test that drives the full chain — BuildInput -> runner.Execute
  -> mock token endpoint -> mock fraud endpoint — asserting Bearer
  token injection, body integrity, no grant_type sent, and
  response surfacing.

Mock servers run on httptest.Server (127.0.0.1) so tests rely on
SSRF_ALLOW_PRIVATE=true + the unit-tagged ssrf_test_helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous assertions used strings.Contains against the raw form body,
which would have produced false positives for values like "the-client-foo"
matching the expected "the-client". Parse the body via url.ParseQuery and
assert on exact decoded values instead, so the test cannot be fooled by
substring overlap and any future regression in form encoding is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tics, test gaps

Addresses 1 Critical + 4 High + 5 Medium + 3 Low from the gate-8 multi-agent
review, scoped strictly to the DataRudder package (issues outside this
package — runner-wide token cache, masking, README drift — are left for the
Flowker product team).

Probe contract changes (connectivity_prober.go):

- CRITICAL: the sentinel UUID 404 (documented success signal) was being
  misclassified as E2E=Failed by the shared applyProbeOutcome command (which
  was written for Midaz semantics). Probe now rewrites StatusCode=404 to 200
  before returning, and stores the real status in Details["raw_status_code"]
  for diagnostics. Keeps the shared command mapping intact.

- HIGH: removed the post-response switch that overrode AuthApplied=false on
  401/403 and added a synthetic TransportErr on 5xx. Status classification is
  the shared command layer's job; owning that decision in the prober drifts
  from the Midaz pattern and double-classifies the outcome.

- MEDIUM: when applyDataRudderAuth fails, Reached is now false (the probe URL
  was never reached). The previous "we reached far enough to attempt auth"
  comment contradicted the documented Reached contract on
  pkg/executor/connectivity_prober.go:46.

- applyDataRudderAuth now returns just error (the bool was redundant — true
  on success, false on error). Simplifies the call site and removes one
  dead state variable.

- LOW: response body is drained via io.Copy(io.Discard, ...) before Close so
  keep-alive connection reuse is not blocked by unread bytes past the 4 KiB
  readUpstreamDetail cap.

Test improvements (connectivity_prober_test.go):

- HIGH: TestProbe_NetworkError_UnreachableHost was misnamed — config used the
  same unreachable host for both base_url and token_url, so auth failed FIRST
  and the post-auth Do() error branch was never covered. Renamed to
  TestProbe_NetworkError_UnreachableProbeHost and the config now stands up a
  working token mock + a refused base_url. Now genuinely exercises the
  intended branch.

- MEDIUM: TestProbe_MissingAuthField was renamed in-place and converted to
  table-driven, covering all three required auth fields (token_url,
  client_id, client_secret). The previous shape only exercised the
  client_secret branch because the extractor iterates keys in declaration
  order.

- MEDIUM: added TestReadUpstreamDetail with direct unit coverage of the
  helper — valid JSON with/without detail key, empty body, empty object,
  malformed JSON, and a body larger than probeBodyReadLimit (verifies the
  LimitReader truncation does not panic).

- MEDIUM: added TestProbe_Timeout exercising the client-timeout / context-
  deadline path against a deliberately slow fraud-endpoint mock. The
  10-second probeTimeout constant is no longer fully untested.

- MEDIUM: URL assertions now use Equal against the deterministic expected
  URL instead of Contains.

- LOW: removed the no-op '_ = ssrfOptions()' re-prime line in
  resetSSRFCacheForTest (Probe itself primes via the same sync.Once on its
  next call, so the test-side prime was dead). Added a CAUTION comment
  warning future contributors against t.Parallel() use given the
  package-level SSRF cache mutation.

- Cascading assertion updates across the existing 401/403/5xx/token-fail
  tests to honour the new probe contract (AuthApplied=true on a server
  rejection, Reached=false when auth fails, no synthetic TransportErr from
  the prober itself).

Schema test coverage (datarudder_test.go):

- HIGH: TestProviderConfigSchema_RejectsMissingRequiredFields renamed to
  RejectsInvalidConfigs and expanded to a table-driven enumeration of every
  required field plus minLength:1 violations. Previously only base_url,
  auth, and auth.token_url missing were tested; client_id, client_secret,
  and the empty-string minLength constraint were uncovered. Added an
  inline note documenting that format:"uri" is annotation-only under
  Draft 2020-12 (gap is Flowker-team-owned because the compiler config is
  shared).

- HIGH: TestIntegration_CreateTransaction_EndToEnd now captures and asserts
  that the token endpoint received NO Authorization header. This proves
  credentials_location="body" is honoured and that a future regression
  sending both body credentials AND a Basic header would be caught.

Coverage rose from 89.8% to 91.6% on the datarudder package. lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The SSRF-pinning path replaces the hostname with an IP in PinnedURL, which
causes Go's TLS handshake to use the IP as the default ServerName and fail
cert verification with "cannot validate certificate for <IP> because it
doesn't contain any IP SANs". lib-commons explicitly documents SNIHostname
as the field to pass to TLSClientConfig.ServerName for this case.

Smoke test against https://inference.api-stg.delorean-ai.com failed 100% of
the time without this fix because Delorean's public cert covers
*.delorean-ai.com (DNS SANs only). With the fix the handshake validates
against the DNS SAN as expected.

Adds a regression test that reproduces the production scenario in-process —
a self-signed cert with a DNS SAN but no IP SAN, served via httptest TLS,
exercised via the IP-host URL form. The "without fix" subtest asserts the
exact IP-SAN error; the "with fix" subtest asserts a successful 200.

The same defect is latent in pkg/executors/midaz/connectivity_prober.go and
the legacy generic path at internal/services/command/test_provider_config_connectivity.go.
Out of scope for this PR; tracked as separate followup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + list endpoint

Phase 2A migration (smoke-tested against https://inference.api-stg.delorean-ai.com).
Net -161 lines via three concurrent simplifications:

1. SSRF: libSSRF.ResolveAndValidate + PinnedURL + req.Host=Authority
   replaced by safehttp.Validate(probeCtx, baseURL, nil). The original URL
   stays on the wire so TLS SNI works against DNS-only-SAN certs — the
   smoke-tested fix that withPinnedSNI was working around.

2. SNI workaround: withPinnedSNI helper deleted (45 lines) along with its
   4 unit tests. The DNS-only-SAN regression test is preserved as
   TestSafehttp_HandshakeAgainstDNSOnlyCert, reframed to validate the
   safehttp.NewClient contract instead.

3. Probe target: sentinel UUID + 404→200 remap replaced by list endpoint
   /api/v1/fraud/transactions/?limit=1&offset=0 which returns 200 natural.

Test setup: ssrf_test_helpers.go deleted; main_test.go added mirroring
pkg/executors/midaz/main_test.go for suite-wide safehttp policy override.

docs/pre-dev/datarudder/tasks.md updated to reflect the new design.

Smoke test (Delorean staging):
- overallStatus: passed (572 ms)
- stages[2].details.url = .../api/v1/fraud/transactions/?limit=1&offset=0
- all 3 stages (connectivity, authentication, end_to_end) green
@dy-shimizu dy-shimizu force-pushed the feature/datarudder-provider branch from e45e7d7 to d314c5b Compare May 22, 2026 21:07
@dy-shimizu dy-shimizu changed the title feature(datarudder): add DataRudder provider with safehttp-based ConnectivityProber feat(service): add DataRudder provider with safehttp-based ConnectivityProber May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants