-
Notifications
You must be signed in to change notification settings - Fork 71
feat(cloud): Add CloudCredentials class and bearer token authentication support #916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…on support This commit consolidates authentication handling for Airbyte Cloud API: - Add CloudCredentials dataclass in airbyte/cloud/credentials.py that encapsulates auth configuration (client_id/secret or bearer_token) - Add bearer token authentication support to api_util functions - Add resolve_cloud_credentials() in MCP _util.py for multi-source credential resolution (explicit params -> HTTP headers -> env vars) - Add HTTP header extraction helpers for MCP HTTP/SSE transport - Update CloudWorkspace to support bearer token authentication - Export CloudCredentials from airbyte.cloud module This replaces PRs #866, #867, and #914 with a unified implementation. Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1765930147-cloud-credentials' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1765930147-cloud-credentials'Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
📝 WalkthroughWalkthroughAdds bearer-token-first authentication across the SDK: introduces CloudClientConfig, resolves credentials from env/HTTP headers, enforces mutual-exclusivity with client credentials, and threads bearer_token through config and cloud API utilities and MCP helpers for authenticated requests. Changes
Sequence DiagramssequenceDiagram
participant Caller
participant CloudWorkspace
participant ResolveCreds as resolve_cloud_credentials
participant CloudCfg as CloudClientConfig
participant APIUtil as _make_config_api_request / API client
Caller->>CloudWorkspace: from_env()
CloudWorkspace->>ResolveCreds: read env/headers (bearer_token / client creds / api_root)
ResolveCreds->>CloudCfg: instantiate CloudClientConfig
CloudCfg-->>ResolveCreds: validated config (uses bearer_token or client creds)
ResolveCreds-->>CloudWorkspace: CloudClientConfig
Caller->>CloudWorkspace: perform API call (e.g., create_source)
CloudWorkspace->>APIUtil: call with bearer_token (or creds)
alt bearer_token present
APIUtil->>Cloud_API: request with Authorization: Bearer <token>
else uses client creds
APIUtil->>AuthServer: exchange client_id/secret -> bearer token
AuthServer-->>APIUtil: bearer token
APIUtil->>Cloud_API: request with Authorization: Bearer <token>
end
Cloud_API-->>APIUtil: response
APIUtil-->>CloudWorkspace: response
CloudWorkspace-->>Caller: result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Areas to focus on during review:
Would you like me to run a quick checklist mapping which api_util public functions were updated to ensure none were missed? Wdyt? Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte/cloud/connections.py (1)
57-65: Missingbearer_tokenpropagation in core API methodsThe
api_util.get_connection,api_util.run_connection, andapi_util.get_job_logsfunctions all supportbearer_tokenas an optional parameter, but_fetch_connection_info,run_sync, andget_previous_sync_logsdon't pass it along. Sinceget_connection_state,get_connection_catalog, andpatch_connectioncalls already includebearer_token=self.workspace.bearer_token, shouldn't these three methods do the same for consistency? If a user authenticates solely with a bearer token, these methods would fail, wdyt?
🧹 Nitpick comments (5)
airbyte/cloud/auth.py (1)
9-31: Minor: Potentially redundantSecretStringwrapping on line 30?
try_get_secretalready returnsSecretString | None, so wrapping the result again inSecretString(result)might be redundant. SinceSecretStringis astrsubclass, it's harmless, but you could simplify to justreturn result, wdyt?result = try_get_secret(constants.CLOUD_BEARER_TOKEN_ENV_VAR, default=None) if result: - return SecretString(result) + return result return NoneOtherwise, the design is clean - I like how the docstring explicitly calls out that this returns
Nonewhen not found (unlike the other resolve functions), making the optional nature of bearer tokens clear.airbyte/mcp/cloud_ops.py (1)
37-41: Consider updatingCLOUD_AUTH_TIP_TEXTto mention bearer tokens?The tip text currently only mentions
AIRBYTE_CLOUD_CLIENT_IDandAIRBYTE_CLOUD_CLIENT_SECRETbut doesn't mentionAIRBYTE_CLOUD_BEARER_TOKEN. Since bearer tokens are now supported, it might be helpful to update this constant to include that option, wdyt?airbyte/cloud/workspaces.py (1)
116-141: Nice initialization pattern, but_credentialsappears unused after construction?The
_credentialsfield is created and validated in__post_init__, which is good for ensuring mutual exclusivity. However, I notice the individualclient_id,client_secret, andbearer_tokenfields are still used directly throughout the class methods rather than accessing them viaself._credentials.Would it be worth considering using
self._credentialsconsistently in the API calls instead of the individual fields? This would ensure a single source of truth. Alternatively, if_credentialsis only for validation, maybe a comment clarifying that intent would help future readers, wdyt?airbyte/_util/api_util.py (2)
1289-1311: Good auth handling in_make_config_api_request.The logic prefers an explicit
bearer_tokenand falls back to generating one from client credentials. This is consistent with the approach elsewhere.However, I notice this function doesn't enforce mutual exclusivity—if someone passes both
bearer_tokenANDclient_id/client_secret, the bearer token silently wins. This differs fromget_airbyte_server_instancewhich raises an error for that case. Should this function also validate mutual exclusivity for consistency, or is the "bearer_token wins" behavior intentional here since it's an internal function, wdyt?
1760-1794: Consider adding bearer_token support or documenting the intentional limitation
list_organizations_for_userandlist_workspaces_in_organizationboth requireclient_idandclient_secretas mandatory parameters, unlike other functions in this file which support both client credentials and bearer_token authentication.Is this intentional (perhaps these endpoints have specific requirements), or were these functions missed during the update? If intentional, a comment explaining why would help future maintainers, wdyt?
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
airbyte/_util/api_util.py(70 hunks)airbyte/cloud/__init__.py(1 hunks)airbyte/cloud/auth.py(1 hunks)airbyte/cloud/connections.py(5 hunks)airbyte/cloud/connectors.py(3 hunks)airbyte/cloud/credentials.py(1 hunks)airbyte/cloud/workspaces.py(19 hunks)airbyte/constants.py(1 hunks)airbyte/mcp/_util.py(2 hunks)airbyte/mcp/cloud_ops.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
airbyte/cloud/auth.py (2)
airbyte/secrets/base.py (1)
SecretString(38-143)airbyte/secrets/util.py (1)
try_get_secret(33-60)
airbyte/cloud/__init__.py (2)
airbyte/cloud/credentials.py (1)
CloudCredentials(56-188)airbyte/cloud/workspaces.py (1)
CloudWorkspace(88-886)
airbyte/cloud/workspaces.py (4)
airbyte/cloud/auth.py (5)
resolve_cloud_bearer_token(9-31)resolve_cloud_api_url(50-58)resolve_cloud_workspace_id(61-66)resolve_cloud_client_id(42-47)resolve_cloud_client_secret(34-39)airbyte/cloud/credentials.py (1)
CloudCredentials(56-188)tests/integration_tests/cloud/conftest.py (1)
workspace_id(49-50)airbyte/secrets/base.py (1)
SecretString(38-143)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (24)
airbyte/cloud/connections.py (1)
296-305: Bearer token propagation looks good here.The
bearer_tokenis correctly sourced fromself.workspace.bearer_tokenand passed to the API utility. This aligns with the new authentication flow.airbyte/constants.py (1)
225-231: LGTM!The new constant follows the existing naming conventions and the docstring clearly explains when to use bearer token authentication versus client credentials. Nice addition!
airbyte/cloud/connectors.py (3)
149-178: Good addition ofbearer_tokentocheck()method.The bearer token is correctly propagated here. However, I notice the same inconsistency as in
connections.py- other methods in this file like_fetch_connector_info(lines 194-201 and 276-283),rename,update_config, anddeploy_sourcedon't passbearer_token. If a user authenticates solely with a bearer token, these methods may fail to authenticate, wdyt?
460-471: Bearer token propagation looks good forconnector_builder_project_id.Clean addition - the token flows through correctly from the workspace.
751-760: LGTM forset_testing_values.Bearer token correctly passed to the API utility call.
airbyte/cloud/__init__.py (1)
58-83: LGTM!Clean public API exposure -
CloudCredentialsis now properly accessible viaairbyte.cloud.CloudCredentials. The TYPE_CHECKING import for documentation and the__all__updates follow the existing patterns nicely.airbyte/mcp/cloud_ops.py (3)
218-238: Nice refactor of_get_cloud_workspace!The credential resolution is now cleanly delegated to
resolve_cloud_credentials(), supporting both HTTP headers and environment variables. The docstring updates clearly document the resolution order. This is a solid improvement!
517-526: Good consistency - using workspace's credentials.Nice that
check_airbyte_cloud_workspacenow pulls credentials from the resolved workspace object rather than calling individual resolve functions. This ensures consistent authentication behavior.
1392-1394: Organization-related functions still use direct client credential resolution.I notice that
list_cloud_workspaces,describe_cloud_organization, and the_resolve_organization*helper functions still directly callresolve_cloud_api_url(),resolve_cloud_client_id(), andresolve_cloud_client_secret()without using the newresolve_cloud_credentials()pattern. This means these functions won't work with bearer token authentication.Is this intentional (perhaps these endpoints require client credentials specifically), or should they be updated to use the unified credential resolution pattern like
_get_cloud_workspace, wdyt?airbyte/cloud/credentials.py (3)
55-129: Well-designed credential validation!The mutual exclusivity logic is clean and the error messages with guidance are very helpful. The order of checks makes sense:
- Both methods provided → error
- Incomplete client credentials → error
- Neither method provided → error
One minor edge case to consider:
SecretStringsubclassesstrand overrides__bool__to always returnTrue(per the relevant code snippet at line 81-85). If someone passesbearer_token="", it would be wrapped asSecretString(""), which is truthy but empty. The validation would pass, but API calls would fail later. This is probably an unlikely scenario, but wanted to mention it. Might be worth checkinglen(self.bearer_token) > 0in addition tois not None, wdyt?
140-188: Cleanfrom_env()factory method!The resolution order (bearer token first, then client credentials) is sensible - if someone has a bearer token, they probably want to use it. The docstring clearly documents the environment variables and the resolution behavior.
One small note: if neither bearer token nor client credentials are found in the environment,
resolve_cloud_client_id()andresolve_cloud_client_secret()will raisePyAirbyteSecretNotFoundError, which is correctly documented in the raises section.
1-38: Excellent module documentation!The docstring provides clear examples for all three usage patterns (client credentials, bearer token, and environment variables). This will be very helpful for users adopting the new authentication flow.
airbyte/cloud/workspaces.py (2)
190-207: LGTM! The bearer-token-first resolution is clean.The priority order (bearer token → client credentials) is well-documented and implemented correctly. The early return pattern keeps the code readable.
221-227: Consistent bearer_token propagation across all API calls.All the API utility calls throughout this file now consistently pass
bearer_token=self.bearer_token. The implementation looks complete and uniform.airbyte/mcp/_util.py (6)
38-42: Header constants look good!Nice job defining these as module-level constants. This makes it easy to keep the header names consistent across the codebase and avoids typos.
247-261: LGTM! Case-insensitive header lookup is correct per HTTP spec.HTTP headers are case-insensitive per RFC 7230, so this implementation is correct. The linear scan is fine for typical header dictionary sizes.
264-287: LGTM! Bearer token extraction looks solid.Good handling of the
Authorization: Bearer <token>format with case-insensitive prefix matching and empty token check. TheSecretStringwrapping ensures the token is properly protected from accidental logging.
290-319: Consistent pattern for client credential header extraction.Both functions follow the same pattern and properly wrap sensitive values in
SecretString. The consistency with the bearer token extraction function makes the code predictable and easy to follow.
348-428: Clean multi-source credential resolution!The resolution logic is well-documented and follows a sensible priority order. One thing to note: if a bearer token is found (from any source), the function returns early without attempting to resolve client credentials. This is fine, but it means that if someone has both
AIRBYTE_CLOUD_BEARER_TOKENenv var ANDX-Airbyte-Cloud-Client-Idheader set simultaneously, the bearer token will silently win.This seems like reasonable behavior (bearer token is simpler/preferred), but you might want to add a brief note in the docstring about this "first match wins" behavior across authentication methods, wdyt?
431-459: LGTM! Workspace ID resolution follows the same clean pattern.Consistent with the credential resolution approach. The docstring correctly documents that
PyAirbyteSecretNotFoundErrorwill be raised if no workspace ID can be resolved.airbyte/_util/api_util.py (4)
81-140: Solid authentication logic in the core server instance factory!The mutual exclusivity validation is clear and the error messages provide good guidance. The early validation ensures invalid auth combinations fail fast. Using
models.Security(bearer_auth=...)vsmodels.Security(client_credentials=...)is a clean separation.One small observation: the function now requires at least one valid auth method (line 115-119), which is correct. Previously it might have allowed no auth? This is a good hardening.
181-231: Consistent bearer_token propagation in list functions.The
bearer_tokenparameter is added consistently with the sameSecretString | None = Nonetype hint and default value across all list functions. This maintains backwards compatibility while enabling the new auth method.
467-520: LGTM! Thenoqa: PLR0913is appropriate here.Nine parameters is indeed a lot, but given the auth flexibility requirements (supporting both auth methods) and the query parameters needed, this seems reasonable. The function signature is clear and follows the established pattern.
1867-1903: Good coverage on Config API functions.
get_workspace_organization_info,get_connection_state, andget_connection_catalogall correctly support the optionalbearer_tokenparameter. The pattern is consistent throughout.
- Keep api_root in original position (4th) to preserve positional arg order - Add field(init=False) to _credentials to exclude from __init__ - bearer_token is now keyword-only (5th position with default) Co-Authored-By: AJ Steers <[email protected]>
Per PR feedback, removed = None defaults from api_util function signatures for client_id, client_secret, and bearer_token parameters. This ensures all callers explicitly pass these parameters and omitting bearer_token support is caught as a type checking error. Updated all callers in: - api_util.py (list_organizations_for_user, list_workspaces_in_organization) - connections.py (CloudConnection methods) - connectors.py (CloudSource, CloudDestination, CustomCloudSourceDefinition) - sync_results.py (SyncResult methods) - cloud_ops.py (MCP tool functions) Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte/_util/api_util.py (1)
35-44: DuplicateTYPE_CHECKINGblocks detected.Lines 35-36 and 39-44 both have
if TYPE_CHECKING:blocks. Could these be merged into a single block? It looks like a merge artifact, wdyt?if TYPE_CHECKING: from collections.abc import Callable - - -if TYPE_CHECKING: - from collections.abc import Callable from airbyte_api.models import ( DestinationConfiguration, )
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte/_util/api_util.py(76 hunks)airbyte/cloud/connections.py(8 hunks)airbyte/cloud/connectors.py(13 hunks)airbyte/cloud/sync_results.py(5 hunks)airbyte/mcp/cloud_ops.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte/cloud/connectors.py
- airbyte/mcp/cloud_ops.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (49)
airbyte/cloud/connections.py (8)
57-66: Looks good! Bearer token properly propagated to_fetch_connection_info.The pattern
bearer_token=self.workspace.bearer_tokenis consistent with how other credentials are sourced from the workspace object, wdyt?
171-185: LGTM onrun_syncbearer token propagation.Consistent pattern maintained for the
run_connectionAPI call.
238-248: Bearer token propagation inget_previous_sync_logslooks solid.The
get_job_logscall now includes all three auth parameters. Nice consistency!
299-305:get_state_artifactsproperly updated.
321-327:get_catalog_artifactcorrectly threads bearer token.
339-348:renamemethod updated correctly.
359-368:set_table_prefixfollows the same pattern.
384-393:set_selected_streamscompletes the bearer token propagation.All API utility calls in this class now consistently pass
bearer_token=self.workspace.bearer_token.airbyte/cloud/sync_results.py (5)
247-260:_get_connection_infoproperly updated with bearer token.Clean propagation to the
api_util.get_connectioncall.
262-272:_get_destination_configurationincludes bearer token.The call to
api_util.get_destinationnow has the auth parameter chain complete.
282-294:_fetch_latest_job_infocorrectly threads bearer token.
306-324: Nice handling instart_timefallback path!The retry logic using
_make_config_api_requestproperly includesbearer_tokenfor the Config API call. This ensures authentication works even in the exception recovery path, wdyt?
326-341:_fetch_job_with_attemptsproperly updated.The lazy-loaded job info with attempts now correctly passes bearer_token to the Config API helper.
airbyte/_util/api_util.py (36)
81-140: Solid authentication logic inget_airbyte_server_instance!The mutual exclusivity validation is clean and the error messages provide good guidance. A few things I appreciate:
- Clear validation that you can't use both auth methods simultaneously (lines 109-113)
- Clear validation when no auth is provided (lines 115-119)
- Bearer auth path returns early (lines 121-127), keeping the client credentials as the default flow
This is a well-structured gatekeeper for authentication, nice work!
146-160:get_workspacesignature and propagation looks good.The explicit
SecretString | Nonewithout defaults ensures callers must be explicit about auth params.
181-203:list_connectionsupdated consistently.
234-256:list_workspacesfollows the pattern.
285-307:list_sourcescorrectly updated.
335-357:list_destinationsfollows suit.
392-408:get_connectionsignature updated correctly.
424-444:run_connectionproperly threads bearer token.
467-500:get_job_logssignature and implementation look good.The
noqa: PLR0913comment acknowledges the many arguments - this is a reasonable trade-off for explicit auth parameter passing.
523-537:get_job_infoupdated correctly.
556-576:create_sourcefollows the pattern.
595-609:get_sourceproperly updated.
625-686:delete_sourcecorrectly propagates bearer token in both the safety check and the actual deletion.Good that the nested
get_sourcecall (line 663) also gets the bearer_token for the safe_mode lookup.
702-733:patch_sourceupdated consistently.
781-797:create_destinationfollows the pattern.
820-834:get_destinationproperly updated.
868-930:delete_destinationcorrectly handles bearer token in nested call too.Same pattern as
delete_source- the safety check at line 906 properly passes bearer_token.
946-977:patch_destinationupdated consistently.
1023-1042:create_connectionfollows the pattern.
1065-1081:get_connection_by_nameproperly threads bearer token to nested call.
1117-1179:delete_connectioncorrectly handles bearer token in safety check.Line 1154 properly passes bearer_token to the nested
get_connectioncall.
1195-1232:patch_connectionupdated consistently.
1289-1344:_make_config_api_requesthandles both auth paths nicely.The logic at lines 1301-1311 properly falls back to generating a bearer token from client credentials when no bearer_token is provided. This maintains backward compatibility while supporting the new auth method. Clean implementation!
1347-1375:check_connectorproperly updated.
1432-1448:create_custom_yaml_source_definitionfollows the pattern.
1469-1483:list_custom_yaml_source_definitionsupdated correctly.
1505-1520:get_custom_yaml_source_definitionfollows suit.
1544-1560:update_custom_yaml_source_definitionupdated consistently.
1588-1649:delete_custom_yaml_source_definitionproperly handles nested call.Line 1621 correctly passes bearer_token to the safety check.
1668-1705:get_connector_builder_project_for_definition_idupdated correctly.
1709-1754:update_connector_builder_project_testing_valuesfollows the pattern.
1760-1785:list_organizations_for_userproperly updated.
1800-1849:list_workspaces_in_organizationcorrectly threads bearer token.
1873-1909:get_workspace_organization_infoupdated consistently.
1912-1941:get_connection_statefollows the pattern.
1944-1976:get_connection_catalogproperly updated.Great job on the comprehensive bearer token propagation across all API utility functions! The explicit parameter approach (no defaults) ensures type-safe propagation throughout the codebase.
Per PR feedback, replaced intermediate boolean variables (has_bearer_token, has_client_credentials) with direct null checks on the input variables. This makes the type narrowing linter-informative, allowing the type checker to understand that values are non-None after the guards. Also removed unnecessary str() conversions since the type checker now properly narrows the types after the null checks. Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte/_util/api_util.py (1)
130-134: Redundant validation check after earlier guards, wdyt?Lines 107-111 already ensure that if
bearer_tokenis None, then bothclient_idandclient_secretmust be non-None. Combined with the mutual exclusivity check at lines 114-118, by the time we reach line 129, we're guaranteed thatbearer_token is Noneand both credentials are non-None.The additional check here can never trigger. Would you like to remove it for clarity, or keep it as defensive programming for future maintainability?
Apply this diff if you'd like to remove the redundant check:
- # Option 2: Client credentials flow (guaranteed non-None by first guard) - if client_id is None or client_secret is None: - raise PyAirbyteInputError( - message="Client credentials incomplete.", - guidance="Provide both client_id and client_secret.", - ) - + # Option 2: Client credentials flow (guaranteed non-None by guards above) return airbyte_api.AirbyteAPI(
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/_util/api_util.py(76 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte/_util/api_util.py (3)
1295-1350: LGTM! Clean authentication handling and error context.The function properly handles both authentication paths (provided bearer token vs. generated from client credentials) and includes helpful error context including the forbidden URL for 403 errors. The validation logic at lines 1308-1312 ensures credentials are provided when bearer_token is not supplied.
84-86: Excellent parameter consistency across all authentication points!All ~35 functions now accept
bearer_token: SecretString | None(without defaults, as requested in past reviews) and consistently thread it through toget_airbyte_server_instanceor_make_config_api_request. The signature uniformity makes the authentication model easy to understand and maintain.Also applies to: 156-158, 191-193, 244-246, 295-297, 345-347, 403-405, 435-437, 479-481, 533-535, 569-571, 605-607, 636-638, 712-714, 793-795, 830-832, 879-881, 956-958, 1035-1037, 1076-1078, 1129-1131, 1205-1207, 1300-1302, 1357-1359, 1444-1446, 1479-1481, 1516-1518, 1556-1558, 1599-1601, 1679-1681, 1722-1724, 1769-1771, 1810-1812, 1883-1885, 1922-1924, 1954-1956
88-105: Clear and comprehensive authentication documentation!The docstring effectively communicates both authentication methods, their mutual exclusivity, the parameter requirements, and the error conditions. This will help developers understand the authentication model at a glance.
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
airbyte/_util/api_util.py (1)
1290-1345: Consider caching bearer tokens for efficiency?The current implementation generates a new bearer token from client credentials on each call to
_make_config_api_requestwhenbearer_tokenis None. This adds an extra API round-trip to/applications/tokenper request.For performance optimization, you might consider caching the generated token with TTL tracking. However, the current approach is simpler and avoids token expiry management, so it may be the right trade-off for now. WDYT - is the extra latency acceptable for Config API operations, or would caching provide meaningful benefits?
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/_util/api_util.py(76 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
airbyte/_util/api_util.py (2)
81-141: LGTM - Solid mutual exclusivity implementation!The authentication logic looks excellent. The two guards properly enforce that exactly one authentication method is provided, and the type narrowing works correctly so the type checker understands the values are non-None in each branch. The docstring clearly documents both auth methods.
One minor observation: after the guards, the type checker should infer that
client_idandclient_secretare non-None in the client credentials path (line 131+), so no explicit assertions are needed. Nice work!
468-521: Appropriate noqa usage for authentication flexibility.The
noqa: PLR0913comment is justified here - the 9 parameters provide necessary authentication flexibility (3 auth parameters) plus filtering/pagination options. The function signature remains clear with keyword-only parameters, and this is consistent with the pattern across the module.
Per PR feedback, renamed CloudCredentials to CloudClientConfig since the class includes both credentials and API root configuration. Changes: - Renamed credentials.py to client_config.py - Renamed CloudCredentials class to CloudClientConfig - Updated all imports and references across the codebase Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte/cloud/client_config.py (1)
89-129: Solid validation logic with helpful error messages.The mutual exclusivity enforcement and completeness checks are well-implemented. The error messages with guidance are user-friendly.
One minor observation: the SecretString wrapping on lines 92-97 will re-wrap values that are already
SecretStringinstances. SinceSecretStringinherits fromstr, callingSecretString(already_secret_string)should be fine, but could you consider adding anisinstancecheck to skip re-wrapping for a tiny efficiency gain, wdyt?if self.client_id is not None and not isinstance(self.client_id, SecretString): self.client_id = SecretString(self.client_id)airbyte/mcp/_util.py (1)
290-345: Consider consolidating repeatedget_http_headers()calls.Each of these functions (
get_client_id_from_headers,get_client_secret_from_headers,get_workspace_id_from_headers,get_api_url_from_headers) independently callsget_http_headers(). Inresolve_cloud_credentialsandresolve_workspace_id, multiple header extraction functions might be called, potentially resulting in multipleget_http_headers()invocations.Would it make sense to have a single function that extracts all relevant values from headers at once, or perhaps cache the headers for the duration of a request? This would reduce overhead if
get_http_headers()has any cost. That said, if it's lightweight (just accessing a context variable), this might be fine as-is, wdyt?
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte/cloud/__init__.py(2 hunks)airbyte/cloud/client_config.py(1 hunks)airbyte/cloud/workspaces.py(19 hunks)airbyte/mcp/_util.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
airbyte/cloud/client_config.py (3)
airbyte/cloud/auth.py (4)
resolve_cloud_api_url(50-58)resolve_cloud_bearer_token(9-31)resolve_cloud_client_id(42-47)resolve_cloud_client_secret(34-39)airbyte/exceptions.py (1)
PyAirbyteInputError(201-210)airbyte/secrets/base.py (1)
SecretString(38-143)
airbyte/cloud/__init__.py (1)
airbyte/cloud/client_config.py (1)
CloudClientConfig(57-189)
airbyte/cloud/workspaces.py (2)
airbyte/cloud/auth.py (5)
resolve_cloud_bearer_token(9-31)resolve_cloud_client_id(42-47)resolve_cloud_client_secret(34-39)resolve_cloud_workspace_id(61-66)resolve_cloud_api_url(50-58)airbyte/cloud/client_config.py (1)
CloudClientConfig(57-189)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (35)
airbyte/cloud/client_config.py (4)
1-54: Clear and well-documented module with helpful usage examples.The docstring provides excellent documentation with practical examples for both authentication methods. The import structure is clean and appropriate.
56-88: Well-structured dataclass with clear field documentation.The class documentation and field-level docstrings clearly explain the purpose and mutual exclusivity of authentication methods. Nice work!
131-139: LGTM!The helper properties are straightforward and correctly implement the checks.
141-189: Well-designed factory method with clear precedence rules.The bearer-token-first resolution strategy is clearly documented and correctly implemented. The docstring thoroughly explains the environment variables and behavior.
airbyte/cloud/__init__.py (4)
56-56: LGTM!The import is correctly placed with the other cloud module imports.
66-66: LGTM!Proper TYPE_CHECKING import for documentation/type-checker support.
74-74: LGTM!Correctly added to submodules section of
__all__.
79-79: LGTM!Correctly added to classes section of
__all__.airbyte/cloud/workspaces.py (19)
38-38: LGTM!The
fieldimport is needed for the internal_credentialsfield configuration.
50-55: LGTM!Clean addition of the new imports needed for bearer token support and credential validation.
93-123: Clear documentation and well-structured field additions.The docstring updates provide helpful examples for both auth methods. The
_credentialsfield is properly configured withinit=False, repr=Falseto exclude it from the constructor and string representation.One thing I noticed: with
bearer_tokenplaced afterapi_root(line 120), the positional argument order for backward compatibility would beworkspace_id, client_id, client_secret, api_root. Users who previously used positional args won't be affected. That said, the order might be slightly surprising for new users who might expect auth-related fields grouped together. Would you consider reordering toworkspace_id, client_id, client_secret, bearer_token, api_root, or is maintaining the currentapi_rootposition intentional for backward compat, wdyt?
125-141: Good use of CloudClientConfig for validation.Delegating the mutual exclusivity validation to
CloudClientConfigin__post_init__is a nice DRY approach. The SecretString wrapping before passing to CloudClientConfig ensures consistency.
156-207: Well-documented factory method with bearer-first priority.The implementation correctly mirrors
CloudClientConfig.from_envwith the addition of workspace ID resolution. The docstring clearly explains the environment variables and precedence.
214-227: LGTM!The
bearer_tokenis correctly passed to the API call.
297-311: LGTM!Bearer token correctly propagated to
get_workspace.
392-404: LGTM!Bearer token correctly passed to
create_source.
448-460: LGTM!Bearer token correctly passed to
create_destination.
483-491: LGTM!Bearer token correctly passed to
delete_source.
516-528: LGTM!Bearer token correctly passed to
delete_destination.
563-581: LGTM!Bearer token correctly passed to
create_connection.
611-620: LGTM!Bearer token correctly passed to
delete_connection.
645-661: LGTM!Bearer token correctly passed to
list_connections.
673-689: LGTM!Bearer token correctly passed to
list_sources.
701-717: LGTM!Bearer token correctly passed to
list_destinations.
802-824: LGTM!Bearer token correctly passed to
create_custom_yaml_source_definition.
839-855: LGTM!Bearer token correctly passed to
list_custom_yaml_source_definitions.
872-886: LGTM!Bearer token correctly passed to
get_custom_yaml_source_definition.airbyte/mcp/_util.py (8)
4-5: LGTM!Standard future annotations import for modern type hints.
16-31: LGTM!Clean organization of imports for cloud auth functions and types.
38-42: LGTM!Clear header constant definitions following the
X-Airbyte-Cloud-*convention documented in the PR.
247-261: Nice case-insensitive header lookup helper.This is a solid implementation. HTTP headers are case-insensitive per RFC 7230, so this is the correct approach.
264-287: Well-implemented bearer token extraction.The parsing of the
Authorization: Bearer <token>header is correctly implemented with case-insensitive prefix matching and proper whitespace handling.
431-459: LGTM!Clean workspace ID resolution with the same priority order. The docstring correctly notes it raises
PyAirbyteSecretNotFoundErrorif not found.
13-13: No action needed—get_http_headersis available and safe to use.
get_http_headers()was added in fastmcp 2.2.11 and is designed to safely return an empty dict if no request context exists. The project usesfastmcp >=2.11.3, well above this version. The code defensively checks for empty headers (lines 274, 297, 312, 329, 341) before using them, which is the recommended pattern for handling both HTTP and stdio transports. Wdyt—should we close this as verified?
348-428: Credential resolution design inconsistency: bearer token vs client credentials error handling.The resolution logic correctly implements the documented priority order. One thing to consider though:
resolve_cloud_bearer_token()usestry_get_secret()which gracefully returns None when the env var isn't set, butresolve_cloud_client_id()andresolve_cloud_client_secret()useget_secret()which raisesPyAirbyteSecretNotFoundError. This means if headers are empty and env vars aren't set, users get aPyAirbyteSecretNotFoundErrorfrom client credential resolution before CloudClientConfig can provide its more helpful "No authentication credentials provided" error message. Would usingtry_get_secret()for client credentials as well (matching the bearer token pattern) better consolidate error handling?
Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/integration_tests/cloud/test_cloud_api_util.py (2)
30-61: LGTM: Backward-compatible parameter additions.The bearer_token=None additions to workspace operations maintain backward compatibility while enabling the new authentication path. The existing client credential flow is still properly tested.
However, I notice these integration tests don't exercise the bearer_token authentication path introduced in this PR. Since you're adding bearer token support as a first-class authentication option, would it make sense to add at least one test that verifies bearer_token authentication works end-to-end? Even a simple test using a bearer token (perhaps from a fixture) would help ensure the feature works as intended. Wdyt?
Do you want me to draft a test function that exercises bearer_token authentication, or would you prefer to track this as a follow-up issue?
264-304: Consider extending test_get_bearer_token to verify token usage.The bearer_token=None addition to test_check_connector is consistent with the other changes.
I noticed that test_get_bearer_token successfully generates a bearer token but doesn't verify that the token can actually be used for API authentication. Since this PR introduces bearer token as a primary authentication method, would it be valuable to extend this test to make a follow-up API call using the generated token (e.g., call get_workspace with bearer_token=token and client_id=None, client_secret=None)? This would provide end-to-end verification of the token generation → authentication flow. Wdyt?
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/integration_tests/cloud/test_cloud_api_util.py(13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/integration_tests/cloud/test_cloud_api_util.py (1)
airbyte/_util/api_util.py (2)
delete_source(626-700)delete_destination(869-944)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
tests/integration_tests/cloud/test_cloud_api_util.py (1)
64-255: Consistent parameter threading through CRUD operations.All CRUD operations (create, list, delete for sources, destinations, and connections) have been consistently updated with bearer_token=None. The test logic correctly follows the safe-mode naming convention with "deleteme-" prefixes, and the lifecycle tests properly clean up resources.
|
/prerelease
|
The list_cloud_workspaces and describe_cloud_organization MCP tools were missing bearer token authentication support introduced in PR #916. These tools were directly using resolve_cloud_client_id() and resolve_cloud_client_secret() instead of resolve_cloud_credentials(), and hardcoding bearer_token=None. This fix: - Updates _resolve_organization helper to accept bearer_token parameter - Updates _resolve_organization_id helper to accept bearer_token parameter - Updates list_cloud_workspaces to use resolve_cloud_credentials() - Updates describe_cloud_organization to use resolve_cloud_credentials() - Removes unused imports (resolve_cloud_api_url, resolve_cloud_client_id, resolve_cloud_client_secret) Co-Authored-By: [email protected] <[email protected]>
feat(cloud): Add CloudCredentials class and bearer token authentication support
Summary
This PR consolidates authentication handling for the Airbyte Cloud API by introducing a new
CloudCredentialsclass and adding bearer token authentication support throughout the codebase. This replaces PRs #866, #867, and #914 with a unified implementation.Key changes:
CloudCredentialsdataclass inairbyte/cloud/credentials.pythat encapsulates auth configuration with mutual exclusivity enforcement (either client_id/secret OR bearer_token, not both)bearer_tokenparameter to all API utility functions inapi_util.py(~30 functions)airbyte/mcp/_util.pyfor MCP HTTP/SSE transportresolve_cloud_credentials()function with multi-source resolution: explicit params → HTTP headers → environment variablesCloudWorkspaceto support bearer token authenticationAIRBYTE_CLOUD_BEARER_TOKENenvironment variable constantHeader conventions:
Authorization: Bearer <token>headerX-Airbyte-Cloud-Client-IdandX-Airbyte-Cloud-Client-SecretheadersX-Airbyte-Cloud-Workspace-IdheaderX-Airbyte-Cloud-Api-UrlheaderReview & Testing Checklist for Human
This is a medium-risk PR due to the scope of changes across many files. Please verify:
CloudCredentialscorrectly rejects when both bearer_token AND client_id/secret are provided, and when neither is providedbearer_tokenis properly passed through the entire call chain (e.g.,CloudWorkspace→api_utilfunctions)Recommended test plan:
Notes
get_job_logsfunction now has 9 arguments (exceeds 8 limit), suppressed withnoqa: PLR0913- this is necessary for auth flexibilityCloudCredentialsor header extraction functions - consider adding these@pytest.mark.requires_creds)Link to Devin run: https://app.devin.ai/sessions/a53f2e3ff94c48e5a49d4be03f15ec01
Requested by: @aaronsteers
Summary by CodeRabbit
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.