[feat] Implement posit workbench credentials strategy and make credentials strategy fallback options more explicit #384

Draft
wants to merge 4 commits into
base: main

dbkegley
Collaborator

Closes #382

Adding this draft PR for initial review before we commit the changes to these APIs.

These changes make the separation between the different credential strategies more explicit and try to better detect the difference between local and Workbench content.

It also combines the Content and Viewer strategies into a single new class: PositConnectCredentialsStrategy. If user-session-token is provided, we use the viewer implementation. If not, we fall back to the content credentials implementation. I'm not sure this is the right choice, so I'm looking for feedback on it.
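A minimal sketch of that fallback rule, for discussion only. `choose_connect_mode` and the two mode strings are hypothetical names used for illustration; they are not part of this PR or of posit-sdk.

```python
from typing import Optional


def choose_connect_mode(user_session_token: Optional[str] = None) -> str:
    """Pick the Connect credential flow from the presence of a viewer token.

    Hypothetical helper for illustration only.
    """
    # A non-empty user session token selects the viewer implementation.
    if user_session_token:
        return "viewer"
    # Otherwise fall back to the content credentials implementation.
    return "content"
```

With this rule, an empty or missing token and an explicit `None` behave the same way, which is one of the trade-offs of driving the choice implicitly.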

I also modified some of the naming in the hope of making the choice of which strategy to use more obvious to end users of these helpers.


github-actions bot commented Feb 14, 2025

Hey there! 👋

We noticed that the title of your pull request doesn't follow the Conventional Commits specification. To ensure consistency, we kindly ask you to adjust the title accordingly.

Here are the details:

No release type found in pull request title "[feat] Implement posit workbench credentials strategy and make credentials strategy fallback options more explicit". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - build
 - chore
 - ci
 - docs
 - feat
 - fix
 - perf
 - style
 - refactor
 - test

@dbkegley dbkegley requested a review from kmasiello February 14, 2025 21:10
@dbkegley dbkegley changed the title Implement posit workbench credentials strategy and make credentials strategy fallback options more explicit [feat] Implement posit workbench credentials strategy and make credentials strategy fallback options more explicit Feb 14, 2025


class CredentialsStrategy(abc.ABC):
class CredentialsStrategyWrapper(CredentialsStrategy):
Collaborator Author

Need to revisit this. I'd rather not depend on the databricks sdk but this violates the TYPE_CHECKING import above.

return self._connect_strategy(*args, **kwargs)
else:
print()
# log and continue
Collaborator Author

I'd like to move these log messages into the init so that they aren't printed every time the user requests credentials.
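Roughly what I have in mind, as a sketch only. `OnceLoggedStrategy` and its `log` parameter are hypothetical names for illustration, not the class in this diff.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class OnceLoggedStrategy:
    """Hypothetical wrapper: report a missing strategy at construction time."""

    def __init__(
        self,
        connect_strategy: Optional[Callable[..., Any]] = None,
        log: Callable[[str], None] = logger.info,
    ) -> None:
        self._connect_strategy = connect_strategy
        if connect_strategy is None:
            # Emitted a single time, when the strategy object is created.
            log("No Connect strategy configured; credential requests will return None.")

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._connect_strategy is not None:
            return self._connect_strategy(*args, **kwargs)
        # No log call here, so repeated credential requests stay quiet.
        return None
```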

Comment on lines 233 to 236
return positLocalContentCredentialsProvider(
self._token_endpoint_url,
self._client_id,
self._client_secret,
Collaborator

Can you remind me again why we need this? Shouldn't we be able to just pass the user session token to client.oauth.get_credentials()?

Collaborator Author

@dbkegley dbkegley Feb 18, 2025

Shouldn't we be able to just pass the user session token to client.oauth.get_credentials()?

client.oauth.get_credentials and client.oauth.get_content_credentials are only available when the content is running on Connect. This helper would allow content running locally (or in Workbench) to obtain the same service principal credential that it would use when running on a Connect server.

Originally we had this because the databricks-cli doesn't support service principal auth. However, after looking more closely at the databricks-sdk, I think we should be able to use some of the provided service_principal credentials strategies rather than implementing this ourselves.

Comment on lines 260 to 111
"""
def __init__(self,
client: Optional[Client] = None,
user_session_token: Optional[str] = None,
):
Collaborator

Suggested change
-    """
-    def __init__(self,
-        client: Optional[Client] = None,
-        user_session_token: Optional[str] = None,
-    ):
+    """
+    def __init__(self,
+        client: Optional[Client] = None,
+        *,
+        user_session_token: Optional[str] = None,
+        content_session_token: Optional[str] = None,
+    ):

This would keep these as explicit keyword-only arguments and allow the dev to pass the token with an appropriately named param.
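To illustrate what the bare `*` buys us, here is a small sketch. `ConnectStrategySketch` and `accepts_positional_token` are hypothetical stand-ins for illustration, not the class in this PR.

```python
from typing import Optional


class ConnectStrategySketch:
    """Hypothetical stand-in for the constructor under review."""

    def __init__(
        self,
        client: Optional[object] = None,
        *,
        user_session_token: Optional[str] = None,
        content_session_token: Optional[str] = None,
    ) -> None:
        self.client = client
        self.user_session_token = user_session_token
        self.content_session_token = content_session_token


def accepts_positional_token() -> bool:
    """Return True if a token can be passed positionally (it should not be)."""
    try:
        # Everything after the bare * must be passed by name.
        ConnectStrategySketch(None, "token")
    except TypeError:
        return False
    return True
```

A caller then has to write `ConnectStrategySketch(user_session_token=...)`, so the two token kinds cannot be swapped by accident.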

@kmasiello

My review is not of the code itself, but from the perspective of the developer/publisher trying to understand the correct usage of this helper. Some of my comments may reflect naivety in Python, but this may be representative of many of our users and we want them to be able to use this helper without significant mental overhead.

Here's a markup of the questions and confusion I had in trying to understand the shiny example. I will summarize each main point (numbered) separately below for discussion. (and no, i don't usually do PR comments with images 😂)
[image: marked-up screenshot of the Shiny example]

@kmasiello

(1). "Posit", "Credentials", "Strategy", "posit_strategy", "credentials_strategy", "connect_strategy", "PositCredentials", "PositConnectCredentials", "PositWorkbenchCredentials" ...
This is very distracting and overwhelming, making it difficult to follow the logic path for what we're doing here.

@kmasiello

(2). session_token = session.http_conn.headers.get("Posit-Connect-User-Session-Token") or session_token = flask.request.headers.get("Posit-Connect-User-Session-Token"). The viewer's session token is always going to be retrieved at this header, so don't make me have to write the code to go get it.

@kmasiello

(3). workbench_strategy=PositWorkbenchCredentialsStrategy(Config(profile="workbench")), I don't love this because it's a lot to type out. Why not just workbench_strategy=Config(profile="workbench") or set this by default so I can just do workbench_strategy=PositWorkbenchCredentialsStrategy() but if I happen to have a different profile defined (maybe because I'm using a service account and need M2M) then I can change to a different profile.

@kmasiello

(4) connect_strategy=PositConnectCredentialsStrategy(user_session_token=session_token). Again, this is a lot to type out. It would align better with my mental model of credential methods on Connect if I could instead specify connect_strategy=PositConnectCredentialsStrategy(type=[viewer_oauth | service_account_oauth | envvars])

@kmasiello

(5). PositWorkbenchCredentialsStrategy and PositConnectCredentialsStrategy - again, word soup. Maybe we call these credential methods or credential types? I think the helper should have one "Strategy", not a PositStrategy and a Strategy

@kmasiello

(6). posit_strategy isn't descriptive of what it actually is. It's a strategy for handling credentials, not for handling Posit.

@kmasiello

(7) here's where I completely lost the logic trail.
Part of my confusion was not realizing that credentials_provider is a valid argument to sql.connect. I had only used access_token= before. So among the word soup from posit_sdk and now the databricks sql connector adding a related term, it was hard to follow. I've stared at this for an hour and I still don't understand the flow here.

@@ -25,12 +29,11 @@ def server(i: Inputs, o: Outputs, session: Session):
session_token = session.http_conn.headers.get("Posit-Connect-User-Session-Token")


A nit, and this applies to all the examples:
From the developer’s POV, I am in local/workbench first. It feels out of order to have the Connect-specific session token defined so early. My mental model is to get through the parts about defining how to handle creds in development vs deployment, then I’d start putting Connect-specific info.

Collaborator

This could be a constraint given the custom nature of this integration. The way it is shown here, this code can be written once and then adapt to the environment it is running in. But that means there is a lowest common denominator in terms of DX as a tradeoff. Ideally, devs are also thinking about how their code would work in production, if that is where they plan to deploy.

Referencing your other comments though, if we had a way to grab the user session token for the dev if/when it is needed then that could be done behind the scenes at the appropriate time simplifying this immensely.

Collaborator

@mconflitti-pbc mconflitti-pbc Feb 18, 2025

posit_strategy = PositCredentialsStrategy(
        local_strategy=databricks_cli,
        workbench_strategy=PositWorkbenchCredentialsStrategy(Config(profile="workbench")),
        connect_strategy=PositConnectCredentialsStrategy(user_session_token=session_token),
    )

could become:

posit_strategy = PositCredentialsStrategy()

allowing for overrides to defaults, but otherwise it just grabs what it needs under the hood.
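A rough sketch of how "defaults with overrides" might look under the hood. Everything here is an assumption for illustration: `select_strategy` is a hypothetical function, the strategies are plain callables returning labels, and the environment markers (`RSTUDIO_PRODUCT` and especially `EXAMPLE_WORKBENCH_MARKER`) are placeholders rather than confirmed detection logic.

```python
from typing import Callable, Mapping, Optional

Strategy = Callable[[], str]


def select_strategy(
    env: Mapping[str, str],
    *,
    local: Optional[Strategy] = None,
    workbench: Optional[Strategy] = None,
    connect: Optional[Strategy] = None,
) -> Strategy:
    """Return the strategy for the detected environment, honoring overrides."""
    # ASSUMPTION: both environment markers below are illustrative placeholders.
    if env.get("RSTUDIO_PRODUCT") == "CONNECT":
        return connect or (lambda: "default-connect")
    if env.get("EXAMPLE_WORKBENCH_MARKER"):
        return workbench or (lambda: "default-workbench")
    # Neither marker present: treat the process as a local session.
    return local or (lambda: "default-local")
```

Calling `select_strategy(os.environ)` with no keyword arguments would then correspond to the bare `PositCredentialsStrategy()` above, while each keyword overrides one environment's default.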

@kmasiello

🪱 🥫
Databricks SQL Connector is great. But what about authenticating to compute clusters using Spark and databricks-connect ?

from posit.connect.external.databricks import (
PositCredentialsStrategy,
PositConnectCredentialsStrategy,
PositWorkbenchCredentialsStrategy,
Collaborator

Just a passing observation: it's odd to see PositWorkbenchCredentialsStrategy inside the posit.connect module. They aren't really relevant for Connect, right? Is it time to create posit.workbench to contain these?

Collaborator Author

I'll move this before the PR becomes final. For now it's easier to iterate with everything inside a single module.

@mconflitti-pbc
Collaborator

(2). session_token = session.http_conn.headers.get("Posit-Connect-User-Session-Token") or session_token = flask.request.headers.get("Posit-Connect-User-Session-Token"). The viewer's session token is always going to be retrieved at this header, so don't make me have to write the code to go get it.

Great point! I think the trouble with doing this for people though is that its location is dependent on the app/framework they are using. Would be neat to automatically find it though, I agree.
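One possible middle ground, sketched under assumptions: a helper that takes whatever header mapping the framework exposes, so the dev still chooses where the headers come from but not how the token is looked up. `user_session_token` and `HEADER` are hypothetical names for illustration only.

```python
from typing import Mapping, Optional

HEADER = "Posit-Connect-User-Session-Token"


def user_session_token(headers: Mapping[str, str]) -> Optional[str]:
    """Look up the viewer token in any mapping of HTTP headers.

    Hypothetical helper. Names are compared case-insensitively because
    frameworks differ in how they normalize header names.
    """
    wanted = HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    # Header absent, for example when running outside Connect.
    return None
```

The caller would pass something like `session.http_conn.headers` or `flask.request.headers`, so the framework-specific part shrinks to a single argument.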

@dbkegley
Collaborator Author

Databricks SQL Connector is great. But what about authenticating to compute clusters using Spark and databricks-connect ?

Any of the various CredentialsStrategy implementations from this PR or from the databricks-sdk should also work with databricks-connect

@dbkegley
Collaborator Author

dbkegley commented Feb 18, 2025

(1). "Posit", "Credentials", "Strategy", "posit_strategy", "credentials_strategy", "connect_strategy", "PositCredentials", "PositConnectCredentials", "PositWorkbenchCredentials" ...
This is very distracting and overwhelming, making it difficult to follow the logic path for what we're doing here.

We can do our best to hide some of this from the user, but this is an artifact of the databricks SDK, not something we are imposing in our client.

https://github.com/databricks/databricks-sdk-py/blob/998a117c43a7bc901710d263a7a7ab0d66ae8b8c/databricks/sdk/config.py#L103-L121

(2). session_token = session.http_conn.headers.get("Posit-Connect-User-Session-Token") or session_token = flask.request.headers.get("Posit-Connect-User-Session-Token"). The viewer's session token is always going to be retrieved at this header, so don't make me have to write the code to go get it.

Right. As Matt said, this is framework-dependent. We could consider adding a helper for each framework but that feels like it will lead to even more confusion like we see with (1)

(3). workbench_strategy=PositWorkbenchCredentialsStrategy(Config(profile="workbench")), I don't love this because it's a lot to type out. Why not just workbench_strategy=Config(profile="workbench") or set this by default so I can just do workbench_strategy=PositWorkbenchCredentialsStrategy() but if I happen to have a different profile defined (maybe because I'm using a service account and need M2M) then I can change to a different profile.

Good call. I think we can do workbench_strategy=PositWorkbenchCredentialsStrategy() pretty easily.

(4) connect_strategy=PositConnectCredentialsStrategy(user_session_token=session_token). Again, this is a lot to type out. It would align better with my mental model of credential methods on Connect if I could instead specify connect_strategy=PositConnectCredentialsStrategy(type=[viewer_oauth | service_account_oauth | envvars])

We can revisit this but if the choice is viewer_oauth then someone has to pass the token from the header into the PositConnectCredentialsStrategy.

The way this is implemented at the moment, if user_session_token is not provided then we attempt to default to Service Account auth, so the presence of user_session_token is what drives the type of auth used. This way if the publisher changes the oauth integration association in Connect from a Viewer integration to a Service Account integration, then they don't need to update any of their content code. If we want to make this an explicit choice in the code then we can definitely do that instead. I tend to prefer explicit options but this was one area where I tried to make the user do less typing.

(5). PositWorkbenchCredentialsStrategy and PositConnectCredentialsStrategy - again, word soup. Maybe we call these credential methods or credential types? I think the helper should have one "Strategy", not a PositStrategy and a Strategy

If you want to write content that only works on Workbench, then you don't need the PositStrategy. Simply pass a PositWorkbenchStrategy when constructing the config: Config(credentials_strategy=PositWorkbenchStrategy()). The PositStrategy is useful when you want to author content that works in all 3 environments without making any code changes.

More broadly, the word choices "CredentialsStrategy" and "CredentialsProvider" are constructs from the databricks-sdk, not something we came up with here. We tried to be consistent with their naming in the SDK to avoid confusion but we can call these things whatever we want.

(6). posit_strategy isn't descriptive of what it actually is. It's a strategy for handling credentials, not for handling Posit.

This one is just a var name so an easy fix. How about we just call this creds or credentials? edit: Although I do think it's a little confusing to call it "posit_credentials" - These aren't posit's credentials, they are using posit's strategy when obtaining the user's databricks credentials.

(7) here's where I completely lost the logic trail.
Part of my confusion was not realizing that credentials_provider is a valid argument to sql.connect. I had only used access_token= before. So among the word soup from posit_sdk and now the databricks sql connector adding a related term, it was hard to follow. I've stared at this for an hour and I still don't understand the flow here.

This is a major part of the friction with implementing these helpers. We are trying to make something easy to do for our users when some of these libraries (databricks-sdk and databricks-sql) aren't even compatible inside of Databricks' own ecosystem of tools.

databricks/databricks-sql-python#148 (comment)

@dbkegley
Collaborator Author

Oh, and regarding the circular reference mentioned in (7): yes, it's another reason this is so challenging. Config.credentials_strategy is a Callable which is called with Config as an argument.
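A toy illustration of that shape, in case it helps. `MiniConfig` and `static_token_strategy` are hypothetical stand-ins written for this comment; they are not the databricks-sdk classes and only mimic the "strategy stored on the config, then called with the config" pattern.

```python
from typing import Callable, Dict

HeaderFactory = Callable[[], Dict[str, str]]


class MiniConfig:
    """Toy stand-in for a config object; not the databricks-sdk Config."""

    def __init__(
        self,
        host: str,
        credentials_strategy: Callable[["MiniConfig"], HeaderFactory],
    ) -> None:
        self.host = host
        # The strategy is stored on the config...
        self._credentials_strategy = credentials_strategy

    def authenticate(self) -> Dict[str, str]:
        # ...and is then called with that same config as its argument.
        header_factory = self._credentials_strategy(self)
        return header_factory()


def static_token_strategy(cfg: MiniConfig) -> HeaderFactory:
    """Toy strategy that reads the host from the config it was handed."""
    return lambda: {"Authorization": f"Bearer token-for-{cfg.host}"}
```

The config needs the strategy to authenticate, and the strategy needs the config to know where to authenticate, which is why building one without the other is awkward.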

@dbkegley dbkegley force-pushed the kegs-databricks-workbench branch from d695abf to 3d0e55b Compare February 18, 2025 20:53
@dbkegley
Collaborator Author

dbkegley commented Feb 18, 2025

@kmasiello could you take another look at this when you have a minute?

I've done some refactoring based on your feedback. One of the main issues we had before was that there was this circular dependency between a databricks.Config and the credentials_strategy. I've tried to remove this by adding a new_config helper for constructing a databricks config that is all set up with the right strategy. The defaults should now work in most environments but can be overridden explicitly if desired. I still need to test all the different combinations but the basic gist is:

import streamlit as st
from databricks.sdk.core import ApiClient
from databricks.sdk.service.iam import CurrentUserAPI

from posit.connect.external.databricks import (
    new_config,
    ConnectStrategy,
)

session_token = st.context.headers.get("Posit-Connect-User-Session-Token")
cfg = new_config(
    posit_connect_strategy=ConnectStrategy(
        user_session_token=session_token
    ),
)

databricks_user = CurrentUserAPI(ApiClient(cfg)).me()
st.write(f"Hello, {databricks_user.display_name}!")

Unfortunately we can't really get rid of the need to pass in the session_token arg when specifying the connect strategy using viewer auth but hopefully this is an improvement.

If you want to use a Service Account oauth integration when running on Connect then the empty default configuration should be sufficient. This code should work locally, on workbench, and on Connect:

import streamlit as st
from databricks.sdk.core import ApiClient
from databricks.sdk.service.iam import CurrentUserAPI
from posit.connect.external.databricks import new_config

cfg = new_config()

databricks_user = CurrentUserAPI(ApiClient(cfg)).me()
st.write(f"Hello, {databricks_user.display_name}!")

Development

Successfully merging this pull request may close these issues.

Implement WorkbenchManagedCredentialsStrategy for Databricks helpers
4 participants