-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): support customize credential provider for write_dataset… #3283
base: main
Are you sure you want to change the base?
feat(python): support customize credential provider for write_dataset… #3283
Conversation
ab24a5b
to
b3219e8
Compare
} | ||
|
||
#[async_trait] | ||
impl CredentialProvider for UrlBasedCredentialProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eddyxu May I ask a question? I realized an UrlBasedCredentialProvider
for our inner object store service to get a temporal credential via an assume role
. It seems it is very similar to the AwsCredentialAdapter
you have provided. Based on AwsCredentialAdapter
, if I only need to implement the ProvideCredentials
trait, that's enough for my purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way you can re-use the AwsCredentialsAdapter
? Or is the purpose of this one somehow different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I need to implement a non-standard AssumeRoleProvider
which receives a URL and fetches ak
, sk
and session_token
from it and support TTL. The built-in AssumeRoleProvider
can not customize a URL. It seems AwsCredentialAdapter provided a TTL-based credential wrapper. Maybe I need to implement ProvideCredentials
trait.
2653d0d
to
596533b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears this credential provider is custom for you or your environment? Is it based on some kind of specification or standard?
Can we come up with some kind of solution where this is provided as a plugin? I'd prefer to avoid introducing another specification (UrlCredentialProvider).
I'd also like to avoid the reqwest dependency if possible.
Can we do something like this?
#[derive(Debug)]
pub struct PythonAwsCredentialsProvider {
callback: Arc<PyObject>,
}
impl PythonAwsCredentialsProvider {
fn extract(py: Python<'_>, py_cred: PyObject) -> PyResult<Arc<AwsCredential>> {
let py_cred: &Bound<'_, PyDict> = py_cred.downcast_bound(py)?;
let key_id = py_cred.get_item("key_id")?.ok_or(PyValueError::new_err("key_id is a required property on the dictionary returned by an AWS credential provider"))?;
let secret_key = py_cred.get_item("secret_key")?.ok_or(PyValueError::new_err("secret_key is a required property on the dictionary returned by an AWS credential provider"))?;
let token = py_cred.get_item("token")?;
let key_id: String = key_id.extract()?;
let secret_key: String = secret_key.extract()?;
let token: Option<String> = token.map(|t| t.extract()).transpose()?;
Ok(Arc::new(AwsCredential {
key_id,
secret_key,
token,
}))
}
async fn do_get_credential(&self) -> object_store::Result<Arc<AwsCredential>> {
let callback = self.callback.clone();
tokio::task::spawn_blocking(move || {
Python::with_gil(move |py| {
callback
.call0(py)
.and_then(|py_cred| Self::extract(py, py_cred))
})
})
.await
.unwrap()
.map_err(|e| object_store::Error::Generic {
store: "PythonAwsCredentialsProvider",
source: e.into(),
})
}
}
#[async_trait]
impl CredentialProvider for PythonAwsCredentialsProvider {
type Credential = AwsCredential;
async fn get_credential(&self) -> object_store::Result<Arc<Self::Credential>> {
self.do_get_credential().await
}
}
This way the credentials can be provided by a python function that can do anything (including making an HTTP request to get the fields you need). Then we can make set this as part of the python API:
def write_dataset(
data_obj: ReaderLike,
uri: Union[str, Path, LanceDataset],
schema: Optional[pa.Schema] = None,
mode: str = "create",
*,
max_rows_per_file: int = 1024 * 1024,
...
credentials_provider: Optional[Callable[[], Dict[str, str]]] = None,
...
) -> LanceDataset:
Then it can be used like...
def get_credentials():
# use python requests library or some other mechanism to make
# HTTP request or do whatever is needed
lance.write_dataset(..., credential_provider=get_credentials)
@westonpace Looks good, let me have a try.
No, just a customized requirements. But I want to introduce a pluggable way to implement it. |
Support customize credential provider for write dataset