[QST] RAPIDS Qualification Tool Fails to Obtain S3 action getFileStatus for Databricks-AWS Platform #1441
Comments
To better debug this issue, could you clarify the following:

This is running locally on my machine.

Thanks for clarifying that. I am able to reproduce the bug by adjusting the policy. Could you check the policy associated with the user whose credentials are saved under the AWS_PROFILE?

Yes, looking into the IAM role, the role has a policy attached that allows most, if not all, actions on all S3 resources (e.g. allow s3:* on resource *). It only specifies the actions it CANNOT perform.

That's good.
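For context, getFileStatus in the Hadoop S3A connector translates into object HEAD and bucket LIST requests, so whichever identity performs the read effectively needs s3:GetObject on the objects and s3:ListBucket on the bucket. A minimal read-only policy sketch (using the placeholder bucket and prefix from this issue) would look roughly like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadEventLogs",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::bucket/cluster_logs/*"
    },
    {
      "Sid": "ListEventLogPrefix",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::bucket"
    }
  ]
}

With a broad allow s3:* policy like the one described above, it is worth double-checking that none of the explicit Deny statements covers s3:GetObject or s3:ListBucket, since an explicit Deny always overrides an Allow in IAM.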
Steps I've taken to investigate this.

Verbose logs for clarity:

2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property profile is not set. Setting default value DEFAULT from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property awsProfile is not set. Setting default value role_with_suffice_permissions from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property cliConfigFile is not set. Setting default value /Users/user/.databrickscfg from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property awsCliConfigFile is not set. Setting default value /Users/user/.aws/config from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property awsCredentialFile is not set. Setting default value /Users/user/.aws/credentials from environment variable

NOTE: the DEFAULT value for the Databricks profile is what I use to connect to and interact with my DBX environment.
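The defaults in the log above line up with the standard AWS and Databricks CLI environment variables, so an environment resembling the following sketch would produce them (the variable names here are the standard CLI ones, assumed rather than confirmed from the tool's source):

# Standard AWS CLI environment variables matching the log defaults
export AWS_PROFILE=role_with_suffice_permissions
export AWS_CONFIG_FILE="$HOME/.aws/config"
export AWS_SHARED_CREDENTIALS_FILE="$HOME/.aws/credentials"

# Standard Databricks CLI config file; DEFAULT is the profile name inside it
export DATABRICKS_CONFIG_FILE="$HOME/.databrickscfg"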
As an alternative, we provide a Jupyter notebook for running the Qualification tool in a Databricks environment (available for download).

Prerequisite:

Notebook Usage:
What is your question?
I'm currently trying to run the RAPIDS qualification command against an existing Databricks-AWS cluster whose Spark event logs are written to s3://bucket/cluster_logs. My question is: with all of my prerequisite checks out of the way, why does the RAPIDS tool not have the correct privileges to read the S3 bucket that contains all of my cluster logs?
The command that I'm running:

Another command I'm also attempting to run (representative sketches of both follow below):
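For reference, a typical invocation of the Qualification tool against a Databricks-AWS environment looks roughly like the sketch below; the first form points at the event logs directly, the second at a named cluster. Bucket, profile, and cluster names are placeholders, and the flags should be verified against the installed spark-rapids-user-tools version:

# Qualify event logs read directly from S3
spark_rapids_user_tools databricks-aws qualification \
  --eventlogs s3://bucket/cluster_logs \
  --profile DEFAULT \
  --verbose

# Or qualify an existing cluster by name (hypothetical cluster name)
spark_rapids_user_tools databricks-aws qualification \
  --cluster my-cluster \
  --profile DEFAULT \
  --verbose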
My Databricks cluster:

cluster size: m7gd.2xlarge
mode: single access
dbx runtime: 16.0
instance_profile: has a PutObjectAcl S3 policy for the resource s3://bucket/cluster_logs/

The issue I'm running into with the first command:
Similarly, when I run the second command:
Sanity Checks

- The role_with_suffice_permissions profile has the right permissions, as I can freely pull from, push to, and list the s3://bucket/cluster_logs/ bucket.
- The Databricks CLI is set up correctly, as I can list all the clusters in my workspace.
- I can see that the RAPIDS tool is obtaining the correct role name when I run the commands with --verbose (see the command sketch after this list).
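For completeness, those sanity checks map to commands along these lines (the profile and bucket names are the placeholders used above, and the object name is hypothetical):

# List the event-log prefix with the same AWS profile the tool resolves
aws s3 ls s3://bucket/cluster_logs/ --profile role_with_suffice_permissions

# Pull one object to confirm read access (hypothetical file name)
aws s3 cp s3://bucket/cluster_logs/eventlog.gz . --profile role_with_suffice_permissions

# Confirm the Databricks CLI profile works by listing clusters
databricks clusters list --profile DEFAULT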