
[QST] RAPIDS Qualification Tool Fails to Obtain S3 action getFileStatus for Databricks-AWS Platform #1441

Open
anguy116 opened this issue Nov 27, 2024 · 7 comments
Labels: question, user_tools

@anguy116

What is your question?

I'm currently trying to run the RAPIDS qualification command against an existing databricks-aws cluster whose Spark event logs are written to s3://bucket/cluster_logs.

My question is: with all of my prerequisite checks out of the way, why does the RAPIDS tool not have the correct privileges to read the S3 bucket that contains my cluster logs?

The first command I'm running:

export AWS_PROFILE=role_with_suffice_permissions
export DATABRICKS_CONFIG_FILE=~/.databrickscfg

spark_rapids qualification \
--platform databricks-aws \
--eventLogs s3://bucket/cluster_logs/<cluster_id>/eventlog/<cluster_id>_<cluster_ip>/<some_id>/

Another command I'm also attempting to run:

export AWS_PROFILE=role_with_suffice_permissions
export DATABRICKS_CONFIG_FILE=~/.databrickscfg

spark_rapids qualification \
--platform databricks-aws \
--eventLogs s3://bucket/cluster_logs \
--cpu-cluster <cluster_id>

My Databricks cluster:
instance type: m7gd.2xlarge
access mode: single user
Databricks runtime: 16.0
instance_profile: (has an s3:PutObjectAcl policy for resource s3://bucket/cluster_logs/)

The issue I'm running into for the first command:

2024-11-27 09:04:06 main WARN EventLogPathProcessor:93 - Unexpected exception occurred reading s3a://bucket/cluster_logs/1125-182533-elyhzrsq/eventlog/1125-182533-elyhzrsq_15_1_31_253/6465501575869150700/, skipping!
java.nio.file.AccessDeniedException: s3a://bucket/cluster_logs/1125-182533-elyhzrsq/eventlog/1125-182533-elyhzrsq_15_1_31_253/6465501575869150700: getFileStatus on s3a://bucket/cluster_logs/1125-182533-elyhzrsq/eventlog/1125-182533-elyhzrsq_15_1_31_253/6465501575869150700: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 1AKHSJ6NKD6E01FT; S3 Extended Request ID: fwkXq5X/AU8lQx54bAFz41bhoCqD3Ya6e/gxW8EgNoAYIadaLKJRUaCfafFf33bT2FOOY6u1IBY=; Proxy: null), S3 Extended Request ID: fwkXq5X/AU8lQx54bAFz41bhoCqD3Ya6e/gxW8EgNoAYIadaLKJRUaCfafFf33bT2FOOY6u1IBY=:403 Forbidden

Similarly, when I run the second command:

2024-11-27 08:47:04 main WARN EventLogPathProcessor:93 - Unexpected exception occurred reading s3a://bucket/cluster_logs/, skipping!
java.nio.file.AccessDeniedException: s3a://bucket/cluster_logs: getFileStatus on s3a://bucket/cluster_logs: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: MC1XR3WYNZG64KZD; S3 Extended Request ID: FHNkNKGHdXGrCDGVjD/p+5tIQ9OUWTX/PMgO7SEPpgqtZRs2LJkJklPoZhrA0IQVyv4rm/60KOs=; Proxy: null), S3 Extended Request ID: FHNkNKGHdXGrCDGVjD/p+5tIQ9OUWTX/PMgO7SEPpgqtZRs2LJkJklPoZhrA0IQVyv4rm/60KOs=:403 Forbidden

Sanity Checks

  • The role_with_suffice_permissions profile has the right permissions: I can freely pull, push, and list the s3://bucket/cluster_logs/ bucket (such checks are sketched below).
  • The Databricks CLI is set up correctly: I can list all clusters in my workspace.
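
A minimal sketch of such checks with the AWS CLI, assuming only the bucket path and profile name already shown in the commands above:

# Confirm which identity the profile actually resolves to.
aws sts get-caller-identity --profile role_with_suffice_permissions

# Confirm that the same identity can list the event-log prefix outside of the tool.
aws s3 ls s3://bucket/cluster_logs/ --profile role_with_suffice_permissions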

I can see that the RAPIDS tool is picking up the correct profile name when I run the commands with --verbose:

2024-11-27 09:55:32,271 DEBUG rapids.tools.qualification: Processing Rapids plugin Arguments {'aws_profile': 'role_with_suffice_permissions'}
2024-11-27 09:55:32,271 DEBUG rapids.tools.qualification: Processing tool CLI argument.. aws_profile:['role_with_suffice_permissions']
@parthosa
Copy link
Collaborator

parthosa commented Nov 27, 2024

To better debug this issue, could you clarify the following:

  1. Where is the tool running?

    • Is it running inside a Databricks node or on a local machine?
  2. If running inside a Databricks node:

    • The AWS_PROFILE environment variable might not be used. Instead, the instance profile attached to the Databricks cluster would be used for accessing S3.
    • Does the IAM role attached to the Databricks cluster have only the s3:PutObjectAcl permission? This is not sufficient to list and read objects in the S3 bucket.
    • Databricks recommends adding the following permissions for access to S3:
      • s3:GetObject
      • s3:ListBucket
    • For more details, please refer to the documentation on instance profiles. (A minimal policy sketch with these permissions follows below.)
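
As an illustration only, a minimal inline-policy sketch granting the two permissions above; the role and policy names are placeholders, not values taken from this issue:

# Hypothetical role/policy names; bucket path follows the issue's example.
cat > s3-read-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::bucket",
        "arn:aws:s3:::bucket/cluster_logs/*"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name <instance-profile-role> \
  --policy-name spark-rapids-s3-read \
  --policy-document file://s3-read-policy.json

Note that s3:ListBucket is evaluated against the bucket ARN while s3:GetObject is evaluated against object ARNs, which is why both resources appear in the statement.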

@anguy116
Author

This is running locally on my machine

@parthosa
Collaborator

Thanks for clarifying that. I am able to reproduce the bug by adjusting the policy.

Could you check whether the policy associated with the user whose credentials are saved under the AWS_PROFILE role_with_suffice_permissions includes the s3:ListBucket permission? (One way to test this directly is sketched below.)
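
A sketch of how to test the list permission in isolation, using only the bucket prefix and profile name from this issue (no other values assumed):

# A 403 here would point to the profile itself lacking s3:ListBucket.
aws s3api list-objects-v2 --bucket bucket --prefix cluster_logs/ \
    --max-items 1 --profile role_with_suffice_permissions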

@anguy116
Author

Yes, looking into the IAM role, it has a policy attached that allows most if not all actions on all S3 resources (e.g., allow s3:* on resource *). It only specifies the actions it CANNOT perform, and the list-bucket permission is not one of them.

@parthosa
Collaborator

That's good.

  • The IAM role needs to be attached to an IAM user to provide the credentials. I suspect there could be an override happening.
  • Could you check whether any other environment variables are set? (e.g. AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)) A quick check is sketched below.
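
A quick way to check for such an override, sketched with standard AWS CLI commands (only the profile name from above is assumed):

# List any AWS-related variables that could override the profile's credentials.
env | grep '^AWS_'

# Show where the CLI resolves credentials from for this profile
# (shared-credentials-file vs. environment).
aws configure list --profile role_with_suffice_permissions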

@anguy116
Author

Steps I've taken in response to this:

  • No other AWS environment variables are set besides AWS_PROFILE.
  • I removed the [default] section from my .aws/credentials in case it was being used to access my AWS resources (the resulting file layout is sketched below).
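
For reference, a sketch of what the credential files might look like after that change; the key values are placeholders, and only the profile name from the commands above is assumed:

cat ~/.aws/credentials
# [role_with_suffice_permissions]
# aws_access_key_id     = <access-key-id>
# aws_secret_access_key = <secret-access-key>

cat ~/.aws/config
# [profile role_with_suffice_permissions]
# region = <aws-region>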

Verbose logs for clarity:

2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property profile is not set. Setting default value DEFAULT from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property awsProfile is not set. Setting default value role_with_suffice_permissions from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property cliConfigFile is not set. Setting default value /Users/user/.databrickscfg from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property awsCliConfigFile is not set. Setting default value /Users/user/.aws/config from environment variable
2024-11-28 14:01:49,245 WARNING rapids.tools.csp: Property awsCredentialFile is not set. Setting default value /Users/user/.aws/credentials from environment variable

NOTE: The DEFAULT value for the Databricks profile is the one I use to connect to and interact with my DBX environment.
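
For context, a sketch of the Databricks CLI config file the tool reads via DATABRICKS_CONFIG_FILE; the host and token values are placeholders:

cat ~/.databrickscfg
# [DEFAULT]
# host  = https://<workspace>.cloud.databricks.com
# token = <personal-access-token>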

@parthosa
Collaborator

parthosa commented Dec 11, 2024

As an alternative, we provide a Jupyter notebook for running the Qualification tool in a Databricks environment - Download

Prerequisite:

  • A running compute cluster on Databricks (a single node is sufficient, or reuse any existing cluster).
  • An instance profile linked to the cluster (see the Databricks Docs).
    • Since the outputs are being written to S3, I suspect this is already set up.

Notebook Usage:

  1. Import the notebook into Databricks via File -> Import Notebook.
  2. Open the notebook and attach it to the compute cluster mentioned above.
  3. Enter the event log s3 path in the text widget at the top of the notebook.
  4. Click Run all to run qualification tool on the provided logs.
