generated from amazon-archives/__template_Apache-2.0
-
Notifications
You must be signed in to change notification settings - Fork 72
Initial commit for RAG in py-ml #427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
hmumtazz
wants to merge
45
commits into
opensearch-project:main
Choose a base branch
from
hmumtazz:rag_pipeline
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 42 commits
Commits
Show all changes
45 commits
Select commit
Hold shift + click to select a range
dbbd4f5
Initial commit for RAG pipeline scripts
hmumtazz dc94feb
Added Licence Header and fixed .gitingore file
hmumtazz cba567c
Added comments to understand code
hmumtazz 3481c3e
Remove sensitive config file
hmumtazz e6eee30
Simplify the selection process for the ef_construction parameter by o…
hmumtazz 8b7aa0e
Allows Customer to register model via CLI, fixed embedding generation…
hmumtazz 2635751
Sign-off on all previous work
hmumtazz 88cf68b
Enhance RAG pipeline functionality and user experience
hmumtazz ad1ea97
Created Ingest Pipline for chunking.- Merge custom setup.py with the …
hmumtazz da0ce7c
Removed hard coded LLM model, allowed for Opensoure integration, user…
hmumtazz fc2ace8
Remove requirements.txt and setup.py from Git tracking
hmumtazz fe83b25
Update setup.py file
hmumtazz bea75e0
Remove .gitignore file from rag pipeline directory
hmumtazz f5b74ed
Organized Model registration into classes
hmumtazz da834d2
Remove base_model.py from rag pipeline
hmumtazz 9c45e43
Licence header
hmumtazz a61a840
Updated user setup, Added Semantic search for Managed service using M…
hmumtazz 8db5e9f
fixed failing test in AIConnector tests
hmumtazz aa42922
Update test_SecretsHelper
hmumtazz 5984c19
fixed license header test_SageMakerModel.py
hmumtazz 4503a55
fixed license header rag.py
hmumtazz b1b98ab
Added seperate class for embedding generation, removed nominee text p…
hmumtazz d7e4713
Correctly leveraged existing methods in AI connector class, without h…
hmumtazz c02fe1b
Removed duplicate method, and deleted unused method
hmumtazz d877bca
Fixed chunking pipeline, query was not generating due to mismatchd ve…
hmumtazz f0194ec
Remove config.ini from repository and add to .gitignore
hmumtazz d354b25
Update rag_setup.py to remove neural search line and serverless
hmumtazz 1ca4221
Updated query.py to reflect search changes
hmumtazz 8b97791
Missing innit file
hmumtazz fe6f8c5
missing init file
hmumtazz 34f7fd4
Fixed domain ARN/domain name error, changed directory, for IAM, Secre…
hmumtazz 39a1883
Updated AIConnectorHelper to use existing methods, like get task, and…
hmumtazz 2dfa609
Updated License Headers
hmumtazz 2c40ce0
Fixed UT, Ran Lint nox, fixed code and unused enviroment variable iss…
hmumtazz 6a846ce
Deleted serverless.py from rag functionality as we are not using it
hmumtazz 6f17670
deleted unused dependcies, and test.py for serverless
hmumtazz 120169f
Fixed failing test
hmumtazz 91c9af0
Fixed tests
hmumtazz 82e69f5
Fixed tests
hmumtazz c0114c3
Addressed comments like making code less redudent, and combining methods
hmumtazz 197a134
Addressed code comments, making code less redundent, and combining ex…
hmumtazz 220e9ee
Fixed methods at UT's, adressed comments
hmumtazz ce56ce7
Fixed IAM Roled Helper arguement Issue
hmumtazz def3410
Fixed failing lint test
hmumtazz a500a56
Fixed sign off
hmumtazz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,276 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # The OpenSearch Contributors require contributions made to | ||
| # this file be licensed under the Apache-2.0 license or a | ||
| # compatible open source license. | ||
| # Any modifications Copyright OpenSearch Contributors. See | ||
| # GitHub history for details. | ||
|
|
||
| import json | ||
| import logging | ||
| import uuid | ||
| from datetime import datetime | ||
|
|
||
| import boto3 | ||
| from botocore.exceptions import ClientError | ||
|
|
||
|
|
||
| class IAMRoleHelper: | ||
| """ | ||
| Helper class for managing IAM roles and their interactions with OpenSearch. | ||
| """ | ||
|
|
||
| def __init__( | ||
| self, | ||
| region, | ||
| opensearch_domain_url=None, | ||
| opensearch_domain_username=None, | ||
| opensearch_domain_password=None, | ||
| ): | ||
| """ | ||
| Initialize the IAMRoleHelper with AWS and OpenSearch configurations. | ||
|
|
||
| :param region: AWS region. | ||
| :param opensearch_domain_url: URL of the OpenSearch domain. | ||
| :param opensearch_domain_username: Username for OpenSearch domain authentication. | ||
| :param opensearch_domain_password: Password for OpenSearch domain authentication. | ||
| """ | ||
| self.region = region | ||
| self.opensearch_domain_url = opensearch_domain_url | ||
| self.opensearch_domain_username = opensearch_domain_username | ||
| self.opensearch_domain_password = opensearch_domain_password | ||
|
|
||
| self.iam_client = boto3.client("iam") | ||
| self.sts_client = boto3.client("sts", region_name=self.region) | ||
|
|
||
| def role_exists(self, role_name): | ||
| """ | ||
| Check if an IAM role exists. | ||
|
|
||
| :param role_name: Name of the IAM role. | ||
| :return: True if the role exists, False otherwise. | ||
| """ | ||
| try: | ||
| self.iam_client.get_role(RoleName=role_name) | ||
| return True | ||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"The requested role '{role_name}' does not exist.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
| return False | ||
|
|
||
| def delete_role(self, role_name): | ||
| """ | ||
| Delete an IAM role along with its attached policies. | ||
|
|
||
| :param role_name: Name of the IAM role to delete. | ||
| """ | ||
| try: | ||
| # Detach any managed policies from the role | ||
| policies = self.iam_client.list_attached_role_policies(RoleName=role_name)[ | ||
| "AttachedPolicies" | ||
| ] | ||
| for policy in policies: | ||
| self.iam_client.detach_role_policy( | ||
| RoleName=role_name, PolicyArn=policy["PolicyArn"] | ||
| ) | ||
| print(f"All managed policies detached from role '{role_name}'.") | ||
|
|
||
| # Delete inline policies associated with the role | ||
| inline_policies = self.iam_client.list_role_policies(RoleName=role_name)[ | ||
| "PolicyNames" | ||
| ] | ||
| for policy_name in inline_policies: | ||
| self.iam_client.delete_role_policy( | ||
| RoleName=role_name, PolicyName=policy_name | ||
| ) | ||
| print(f"All inline policies deleted from role '{role_name}'.") | ||
|
|
||
| # Finally, delete the IAM role | ||
| self.iam_client.delete_role(RoleName=role_name) | ||
| print(f"Role '{role_name}' deleted.") | ||
|
|
||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"Role '{role_name}' does not exist.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
|
|
||
| def create_iam_role( | ||
| self, | ||
| role_name, | ||
| trust_policy_json, | ||
| inline_policy_json, | ||
| policy_name=None, | ||
| ): | ||
| """ | ||
| Create a new IAM role with specified trust and inline policies. | ||
|
|
||
| :param role_name: Name of the IAM role to create. | ||
| :param trust_policy_json: Trust policy document in JSON format. | ||
| :param inline_policy_json: Inline policy document in JSON format. | ||
| :param policy_name: Optional. If not provided, a unique one will be generated. | ||
| :return: ARN of the created role or None if creation failed. | ||
| """ | ||
| try: | ||
| # Create the role with the provided trust policy | ||
| create_role_response = self.iam_client.create_role( | ||
| RoleName=role_name, | ||
| AssumeRolePolicyDocument=json.dumps(trust_policy_json), | ||
| Description="Role with custom trust and inline policies", | ||
| ) | ||
|
|
||
| # Retrieve the ARN of the newly created role | ||
| role_arn = create_role_response["Role"]["Arn"] | ||
|
|
||
| # If policy_name is not provided, generate a unique one | ||
| if not policy_name: | ||
| timestamp = datetime.utcnow().strftime("%Y%m%d%H%M%S") | ||
| policy_name = f"InlinePolicy-{role_name}-{timestamp}" | ||
|
|
||
| # Attach the inline policy to the role | ||
| self.iam_client.put_role_policy( | ||
| RoleName=role_name, | ||
| PolicyName=policy_name, | ||
| PolicyDocument=json.dumps(inline_policy_json), | ||
| ) | ||
|
|
||
| print(f"Created role: {role_name} with inline policy: {policy_name}") | ||
| return role_arn | ||
|
|
||
| except ClientError as e: | ||
| print(f"Error creating the role: {e}") | ||
| return None | ||
|
|
||
| def get_role_info(self, role_name, include_details=False): | ||
| """ | ||
| Retrieve information about an IAM role. | ||
|
|
||
| :param role_name: Name of the IAM role. | ||
| :param include_details: If False, returns only the role's ARN. | ||
| If True, returns a dictionary with full role details. | ||
| :return: ARN or dict of role details. Returns None if not found. | ||
| """ | ||
| if not role_name: | ||
| return None | ||
|
|
||
| try: | ||
| response = self.iam_client.get_role(RoleName=role_name) | ||
| role = response["Role"] | ||
| role_arn = role["Arn"] | ||
|
|
||
| if not include_details: | ||
| return role_arn | ||
|
|
||
| # Build a detailed dictionary | ||
| role_details = { | ||
| "RoleName": role["RoleName"], | ||
| "RoleId": role["RoleId"], | ||
| "Arn": role_arn, | ||
| "CreationDate": role["CreateDate"], | ||
| "AssumeRolePolicyDocument": role["AssumeRolePolicyDocument"], | ||
| "InlinePolicies": {}, | ||
| } | ||
|
|
||
| # List and retrieve any inline policies | ||
| list_role_policies_response = self.iam_client.list_role_policies( | ||
| RoleName=role_name | ||
| ) | ||
| for policy_name in list_role_policies_response["PolicyNames"]: | ||
| get_role_policy_response = self.iam_client.get_role_policy( | ||
| RoleName=role_name, PolicyName=policy_name | ||
| ) | ||
| role_details["InlinePolicies"][policy_name] = get_role_policy_response[ | ||
| "PolicyDocument" | ||
| ] | ||
|
|
||
| return role_details | ||
|
|
||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"Role '{role_name}' does not exist.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
| return None | ||
|
|
||
| def get_user_arn(self, username): | ||
| """ | ||
| Retrieve the ARN of an IAM user. | ||
|
|
||
| :param username: Name of the IAM user. | ||
| :return: ARN of the user or None if not found. | ||
| """ | ||
| if not username: | ||
| return None | ||
| try: | ||
| response = self.iam_client.get_user(UserName=username) | ||
| return response["User"]["Arn"] | ||
| except ClientError as e: | ||
| if e.response["Error"]["Code"] == "NoSuchEntity": | ||
| print(f"IAM user '{username}' not found.") | ||
| else: | ||
| print(f"An error occurred: {e}") | ||
| return None | ||
|
|
||
| def assume_role(self, role_arn, role_session_name=None, session=None): | ||
| """ | ||
| Assume an IAM role and obtain temporary security credentials. | ||
|
|
||
| :param role_arn: ARN of the IAM role to assume. | ||
| :param role_session_name: Identifier for the assumed role session. | ||
| :param session: Optional boto3 session object. Defaults to the class-level sts_client. | ||
| :return: Dictionary with temporary security credentials and metadata, or None on failure. | ||
| """ | ||
| if not role_arn: | ||
| logging.error("Role ARN is required.") | ||
| return None | ||
|
|
||
| sts_client = session.client("sts") if session else self.sts_client | ||
|
|
||
| role_session_name = role_session_name or f"session-{uuid.uuid4()}" | ||
|
|
||
| try: | ||
| assumed_role_object = sts_client.assume_role( | ||
| RoleArn=role_arn, | ||
| RoleSessionName=role_session_name, | ||
| ) | ||
|
|
||
| temp_credentials = assumed_role_object["Credentials"] | ||
| expiration = temp_credentials["Expiration"] | ||
|
|
||
| logging.info( | ||
| f"Assumed role: {role_arn}. Temporary credentials valid until: {expiration}" | ||
| ) | ||
|
|
||
| return { | ||
| "credentials": { | ||
| "AccessKeyId": temp_credentials["AccessKeyId"], | ||
| "SecretAccessKey": temp_credentials["SecretAccessKey"], | ||
| "SessionToken": temp_credentials["SessionToken"], | ||
| }, | ||
| "expiration": expiration, | ||
| "session_name": role_session_name, | ||
| } | ||
|
|
||
| except ClientError as e: | ||
| error_code = e.response["Error"]["Code"] | ||
| logging.error(f"Error assuming role {role_arn}: {error_code} - {e}") | ||
| return None | ||
|
|
||
| def get_iam_user_name_from_arn(self, iam_principal_arn): | ||
hmumtazz marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| """ | ||
| Extract the IAM user name from an IAM principal ARN. | ||
|
|
||
| :param iam_principal_arn: ARN of the IAM principal. Expected format: arn:aws:iam::<account-id>:user/<user-name> | ||
| :return: IAM user name if extraction is successful, None otherwise. | ||
| """ | ||
| try: | ||
| if ( | ||
| iam_principal_arn | ||
| and iam_principal_arn.startswith("arn:aws:iam::") | ||
| and ":user/" in iam_principal_arn | ||
| ): | ||
| return iam_principal_arn.split(":user/")[-1] | ||
| except Exception as e: | ||
| print(f"Error extracting IAM user name: {e}") | ||
| return None | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # The OpenSearch Contributors require contributions made to | ||
| # this file be licensed under the Apache-2.0 license or a | ||
| # compatible open source license. | ||
| # Any modifications Copyright OpenSearch Contributors. See | ||
| # GitHub history for details. | ||
|
|
||
| import json | ||
| import logging | ||
|
|
||
| import boto3 | ||
| from botocore.exceptions import ClientError | ||
|
|
||
| # Configure the logger for this module | ||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class SecretHelper: | ||
| """ | ||
| Helper class for managing secrets in AWS Secrets Manager. | ||
| Provides methods to check existence, retrieve details, and create secrets. | ||
| """ | ||
|
|
||
| def __init__(self, region: str): | ||
| """ | ||
| Initialize the SecretHelper with the specified AWS region. | ||
| :param region: AWS region where the Secrets Manager is located. | ||
| """ | ||
| self.region = region | ||
| # Create the Secrets Manager client once at the class level | ||
| self.secretsmanager = boto3.client("secretsmanager", region_name=self.region) | ||
|
|
||
| def secret_exists(self, secret_name: str) -> bool: | ||
| """ | ||
| Check if a secret with the given name exists in AWS Secrets Manager. | ||
| :param secret_name: Name of the secret to check. | ||
| :return: True if the secret exists, False otherwise. | ||
| """ | ||
| try: | ||
| # Attempt to retrieve the secret value | ||
| self.secretsmanager.get_secret_value(SecretId=secret_name) | ||
| return True | ||
| except ClientError as e: | ||
| # If the secret does not exist, return False | ||
| if e.response["Error"]["Code"] == "ResourceNotFoundException": | ||
| return False | ||
| else: | ||
| # Log other client errors and return False | ||
| logger.error(f"An error occurred: {e}") | ||
| return False | ||
|
|
||
| def get_secret_details(self, secret_name: str, fetch_value: bool = False) -> dict: | ||
| """ | ||
| Retrieve details of a secret from AWS Secrets Manager. | ||
| Optionally fetch the secret value as well. | ||
|
|
||
| :param secret_name: Name of the secret. | ||
| :param fetch_value: Whether to also fetch the secret value (default is False). | ||
| :return: A dictionary with secret details (ARN and optionally the secret value) | ||
| or an error dictionary if something went wrong. | ||
| """ | ||
| try: | ||
| # Describe the secret to get its ARN and metadata | ||
| describe_response = self.secretsmanager.describe_secret( | ||
| SecretId=secret_name | ||
| ) | ||
|
|
||
| secret_details = { | ||
| "ARN": describe_response["ARN"], | ||
| # You can add more fields from `describe_response` if needed | ||
| } | ||
|
|
||
| # Fetch the secret value if requested | ||
| if fetch_value: | ||
| value_response = self.secretsmanager.get_secret_value( | ||
| SecretId=secret_name | ||
| ) | ||
| secret_details["SecretValue"] = value_response.get("SecretString") | ||
|
|
||
| return secret_details | ||
|
|
||
| except ClientError as e: | ||
| error_code = e.response["Error"]["Code"] | ||
| if error_code == "ResourceNotFoundException": | ||
| logger.warning(f"The requested secret '{secret_name}' was not found") | ||
| else: | ||
| logger.error( | ||
| f"An error occurred while fetching secret '{secret_name}': {e}" | ||
| ) | ||
| # Return a dictionary with error details | ||
| return {"error": str(e), "error_code": error_code} | ||
|
|
||
| def create_secret(self, secret_name: str, secret_value: dict) -> str: | ||
| """ | ||
| Create a new secret in AWS Secrets Manager. | ||
| :param secret_name: Name of the secret to create. | ||
| :param secret_value: Dictionary containing the secret data. | ||
| :return: ARN of the created secret if successful, None otherwise. | ||
| """ | ||
| try: | ||
| # Create the secret with the provided name and value | ||
| response = self.secretsmanager.create_secret( | ||
| Name=secret_name, | ||
| SecretString=json.dumps(secret_value), | ||
| ) | ||
| # Log success and return the secret's ARN | ||
| logger.info(f"Secret '{secret_name}' created successfully.") | ||
| return response["ARN"] | ||
| except ClientError as e: | ||
| # Log errors during secret creation and return None | ||
| logger.error(f"Error creating secret '{secret_name}': {e}") | ||
| return None |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.