[AQUA] Track md logs for error logging #1219
Open
agrimk wants to merge 10 commits into main from track_md_logs_for_error_logging
+167 −76
10 commits:
- 0a2725c watching md predict/access logs for error (agrimk)
- c9e40b9 watching logs and pushing them to telemetry (agrimk)
- 4e97417 watching logs and pushing them to telemetry (agrimk)
- 011a369 Merge branch 'main' into track_md_logs_for_error_logging (agrimk)
- 06ccb9f replaced async call with sync call (agrimk)
- b802894 added deployment object in get_deployment_status method (agrimk)
- 95e5a04 merge from master (agrimk)
- b60cab7 fixed unit test of get_deployment_status (agrimk)
- ba43d62 Merge branch 'main' into track_md_logs_for_error_logging (agrimk)
- 83dc9fc added test cases and PR review comments (agrimk)
```diff
@@ -1,22 +1,27 @@
 #!/usr/bin/env python
-# -*- coding: utf-8; -*-

-# Copyright (c) 2021, 2023 Oracle and/or its affiliates.
+# Copyright (c) 2021, 2025 Oracle and/or its affiliates.
 # Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/


 import collections
 import copy
 import datetime
-import oci
-import warnings
 import time
-from typing import Dict, List, Union, Any
+import warnings
+from typing import Any, Dict, List, Union

+import oci
 import oci.loggingsearch
-from ads.common import auth as authutil
 import pandas as pd
-from ads.model.serde.model_input import JsonModelInputSERDE
+from oci.data_science.models import (
+    CreateModelDeploymentDetails,
+    LogDetails,
+    UpdateModelDeploymentDetails,
+)
+
+from ads.common import auth as authutil
+from ads.common import utils as ads_utils
 from ads.common.oci_logging import (
     LOG_INTERVAL,
     LOG_RECORDS_LIMIT,
```
```diff
@@ -30,10 +35,10 @@
 from ads.model.deployment.common.utils import send_request
 from ads.model.deployment.model_deployment_infrastructure import (
     DEFAULT_BANDWIDTH_MBPS,
+    DEFAULT_MEMORY_IN_GBS,
+    DEFAULT_OCPUS,
     DEFAULT_REPLICA,
     DEFAULT_SHAPE_NAME,
-    DEFAULT_OCPUS,
-    DEFAULT_MEMORY_IN_GBS,
     MODEL_DEPLOYMENT_INFRASTRUCTURE_TYPE,
     ModelDeploymentInfrastructure,
 )
```
```diff
@@ -45,18 +50,14 @@
     ModelDeploymentRuntimeType,
     OCIModelDeploymentRuntimeType,
 )
+from ads.model.serde.model_input import JsonModelInputSERDE
 from ads.model.service.oci_datascience_model_deployment import (
     OCIDataScienceModelDeployment,
 )
-from ads.common import utils as ads_utils
+
 from .common import utils
 from .common.utils import State
 from .model_deployment_properties import ModelDeploymentProperties
-from oci.data_science.models import (
-    LogDetails,
-    CreateModelDeploymentDetails,
-    UpdateModelDeploymentDetails,
-)

 DEFAULT_WAIT_TIME = 1200
 DEFAULT_POLL_INTERVAL = 10
```
```diff
@@ -751,6 +752,8 @@ def watch(
         log_filter : str, optional
             Expression for filtering the logs. This will be the WHERE clause of the query.
             Defaults to None.
+        status_list : List[str], optional
+            List of status of model deployment. This is used to store list of status from logs.

         Returns
         -------
```
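The `log_filter` docstring says the filter becomes the WHERE clause of the log query. A minimal sketch of how such a clause could be attached to an OCI Logging search query, assuming the `search "<compartment>/<log group>/<log>" | where <expression>` form of the OCI Logging query language. `build_log_query` is a hypothetical helper for illustration, not an ADS or OCI SDK function, and the OCIDs and filter in the usage line are placeholders.

```python
from typing import Optional


def build_log_query(
    compartment_id: str,
    log_group_id: str,
    log_id: str,
    log_filter: Optional[str] = None,
) -> str:
    """Build an OCI Logging search query string (hypothetical helper).

    Assumes the `search "<compartment>/<log group>/<log>"` scope syntax;
    `log_filter` is appended as the WHERE clause when given.
    """
    query = f'search "{compartment_id}/{log_group_id}/{log_id}"'
    if log_filter:
        # Only add a WHERE clause when a filter expression is supplied.
        query += f" | where {log_filter}"
    return query


# Placeholder OCIDs and an illustrative filter expression.
print(build_log_query("<compartment_ocid>", "<log_group_ocid>", "<log_ocid>", "<filter>"))
```

Keeping the filter optional matches the docstring's "Defaults to None": with no filter, the query simply scopes to the log.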
```diff
@@ -964,7 +967,9 @@ def predict(
         except oci.exceptions.ServiceError as ex:
             # When bandwidth exceeds the allocated value, TooManyRequests error (429) will be raised by oci backend.
             if ex.status == 429:
-                bandwidth_mbps = self.infrastructure.bandwidth_mbps or DEFAULT_BANDWIDTH_MBPS
+                bandwidth_mbps = (
+                    self.infrastructure.bandwidth_mbps or DEFAULT_BANDWIDTH_MBPS
+                )
                 utils.get_logger().warning(
                     f"Load balancer bandwidth exceeds the allocated {bandwidth_mbps} Mbps."
                     "To estimate the actual bandwidth, use formula: (payload size in KB) * (estimated requests per second) * 8 / 1024."
```
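The warning message embeds a rule of thumb for estimating load balancer bandwidth. A worked sketch of the same arithmetic (`estimate_bandwidth_mbps` is a hypothetical name, not an ADS API): KB per request times requests per second gives KB/s, multiplying by 8 converts to kilobits per second, and dividing by 1024 converts to megabits per second.

```python
def estimate_bandwidth_mbps(payload_size_kb: float, requests_per_second: float) -> float:
    """Estimate load balancer bandwidth in Mbps (hypothetical helper).

    Mirrors the formula in the warning above:
    (payload size in KB) * (estimated requests per second) * 8 / 1024.
    """
    # KB/s -> kilobits/s (* 8) -> megabits/s (/ 1024)
    return payload_size_kb * requests_per_second * 8 / 1024


# A 512 KB payload sent 10 times per second.
print(estimate_bandwidth_mbps(payload_size_kb=512, requests_per_second=10))  # 40.0
```

If the estimate is above the deployment's `bandwidth_mbps` (or `DEFAULT_BANDWIDTH_MBPS` when that is unset), the code comment above says the backend raises a 429 TooManyRequests error.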
```diff
@@ -1644,22 +1649,22 @@ def _build_model_deployment_configuration_details(self) -> Dict:
         }

         if infrastructure.subnet_id:
-            instance_configuration[
-                infrastructure.CONST_SUBNET_ID
-            ] = infrastructure.subnet_id
+            instance_configuration[infrastructure.CONST_SUBNET_ID] = (
+                infrastructure.subnet_id
+            )

         if infrastructure.private_endpoint_id:
             if not hasattr(
                 oci.data_science.models.InstanceConfiguration, "private_endpoint_id"
             ):
                 # TODO: add oci version with private endpoint support.
-                raise EnvironmentError(
+                raise OSError(
                     "Private endpoint is not supported in the current OCI SDK installed."
                 )

-            instance_configuration[
-                infrastructure.CONST_PRIVATE_ENDPOINT_ID
-            ] = infrastructure.private_endpoint_id
+            instance_configuration[infrastructure.CONST_PRIVATE_ENDPOINT_ID] = (
+                infrastructure.private_endpoint_id
+            )

         scaling_policy = {
             infrastructure.CONST_POLICY_TYPE: "FIXED_SIZE",
```

Review comment on `raise OSError(`: just curious, did ruff suggest this change?
Reply: Yes
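The `EnvironmentError` to `OSError` replacement asked about in the review is behavior-preserving: since Python 3.3, `EnvironmentError` and `IOError` are aliases of `OSError`, and ruff flags the aliases under its pyupgrade rule UP024 (`os-error-alias`). A quick check, with a hypothetical `check_feature` function standing in for the SDK guard:

```python
# Since Python 3.3, EnvironmentError and IOError are the same class as OSError.
assert EnvironmentError is OSError
assert IOError is OSError


def check_feature(supported: bool) -> None:
    """Hypothetical stand-in for the SDK capability guard in the diff."""
    if not supported:
        raise OSError(
            "Private endpoint is not supported in the current OCI SDK installed."
        )


try:
    check_feature(False)
except EnvironmentError as err:  # existing callers catching the alias still work
    print(type(err).__name__, "-", err)
```

Because the two names refer to one class, any caller that catches `EnvironmentError` continues to catch the new `raise OSError(...)`.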
```diff
@@ -1704,7 +1709,7 @@ def _build_model_deployment_configuration_details(self) -> Dict:
                 oci.data_science.models,
                 "ModelDeploymentEnvironmentConfigurationDetails",
             ):
-                raise EnvironmentError(
+                raise OSError(
                     "Environment variable hasn't been supported in the current OCI SDK installed."
                 )

```
```diff
@@ -1720,9 +1725,9 @@ def _build_model_deployment_configuration_details(self) -> Dict:
                 and runtime.inference_server.upper()
                 == MODEL_DEPLOYMENT_INFERENCE_SERVER_TRITON
             ):
-                environment_variables[
-                    "CONTAINER_TYPE"
-                ] = MODEL_DEPLOYMENT_INFERENCE_SERVER_TRITON
+                environment_variables["CONTAINER_TYPE"] = (
+                    MODEL_DEPLOYMENT_INFERENCE_SERVER_TRITON
+                )
             runtime.set_spec(runtime.CONST_ENV, environment_variables)
             environment_configuration_details = {
                 runtime.CONST_ENVIRONMENT_CONFIG_TYPE: runtime.environment_config_type,
```
```diff
@@ -1734,17 +1739,17 @@ def _build_model_deployment_configuration_details(self) -> Dict:
                 oci.data_science.models,
                 "OcirModelDeploymentEnvironmentConfigurationDetails",
             ):
-                raise EnvironmentError(
+                raise OSError(
                     "Container runtime hasn't been supported in the current OCI SDK installed."
                 )
             environment_configuration_details["image"] = runtime.image
             environment_configuration_details["imageDigest"] = runtime.image_digest
             environment_configuration_details["cmd"] = runtime.cmd
             environment_configuration_details["entrypoint"] = runtime.entrypoint
             environment_configuration_details["serverPort"] = runtime.server_port
-            environment_configuration_details[
-                "healthCheckPort"
-            ] = runtime.health_check_port
+            environment_configuration_details["healthCheckPort"] = (
+                runtime.health_check_port
+            )

         model_deployment_configuration_details = {
             infrastructure.CONST_DEPLOYMENT_TYPE: "SINGLE_MODEL",
```
```diff
@@ -1754,7 +1759,7 @@ def _build_model_deployment_configuration_details(self) -> Dict:

         if runtime.deployment_mode == ModelDeploymentMode.STREAM:
             if not hasattr(oci.data_science.models, "StreamConfigurationDetails"):
-                raise EnvironmentError(
+                raise OSError(
                     "Model deployment mode hasn't been supported in the current OCI SDK installed."
                 )
             model_deployment_configuration_details[
```
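The hunks in `_build_model_deployment_configuration_details` repeat one pattern: probe the installed OCI SDK with `hasattr` for a newer model class and raise `OSError` if it is missing. A minimal sketch of that guard factored into one helper; `require_sdk_attr` is hypothetical (not part of ADS), and a `types.SimpleNamespace` stands in for `oci.data_science.models` so the snippet runs without the SDK installed.

```python
import types


def require_sdk_attr(module: object, attr_name: str, feature: str) -> None:
    """Raise OSError if `module` lacks `attr_name` (hypothetical helper)."""
    if not hasattr(module, attr_name):
        raise OSError(
            f"{feature} hasn't been supported in the current OCI SDK installed."
        )


# Stand-in for `oci.data_science.models`: it exposes only one model class.
fake_models = types.SimpleNamespace(StreamConfigurationDetails=object)

# Present attribute: no error.
require_sdk_attr(fake_models, "StreamConfigurationDetails", "Model deployment mode")

# Missing attribute: OSError with the same message shape as in the diff.
try:
    require_sdk_attr(
        fake_models,
        "OcirModelDeploymentEnvironmentConfigurationDetails",
        "Container runtime",
    )
except OSError as err:
    print(err)
```

Probing with `hasattr` instead of pinning an SDK version lets the code degrade with a clear message on older SDKs, which is why each feature gets its own check.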
```diff
@@ -1786,9 +1791,13 @@ def _build_category_log_details(self) -> Dict:

         logs = {}
         if (
-            self.infrastructure.access_log and
-            self.infrastructure.access_log.get(self.infrastructure.CONST_LOG_GROUP_ID, None)
-            and self.infrastructure.access_log.get(self.infrastructure.CONST_LOG_ID, None)
+            self.infrastructure.access_log
+            and self.infrastructure.access_log.get(
+                self.infrastructure.CONST_LOG_GROUP_ID, None
+            )
+            and self.infrastructure.access_log.get(
+                self.infrastructure.CONST_LOG_ID, None
+            )
         ):
             logs[self.infrastructure.CONST_ACCESS] = {
                 self.infrastructure.CONST_LOG_GROUP_ID: self.infrastructure.access_log.get(
```
```diff
@@ -1799,9 +1808,13 @@ def _build_category_log_details(self) -> Dict:
                 ),
             }
         if (
-            self.infrastructure.predict_log and
-            self.infrastructure.predict_log.get(self.infrastructure.CONST_LOG_GROUP_ID, None)
-            and self.infrastructure.predict_log.get(self.infrastructure.CONST_LOG_ID, None)
+            self.infrastructure.predict_log
+            and self.infrastructure.predict_log.get(
+                self.infrastructure.CONST_LOG_GROUP_ID, None
+            )
+            and self.infrastructure.predict_log.get(
+                self.infrastructure.CONST_LOG_ID, None
+            )
         ):
             logs[self.infrastructure.CONST_PREDICT] = {
                 self.infrastructure.CONST_LOG_GROUP_ID: self.infrastructure.predict_log.get(
```
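`_build_category_log_details` adds an access or predict entry only when that log configuration is set and has both a log group ID and a log ID. A standalone sketch of the same condition, assuming the entry holds just those two IDs (the diff is cut off at that point). The key strings `"logGroupId"`, `"logId"`, `"access"` and `"predict"` are hypothetical stand-ins for the infrastructure `CONST_*` values.

```python
from typing import Dict, Optional

# Hypothetical key names standing in for the infrastructure CONST_* values.
LOG_GROUP_ID = "logGroupId"
LOG_ID = "logId"


def build_category_log_details(
    access_log: Optional[Dict[str, str]],
    predict_log: Optional[Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Collect log details per category, skipping incomplete configs."""
    logs: Dict[str, Dict[str, str]] = {}
    for category, log in (("access", access_log), ("predict", predict_log)):
        # Same three-part check as the diff: config present, group ID set, log ID set.
        if log and log.get(LOG_GROUP_ID, None) and log.get(LOG_ID, None):
            logs[category] = {LOG_GROUP_ID: log[LOG_GROUP_ID], LOG_ID: log[LOG_ID]}
    return logs


# The predict config lacks a log ID, so only the access entry is built.
print(build_category_log_details({"logGroupId": "g1", "logId": "l1"}, {"logGroupId": "g2"}))
```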
Review comment: nit: any reason why we're removing the non-alphanumeric characters here?
Reply: Without removing the non-alphanumeric characters, I was getting a Bad Request error when calling the `head_object` endpoint.
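The thread does not show which string is sanitized before the `head_object` call, so the following is only a sketch of the stripping step itself. It assumes "non-alphanumeric" means anything outside ASCII letters and digits; `strip_non_alphanumeric` is a hypothetical name.

```python
import re


def strip_non_alphanumeric(value: str) -> str:
    """Remove every character that is not an ASCII letter or digit (hypothetical helper)."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


print(strip_non_alphanumeric("my-model_v1.2 (test)"))  # mymodelv12test
```

An explicit ASCII character class is used instead of `str.isalnum()` because `isalnum()` also keeps non-ASCII letters such as `é`, which may be exactly the kind of character that triggers the Bad Request.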