-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add vertex_search_evaluation.ipynb
notebook
#1584
base: main
Are you sure you want to change the base?
Conversation
…d changes as per comments in the PR
…unction parameters
…fixed KeyError issue when a section is not generated
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi team,
I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:
This pull request adds a new Jupyter notebook, vertex_search_evaluation.ipynb
, which demonstrates how to evaluate search results for Vertex Agent Builder Search. The notebook covers several key aspects:
- Data Acquisition: Downloads SEO data from Kaggle using the
opendatasets
library. - Data Preprocessing: Selects a subset of the data for demonstration purposes.
- Datastore Creation: Creates a Vertex AI Datastore to store the search data using the
google-cloud-discoveryengine
library. This includes specifying the datastore's display name, industry vertical, and solution type. - Data Import: Imports the preprocessed data into the created datastore. This involves removing protocol information from URIs and using batch requests for efficiency.
- Query Set Creation and Import: Creates a sample query set and imports sample query data from a GCS bucket in NDJSON format.
- Search Quality Evaluation: Initiates a search quality evaluation using the created query set and a specified search application ID. This requires a pre-existing search application connected to the datastore.
- Result Retrieval: Retrieves both aggregate and query-level results from the evaluation.
The notebook is well-structured and provides clear instructions and comments. It leverages several Google Cloud libraries, demonstrating a practical workflow for evaluating search results within the Vertex AI ecosystem.
Here's a little haiku to brighten your day:
Code flows like a stream,
Tests pass, a joyful green light,
Review's swift and kind.
Thanks,
Gemini
vertex_search_evaluation.ipynb
notebook
Gemini encountered an error creating the review. You can try again by commenting |
… raised by spellbot
…ng issues raised by spellbot
…ng exception in allow.txt
…ng exception in allow.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the file name to be vertex_ai_search_evaluation
"| | | | |\n", | ||
"|-|-|-|-|\n", | ||
"|Author(s) | [Rupjit Chakraborty](https://github.com/lazyprgmr)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"| | | | |\n", | |
"|-|-|-|-|\n", | |
"|Author(s) | [Rupjit Chakraborty](https://github.com/lazyprgmr)" | |
"| | |\n", | |
"|-|-|\n", | |
"|Author(s) | [Rupjit Chakraborty](https://github.com/lazyprgmr) |" |
"# Instructions for enabling the search api\n", | ||
"\n", | ||
"\n", | ||
"def create_data_store() -> str:\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pass values as parameters instead of using global constants where possible?
"# For more information, refer to:\n", | ||
"# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store\n", | ||
"client_options = (\n", | ||
" ClientOptions(api_endpoint=f\"{location}-discoveryengine.googleapis.com\")\n", | ||
" if DS_LOC != \"global\"\n", | ||
" else None\n", | ||
")\n", | ||
"\n", | ||
"# Create a client\n", | ||
"client = discoveryengine_v1.SiteSearchEngineServiceClient(client_options=client_options)\n", | ||
"\n", | ||
"# The full resource name of the data store\n", | ||
"# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}\n", | ||
"site_search_engine = client.site_search_engine_path(\n", | ||
" project=PROJECT_ID, location=DS_LOC, data_store=DATA_STORE_ID\n", | ||
")\n", | ||
"\n", | ||
"\n", | ||
"target_sites = []\n", | ||
"for uri_pattern in uri_patterns:\n", | ||
" # Remove the protocol information\n", | ||
" if uri_pattern.startswith(\"https://\"):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you wrap this in a method like you did with creating the data store?
" query_set = {\"queryEntry\": {\"query\": \"\", \"targets\": []}}\n", | ||
" query_set[\"queryEntry\"][\"query\"] = question\n", | ||
" query_set[\"queryEntry\"][\"targets\"] = [{\"uri\": uri} for uri in uris]\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be condensed to make it easier to read
"headers = {\n", | ||
" \"Authorization\": f\"Bearer {get_access_token()}\",\n", | ||
" \"Content-Type\": \"application/json\",\n", | ||
" \"X-Goog-User-Project\": PROJECT_ID,\n", | ||
"}\n", | ||
"query_set_uri = f\"https://discoveryengine.googleapis.com/v1beta/projects/{PROJECT_ID}/locations/{DS_LOC}/sampleQuerySets?sampleQuerySetId={QUERY_SET_ID}\"\n", | ||
"data = {\"displayName\": DISPLAY_NAME}\n", | ||
"query_set_creation_response = requests.post(\n", | ||
" url=query_set_uri,\n", | ||
" headers=headers,\n", | ||
" json=data,\n", | ||
")\n", | ||
"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be methods in the client library for this
"query_set_import_uri = f\"https://discoveryengine.googleapis.com/v1beta/projects/{PROJECT_ID}/locations/{DS_LOC}/sampleQuerySets/{QUERY_SET_ID}/sampleQueries:import\"\n", | ||
"data = {\n", | ||
" \"gcsSource\": {\n", | ||
" \"inputUris\": [f\"{BUCKET_URI}/test_data.ndjson\"],\n", | ||
" },\n", | ||
" \"errorConfig\": {\"gcsPrefix\": f\"{BUCKET_URI}/errors/\"},\n", | ||
"}\n", | ||
"sample_data_import_response = requests.post(\n", | ||
" url=query_set_import_uri,\n", | ||
" headers=headers,\n", | ||
" json=data,\n", | ||
")\n", | ||
"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
"headers = {\n", | ||
" \"Authorization\": f\"Bearer {get_access_token()}\",\n", | ||
" \"Content-Type\": \"application/json\",\n", | ||
" \"X-Goog-User-Project\": PROJECT_ID,\n", | ||
"}\n", | ||
"search_evaluation_uri = f\"https://discoveryengine.googleapis.com/v1beta/projects/{PROJECT_ID}/locations/{DS_LOC}/evaluations\"\n", | ||
"data = {\n", | ||
" \"evaluationSpec\": {\n", | ||
" \"querySetSpec\": {\n", | ||
" \"sampleQuerySet\": f\"projects/{PROJECT_ID}/locations/{DS_LOC}/sampleQuerySets/{QUERY_SET_ID}\"\n", | ||
" },\n", | ||
" \"searchRequest\": {\n", | ||
" \"servingConfig\": f\"projects/{PROJECT_ID}/locations/{DS_LOC}/collections/default_collection/engines/{APP_ID}/servingConfigs/default_search\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, it's not recommended to use raw REST calls
CONTRIBUTING
Guide.CODEOWNERS
for the file(s).nox -s format
from the repository root to format).