Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add vertex_search_evaluation.ipynb notebook #1584

Open
wants to merge 35 commits into
base: main
Choose a base branch
from

Conversation

lazyprgmr
Copy link
Contributor

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
  • Your account is listed in CODEOWNERS for the file(s).
  • Make your Pull Request title in the https://www.conventionalcommits.org/ specification.
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

lazyprgmr and others added 26 commits November 8, 2024 10:21
…fixed KeyError issue when a section is not generated
@lazyprgmr lazyprgmr requested a review from a team as a code owner December 27, 2024 09:47
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Contributor

@code-review-assist code-review-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi team,

I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:

This pull request adds a new Jupyter notebook, vertex_search_evaluation.ipynb, which demonstrates how to evaluate search results for Vertex Agent Builder Search. The notebook covers several key aspects:

  • Data Acquisition: Downloads SEO data from Kaggle using the opendatasets library.
  • Data Preprocessing: Selects a subset of the data for demonstration purposes.
  • Datastore Creation: Creates a Vertex AI Datastore to store the search data using the google-cloud-discoveryengine library. This includes specifying the datastore's display name, industry vertical, and solution type.
  • Data Import: Imports the preprocessed data into the created datastore. This involves removing protocol information from URIs and using batch requests for efficiency.
  • Query Set Creation and Import: Creates a sample query set and imports sample query data from a GCS bucket in NDJSON format.
  • Search Quality Evaluation: Initiates a search quality evaluation using the created query set and a specified search application ID. This requires a pre-existing search application connected to the datastore.
  • Result Retrieval: Retrieves both aggregate and query-level results from the evaluation.

The notebook is well-structured and provides clear instructions and comments. It leverages several Google Cloud libraries, demonstrating a practical workflow for evaluating search results within the Vertex AI ecosystem.

Here's a little haiku to brighten your day:

Code flows like a stream,
Tests pass, a joyful green light,
Review's swift and kind.

Thanks,
Gemini

@lazyprgmr lazyprgmr changed the title Vertex Agent Builder Search evaluation demo notebook feat: Add vertex_search_evaluation.ipynb notebook Dec 27, 2024
Copy link
Contributor

Gemini encountered an error creating the review. You can try again by commenting @code-review-assist review.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the file name to be vertex_ai_search_evaluation

Comment on lines +71 to +73
"| | | | |\n",
"|-|-|-|-|\n",
"|Author(s) | [Rupjit Chakraborty](https://github.com/lazyprgmr)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"| | | | |\n",
"|-|-|-|-|\n",
"|Author(s) | [Rupjit Chakraborty](https://github.com/lazyprgmr)"
"| | |\n",
"|-|-|\n",
"|Author(s) | [Rupjit Chakraborty](https://github.com/lazyprgmr) |"

"# Instructions for enabling the search api\n",
"\n",
"\n",
"def create_data_store() -> str:\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pass values as parameters instead of using global constants where possible?

Comment on lines +511 to +532
"# For more information, refer to:\n",
"# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store\n",
"client_options = (\n",
" ClientOptions(api_endpoint=f\"{location}-discoveryengine.googleapis.com\")\n",
" if DS_LOC != \"global\"\n",
" else None\n",
")\n",
"\n",
"# Create a client\n",
"client = discoveryengine_v1.SiteSearchEngineServiceClient(client_options=client_options)\n",
"\n",
"# The full resource name of the data store\n",
"# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}\n",
"site_search_engine = client.site_search_engine_path(\n",
" project=PROJECT_ID, location=DS_LOC, data_store=DATA_STORE_ID\n",
")\n",
"\n",
"\n",
"target_sites = []\n",
"for uri_pattern in uri_patterns:\n",
" # Remove the protocol information\n",
" if uri_pattern.startswith(\"https://\"):\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you wrap this in a method like you did with creating the data store?

Comment on lines +630 to +632
" query_set = {\"queryEntry\": {\"query\": \"\", \"targets\": []}}\n",
" query_set[\"queryEntry\"][\"query\"] = question\n",
" query_set[\"queryEntry\"][\"targets\"] = [{\"uri\": uri} for uri in uris]\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be condensed to make it easier to read

Comment on lines +708 to +720
"headers = {\n",
" \"Authorization\": f\"Bearer {get_access_token()}\",\n",
" \"Content-Type\": \"application/json\",\n",
" \"X-Goog-User-Project\": PROJECT_ID,\n",
"}\n",
"query_set_uri = f\"https://discoveryengine.googleapis.com/v1beta/projects/{PROJECT_ID}/locations/{DS_LOC}/sampleQuerySets?sampleQuerySetId={QUERY_SET_ID}\"\n",
"data = {\"displayName\": DISPLAY_NAME}\n",
"query_set_creation_response = requests.post(\n",
" url=query_set_uri,\n",
" headers=headers,\n",
" json=data,\n",
")\n",
"\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be methods in the client library for this

Comment on lines +751 to +763
"query_set_import_uri = f\"https://discoveryengine.googleapis.com/v1beta/projects/{PROJECT_ID}/locations/{DS_LOC}/sampleQuerySets/{QUERY_SET_ID}/sampleQueries:import\"\n",
"data = {\n",
" \"gcsSource\": {\n",
" \"inputUris\": [f\"{BUCKET_URI}/test_data.ndjson\"],\n",
" },\n",
" \"errorConfig\": {\"gcsPrefix\": f\"{BUCKET_URI}/errors/\"},\n",
"}\n",
"sample_data_import_response = requests.post(\n",
" url=query_set_import_uri,\n",
" headers=headers,\n",
" json=data,\n",
")\n",
"\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Comment on lines +839 to +851
"headers = {\n",
" \"Authorization\": f\"Bearer {get_access_token()}\",\n",
" \"Content-Type\": \"application/json\",\n",
" \"X-Goog-User-Project\": PROJECT_ID,\n",
"}\n",
"search_evaluation_uri = f\"https://discoveryengine.googleapis.com/v1beta/projects/{PROJECT_ID}/locations/{DS_LOC}/evaluations\"\n",
"data = {\n",
" \"evaluationSpec\": {\n",
" \"querySetSpec\": {\n",
" \"sampleQuerySet\": f\"projects/{PROJECT_ID}/locations/{DS_LOC}/sampleQuerySets/{QUERY_SET_ID}\"\n",
" },\n",
" \"searchRequest\": {\n",
" \"servingConfig\": f\"projects/{PROJECT_ID}/locations/{DS_LOC}/collections/default_collection/engines/{APP_ID}/servingConfigs/default_search\"\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, it's not recommended to use raw REST calls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants