From 8a4521a67aecf027da6132a00fccefa84c4f61b9 Mon Sep 17 00:00:00 2001 From: Ryan Lin Date: Fri, 1 Nov 2024 10:34:41 -0400 Subject: [PATCH] milvus tutorials refactored: migrate to MilvusClient SDK for simplified operations --- authors.yaml | 8 +- ...ltered_search_with_Milvus_and_OpenAI.ipynb | 1681 ++++++++++++++--- ...tting_started_with_Milvus_and_OpenAI.ipynb | 783 ++++---- registry.yaml | 22 +- 4 files changed, 1796 insertions(+), 698 deletions(-) diff --git a/authors.yaml b/authors.yaml index 9a57771b7e..6bb91d17a9 100644 --- a/authors.yaml +++ b/authors.yaml @@ -188,7 +188,7 @@ danial-openai: website: "https://github.com/danial-openai" avatar: "https://avatars.githubusercontent.com/u/178343703" -gbergengruen: - name: "Guillermo Bergengruen" - website: "https://github.com/gbergengruen" - avatar: "https://avatars.githubusercontent.com/u/140010883" +jinhonglin-ryan: + name: "Jinhong Lin" + website: "https://github.com/jinhonglin-ryan" + avatar: "https://avatars.githubusercontent.com/u/123346659" diff --git a/examples/vector_databases/milvus/Filtered_search_with_Milvus_and_OpenAI.ipynb b/examples/vector_databases/milvus/Filtered_search_with_Milvus_and_OpenAI.ipynb index e6aadfb3cd..53f0dfad13 100644 --- a/examples/vector_databases/milvus/Filtered_search_with_Milvus_and_OpenAI.ipynb +++ b/examples/vector_databases/milvus/Filtered_search_with_Milvus_and_OpenAI.ipynb @@ -1,419 +1,504 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "wg1bdO0lDIv6" + }, "source": [ - "# Filtered Search with Milvus and OpenAI\n", - "### Finding your next movie\n", + "# Movie Recommendation with Milvus\n", + "\n", + "[Milvus](https://milvus.io/) is a popular open-source vector database that powers AI applications with highly performant and scalable vector similarity search.\n", "\n", - "In this notebook we will be going over generating embeddings of movie descriptions with OpenAI and using those embeddings within Milvus to find 
relevant movies. To narrow our search results and try something new, we are going to be using filtering to do metadata searches. The dataset in this example is sourced from HuggingFace datasets, and contains a little over 8 thousand movie entries.\n", "\n", - "Lets begin by first downloading the required libraries for this notebook:\n", - "- `openai` is used for communicating with the OpenAI embedding service\n", - "- `pymilvus` is used for communicating with the Milvus server\n", - "- `datasets` is used for downloading the dataset\n", - "- `tqdm` is used for the progress bars\n" + "In this notebook, we will explore how to generate embeddings of movie descriptions using OpenAI and leverage those embeddings within Milvus to recommend movies that match your preferences. To enhance our search results, we will utilize filtering to perform metadata searches. The dataset used in this example is sourced from HuggingFace datasets and contains over 8,000 movie entries, providing a rich pool of options for movie recommendations.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gv0NJ31IDIv7" + }, + "source": [ + "## Dependencies and Environment\n", + "You can install the dependencies by running the following command:\n" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "scrolled": true, + "id": "kdxHDsewDIv7", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "a38ce11b-f668-46ef-9ce2-38fd7d12ba1d" + }, "outputs": [], "source": [ "! pip install openai pymilvus datasets tqdm" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "_sbPlC7tDIv7" + }, "source": [ - "With the required packages installed we can get started. Lets begin by launching the Milvus service. The file being run is the `docker-compose.yaml` found in the folder of this file. This command launches a Milvus standalone instance which we will use for this test. 
" + "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime** (click on the \"Runtime\" menu at the top of the screen, and select \"Restart session\" from the dropdown menu)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08DQFod0DIv7" + }, + "source": [ + "We will use OpenAI as the LLM in this example. You should prepare the [api key](https://platform.openai.com/docs/quickstart) `OPENAI_API_KEY` as an environment variable." ] }, { "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "E0317 14:06:38.344884000 140704629352640 fork_posix.cc:76] Other threads are currently calling into gRPC, skipping fork() handlers\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[1A\u001b[1B\u001b[0G\u001b[?25l[+] Running 1/0\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠋ Container milvus-etcd Creating 0.0s\n", - "\u001b[0m\u001b[37m ⠋ Container milvus-minio Creating 0.0s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠙ Container milvus-etcd Creating 0.1s\n", - "\u001b[0m\u001b[37m ⠙ Container milvus-minio Creating 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/3\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.2s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.2s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.3s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio 
Starting 0.3s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.4s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.4s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.5s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.5s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.6s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.6s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.7s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.8s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ 
Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.9s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.0s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.0s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.0s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.0s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.0s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.2s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.0s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone 
Starting 1.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.0s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.4s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.0s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.5s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 4/4\u001b[0m\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.0s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Started 1.6s\n", - "\u001b[0m\u001b[?25h" - ] + "execution_count": 1, + "metadata": { + "id": "E8LsgeBzDIv8", + "ExecuteTime": { + "end_time": "2024-10-28T03:58:17.790191Z", + "start_time": "2024-10-28T03:58:17.785125Z" } - ], + }, + "outputs": [], "source": [ - "! 
docker compose up -d" + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = \"sk-***********\"" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "aJcCzQhODIv8" + }, "source": [ - "With Milvus running we can setup our global variables:\n", - "- HOST: The Milvus host address\n", - "- PORT: The Milvus port number\n", - "- COLLECTION_NAME: What to name the collection within Milvus\n", - "- DIMENSION: The dimension of the embeddings\n", - "- OPENAI_ENGINE: Which embedding model to use\n", - "- openai.api_key: Your OpenAI account key\n", - "- INDEX_PARAM: The index settings to use for the collection\n", - "- QUERY_PARAM: The search parameters to use\n", - "- BATCH_SIZE: How many movies to embed and insert at once" + "## Initialize OpenAI client and Milvus\n", + "Initialize the OpenAI client." ] }, { "cell_type": "code", - "execution_count": 30, - "metadata": {}, + "execution_count": 2, + "metadata": { + "id": "aBOi2mvKDIv8", + "ExecuteTime": { + "end_time": "2024-10-28T03:58:19.501086Z", + "start_time": "2024-10-28T03:58:19.049909Z" + } + }, "outputs": [], "source": [ - "import openai\n", + "from openai import OpenAI\n", "\n", - "HOST = 'localhost'\n", - "PORT = 19530\n", - "COLLECTION_NAME = 'movie_search'\n", + "openai_client = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9IO_ts7pDIv8" + }, + "source": [ + "Set the collection name and dimension for the embeddings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-28T02:52:31.594866Z", + "start_time": "2024-10-28T02:52:31.584663Z" + }, + "id": "4rXlQ29MDIv8" + }, + "outputs": [], + "source": [ + "COLLECTION_NAME = \"movie_search\"\n", "DIMENSION = 1536\n", - "OPENAI_ENGINE = 'text-embedding-3-small'\n", - "openai.api_key = 'sk-your_key'\n", - "\n", - "INDEX_PARAM = {\n", - " 'metric_type':'L2',\n", - " 'index_type':\"HNSW\",\n", - " 'params':{'M': 8, 'efConstruction': 64}\n", - "}\n", - "\n", - "QUERY_PARAM = {\n", - " \"metric_type\": \"L2\",\n", - " \"params\": {\"ef\": 64},\n", - "}\n", "\n", "BATCH_SIZE = 1000" ] }, + { + "cell_type": "markdown", + "metadata": { + "id": "OHuz5DtVDIv8" + }, + "source": [ + "Connect to Milvus." + ] + }, { "cell_type": "code", - "execution_count": 6, - "metadata": {}, + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "VUXhaRtQDIv8", + "outputId": "ac2c222b-9af3-4b64-d8dd-cf5b3fe2f376" + }, "outputs": [], "source": [ - "from pymilvus import connections, utility, FieldSchema, Collection, CollectionSchema, DataType\n", + "from pymilvus import MilvusClient\n", "\n", "# Connect to Milvus Database\n", - "connections.connect(host=HOST, port=PORT)" + "client = MilvusClient(\"./milvus_demo.db\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U8es09T2DIv8" + }, + "source": [ + "> As for the argument of `url` and `token`:\n", + "> - Setting the `uri` as a local file, e.g.`./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.\n", + "> - If you have large scale of data, say more than a million vectors, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, please use the server address and port as your uri, e.g.`http://localhost:19530`. 
If you enable the authentication feature on Milvus, use \":\" as the token, otherwise don't set the token.\n", + "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and Api key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud." ] }, { "cell_type": "code", - "execution_count": 18, - "metadata": {}, + "execution_count": 5, + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-28T02:52:32.123914Z", + "start_time": "2024-10-28T02:52:32.116018Z" + }, + "id": "eaBFyWQPDIv8" + }, "outputs": [], "source": [ "# Remove collection if it already exists\n", - "if utility.has_collection(COLLECTION_NAME):\n", - " utility.drop_collection(COLLECTION_NAME)" + "if client.has_collection(COLLECTION_NAME):\n", + " client.drop_collection(COLLECTION_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rmt5EIn8DIv8" + }, + "source": [ + "Define the fields for the collection, which include the id, title, type, release year, rating, and description." 
] }, { "cell_type": "code", - "execution_count": 19, - "metadata": {}, + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "xf5YMsDmDIv8", + "outputId": "b26cb009-8a82-4cfd-9d76-db6323d0df69" + }, "outputs": [], "source": [ + "from pymilvus import DataType\n", + "\n", "# Create collection which includes the id, title, and embedding.\n", - "fields = [\n", - " FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),\n", - " FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=64000),\n", - " FieldSchema(name='type', dtype=DataType.VARCHAR, max_length=64000),\n", - " FieldSchema(name='release_year', dtype=DataType.INT64),\n", - " FieldSchema(name='rating', dtype=DataType.VARCHAR, max_length=64000),\n", - " FieldSchema(name='description', dtype=DataType.VARCHAR, max_length=64000),\n", - " FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=DIMENSION)\n", - "]\n", - "schema = CollectionSchema(fields=fields)\n", - "collection = Collection(name=COLLECTION_NAME, schema=schema)" + "\n", + "# 1. Create schema\n", + "schema = MilvusClient.create_schema(\n", + " auto_id=True,\n", + " enable_dynamic_field=False,\n", + ")\n", + "\n", + "# 2. Add fields to schema\n", + "schema.add_field(field_name=\"id\", datatype=DataType.INT64, is_primary=True)\n", + "schema.add_field(field_name=\"title\", datatype=DataType.VARCHAR, max_length=64000)\n", + "schema.add_field(field_name=\"type\", datatype=DataType.VARCHAR, max_length=64000)\n", + "schema.add_field(field_name=\"release_year\", datatype=DataType.INT64)\n", + "schema.add_field(field_name=\"rating\", datatype=DataType.VARCHAR, max_length=64000)\n", + "schema.add_field(field_name=\"description\", datatype=DataType.VARCHAR, max_length=64000)\n", + "schema.add_field(field_name=\"embedding\", datatype=DataType.FLOAT_VECTOR, dim=DIMENSION)\n", + "\n", + "# 3. 
Create collection with the schema\n", + "client.create_collection(collection_name=COLLECTION_NAME, schema=schema)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5Ch-2J5CDIv8" + }, + "source": [ + "Create the index on the collection and load it." ] }, { "cell_type": "code", - "execution_count": 20, - "metadata": {}, + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "K9fwi332DIv8", + "outputId": "d9e1e36a-4830-491f-ce3b-be1974a363cf" + }, "outputs": [], "source": [ "# Create the index on the collection and load it.\n", - "collection.create_index(field_name=\"embedding\", index_params=INDEX_PARAM)\n", - "collection.load()" + "\n", + "# 1. Prepare index parameters\n", + "index_params = client.prepare_index_params()\n", + "\n", + "\n", + "# 2. Add an index on the embedding field\n", + "index_params.add_index(\n", + " field_name=\"embedding\", metric_type=\"IP\", index_type=\"AUTOINDEX\", params={}\n", + ")\n", + "\n", + "\n", + "# 3. Create index\n", + "client.create_index(collection_name=COLLECTION_NAME, index_params=index_params)\n", + "\n", + "\n", + "# 4. Load collection\n", + "client.load_collection(collection_name=COLLECTION_NAME, replica_number=1)" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "L-d-pR5DDIv8" + }, "source": [ "## Dataset\n", - "With Milvus up and running we can begin grabbing our data. Hugging Face Datasets is a hub that holds many different user datasets, and for this example we are using HuggingLearners's netflix-shows dataset. This dataset contains movies and their metadata pairs for over 8 thousand movies. We are going to embed each description and store it within Milvus along with its title, type, release_year and rating." + "With Milvus up and running we can begin grabbing our data. 
`Hugging Face Datasets` is a hub that holds many different user datasets, and for this example we are using HuggingLearners's netflix-shows dataset. This dataset contains movies and their metadata pairs for over 8 thousand movies. We are going to embed each description and store it within Milvus along with its title, type, release_year and rating." ] }, { "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Found cached dataset csv (/Users/filiphaltmayer/.cache/huggingface/datasets/hugginglearners___csv/hugginglearners--netflix-shows-03475319fc65a05a/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)\n" + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 220, + "referenced_widgets": [ + "7364d39079fa40c5b5e5abccd2d006b6", + "4f8b5d0ad9f942c59b386d6b5f07242e", + "7db1a3f59362417586d8739cb791207a", + "24517a7598c84e3dbc2864e702fccb86", + "3b004c8bc2b147b48d2807262322de40", + "d17551bb40fc4a2fbe2a1628e32cab11", + "aabf31d74d7b425798c68b83d1c025d8", + "d8f8541809964b2a9b2003092b1e1dad", + "4ccca81c7b714e75929822ed57cab5d5", + "2d941555097e4d49a01b4e240d0ab18f", + "81c4aa500d6449b1ac6dcd5ea6365630", + "074d1bb0263442c082a83e13cc77a264", + "9beb8aa11477425498859e544100b109", + "1608f9db979346d9bb294e564b507fce", + "971e5d9d61c44905b1f1704178925db1", + "7ade9ecc4c2c4bda8ca175dbffedbb12", + "0c5a98cdee664622a50c92be5ef4631f", + "e7069429122b48208c760be1ee5bcb9f", + "4e41b396f14941508899975184e07a71", + "a8796c17d961465fa141933ff82cf7d8", + "7a231c8d89b844b295ccb656c17d9464", + "6ea032bb9ca148b7885bc4cba9c17525", + "0d2ccc863fad437f8138fa0b32938f84", + "c8cef285e18f4b558ae6a6a4ad5aee92", + "2cc3a2b72f154656b8e2c3acc20146b5", + "620d6ed5d68f48b281b5f2200dfc7d0b", + "537bb70db0814a43aff05c4b2f712322", + "5860b6c65c1b4f5aa486e7815e2aecfe", + "f326231ed25749f1b81fb0b6fca578f5", + 
"1d992ac5fb2b43f4a9f791730bdea123", + "de31aa5b5e4845be980f550a4b2bea91", + "eb0436e824f94281af20d69dfcec3ea4", + "45a3e27ec99f4685913f2f5f77b846ef" ] - } - ], + }, + "id": "7pyst3ZBDIv8", + "outputId": "4f477b23-f190-4a36-905a-3884e40e70b9" + }, + "outputs": [], "source": [ - "import datasets\n", + "from datasets import load_dataset\n", "\n", - "# Download the dataset \n", - "dataset = datasets.load_dataset('hugginglearners/netflix-shows', split='train')" + "dataset = load_dataset(\"hugginglearners/netflix-shows\", split=\"train\")" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "1-iOxwCMDIv8" + }, "source": [ "## Insert the Data\n", - "Now that we have our data on our machine we can begin embedding it and inserting it into Milvus. The embedding function takes in text and returns the embeddings in a list format. " + "Now that we have our data on our machine we can begin embedding it and inserting it into Milvus. The embedding function takes in text and returns the embeddings in a list format." ] }, { "cell_type": "code", - "execution_count": 22, - "metadata": {}, + "execution_count": 9, + "metadata": { + "id": "YBi6ria2DIv8" + }, "outputs": [], "source": [ - "# Simple function that converts the texts to embeddings\n", - "def embed(texts):\n", - " embeddings = openai.Embedding.create(\n", - " input=texts,\n", - " engine=OPENAI_ENGINE\n", - " )\n", - " return [x['embedding'] for x in embeddings['data']]\n" + "def emb_texts(texts):\n", + " res = openai_client.embeddings.create(input=texts, model=\"text-embedding-3-small\")\n", + " return [res_data.embedding for res_data in res.data]" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "D-SJAeQXDIv8" + }, "source": [ - "This next step does the actual inserting. We iterate through all the entries and create batches that we insert once we hit our set batch size. 
After the loop is over we insert the last remaning batch if it exists. " + "This next step does the actual inserting. We iterate through all the entries and create batches that we insert once we hit our set batch size. After the loop is over we insert the last remaning batch if it exists." ] }, { "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 8807/8807 [00:31<00:00, 276.82it/s]\n" - ] - } - ], "source": [ "from tqdm import tqdm\n", "\n", - "data = [\n", - " [], # title\n", - " [], # type\n", - " [], # release_year\n", - " [], # rating\n", - " [], # description\n", - "]\n", + "# batch (data to be inserted) is a list of dictionaries\n", + "batch = []\n", "\n", "# Embed and insert in batches\n", "for i in tqdm(range(0, len(dataset))):\n", - " data[0].append(dataset[i]['title'] or '')\n", - " data[1].append(dataset[i]['type'] or '')\n", - " data[2].append(dataset[i]['release_year'] or -1)\n", - " data[3].append(dataset[i]['rating'] or '')\n", - " data[4].append(dataset[i]['description'] or '')\n", - " if len(data[0]) % BATCH_SIZE == 0:\n", - " data.append(embed(data[4]))\n", - " collection.insert(data)\n", - " data = [[],[],[],[],[]]\n", - "\n", - "# Embed and insert the remainder \n", - "if len(data[0]) != 0:\n", - " data.append(embed(data[4]))\n", - " collection.insert(data)\n", - " data = [[],[],[],[],[]]\n" - ] + " batch.append(\n", + " {\n", + " \"title\": dataset[i][\"title\"] or \"\",\n", + " \"type\": dataset[i][\"type\"] or \"\",\n", + " \"release_year\": dataset[i][\"release_year\"] or -1,\n", + " \"rating\": dataset[i][\"rating\"] or \"\",\n", + " \"description\": dataset[i][\"description\"] or \"\",\n", + " }\n", + " )\n", + "\n", + " if len(batch) % BATCH_SIZE == 0 or i == len(dataset) - 1:\n", + " embeddings = emb_texts([item[\"description\"] for item in batch])\n", + "\n", + " for item, emb in zip(batch, embeddings):\n", + " 
item[\"embedding\"] = emb\n", + "\n", + " client.insert(collection_name=COLLECTION_NAME, data=batch)\n", + " batch = []" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hMIpsCJUDrO3", + "outputId": "957f4502-dc50-49cc-c534-f447a7451445" + }, + "execution_count": null, + "outputs": [] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "X3pSr28CDIv9" + }, "source": [ "## Query the Database\n", - "With our data safely inserted in Milvus, we can now perform a query. The query takes in a tuple of the movie description you are searching for an the filter to use. More info about the filter can be found [here](https://milvus.io/docs/boolean.md). The search first prints out your description and filter expression. After that for each result we print the score, title, type, release year, rating, and description of the result movies. " + "With our data safely inserted into Milvus, we can now perform a query. The query takes in a tuple of the movie description you are searching for and the filter to use. More info about the filter can be found [here](https://milvus.io/docs/boolean.md). The search first prints out your description and filter expression. After that for each result we print the score, title, type, release year, rating and description of the result movies." 
] }, { "cell_type": "code", - "execution_count": 33, - "metadata": {}, + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ohIMv8stDIv9", + "outputId": "67186ad6-3522-4f3f-c20d-c67693c05b5e" + }, "outputs": [ { - "name": "stdout", "output_type": "stream", + "name": "stdout", "text": [ "Description: movie about a fluffly animal Expression: release_year < 2019 and rating like \"PG%\"\n", "Results:\n", - "\tRank: 1 Score: 0.30083978176116943 Title: The Lamb\n", - "\t\tType: Movie Release Year: 2017 Rating: PG\n", - "A big-dreaming donkey escapes his menial existence and befriends some free-spirited\n", - "animal pals in this imaginative retelling of the Nativity Story.\n", - "\n", - "\tRank: 2 Score: 0.33528298139572144 Title: Puss in Boots\n", + "\tRank: 1 Score: 0.42213767766952515 Title: The Adventures of Tintin\n", "\t\tType: Movie Release Year: 2011 Rating: PG\n", - "The fabled feline heads to the Land of Giants with friends Humpty Dumpty and Kitty\n", - "Softpaws on a quest to nab its greatest treasure: the Golden Goose.\n", + "This 3-D motion capture adapts Georges Remi's classic comic strip about the adventures\n", + "of fearless young journalist Tintin and his trusty dog, Snowy.\n", "\n", - "\tRank: 3 Score: 0.33528298139572144 Title: Puss in Boots\n", - "\t\tType: Movie Release Year: 2011 Rating: PG\n", - "The fabled feline heads to the Land of Giants with friends Humpty Dumpty and Kitty\n", - "Softpaws on a quest to nab its greatest treasure: the Golden Goose.\n", + "\tRank: 2 Score: 0.4041026830673218 Title: Hedgehogs\n", + "\t\tType: Movie Release Year: 2016 Rating: PG\n", + "When a hedgehog suffering from memory loss forgets his identity, he ends up on a big\n", + "city journey with a pigeon to save his habitat from a human threat.\n", "\n", - "\tRank: 4 Score: 0.3414868116378784 Title: Show Dogs\n", - "\t\tType: Movie Release Year: 2018 Rating: PG\n", - "A rough and tough police dog must go 
undercover with an FBI agent as a prim and proper\n", - "pet at a dog show to save a baby panda from an illegal sale.\n", + "\tRank: 3 Score: 0.3980264663696289 Title: Osmosis Jones\n", + "\t\tType: Movie Release Year: 2001 Rating: PG\n", + "Peter and Bobby Farrelly outdo themselves with this partially animated tale about an\n", + "out-of-shape 40-year-old man who's the host to various organisms.\n", "\n", - "\tRank: 5 Score: 0.3414868116378784 Title: Show Dogs\n", - "\t\tType: Movie Release Year: 2018 Rating: PG\n", - "A rough and tough police dog must go undercover with an FBI agent as a prim and proper\n", - "pet at a dog show to save a baby panda from an illegal sale.\n", - "\n" + "\tRank: 4 Score: 0.39479154348373413 Title: The Lamb\n", + "\t\tType: Movie Release Year: 2017 Rating: PG\n", + "A big-dreaming donkey escapes his menial existence and befriends some free-spirited\n", + "animal pals in this imaginative retelling of the Nativity Story.\n", + "\n", + "\tRank: 5 Score: 0.39370301365852356 Title: Open Season 2\n", + "\t\tType: Movie Release Year: 2008 Rating: PG\n", + "Elliot the buck and his forest-dwelling cohorts must rescue their dachshund pal from\n", + "some spoiled pets bent on returning him to domesticity.\n" ] } ], "source": [ "import textwrap\n", "\n", - "def query(query, top_k = 5):\n", + "\n", + "def query(query, top_k=5):\n", " text, expr = query\n", - " res = collection.search(embed(text), anns_field='embedding', expr = expr, param=QUERY_PARAM, limit = top_k, output_fields=['title', 'type', 'release_year', 'rating', 'description'])\n", - " for i, hit in enumerate(res):\n", - " print('Description:', text, 'Expression:', expr)\n", - " print('Results:')\n", - " for ii, hits in enumerate(hit):\n", - " print('\\t' + 'Rank:', ii + 1, 'Score:', hits.score, 'Title:', hits.entity.get('title'))\n", - " print('\\t\\t' + 'Type:', hits.entity.get('type'), 'Release Year:', hits.entity.get('release_year'), 'Rating:', hits.entity.get('rating'))\n", - " 
print(textwrap.fill(hits.entity.get('description'), 88))\n", + "\n", + " res = client.search(\n", + " collection_name=COLLECTION_NAME,\n", + " data=emb_texts(text),\n", + " filter=expr,\n", + " limit=top_k,\n", + " output_fields=[\"title\", \"type\", \"release_year\", \"rating\", \"description\"],\n", + " search_params={\n", + " \"metric_type\": \"IP\",\n", + " \"params\": {},\n", + " },\n", + " )\n", + "\n", + " print(\"Description:\", text, \"Expression:\", expr)\n", + "\n", + " for hit_group in res:\n", + " print(\"Results:\")\n", + " for rank, hit in enumerate(hit_group, start=1):\n", + " entity = hit[\"entity\"]\n", + "\n", + " print(\n", + " f\"\\tRank: {rank} Score: {hit['distance']:} Title: {entity.get('title', '')}\"\n", + " )\n", + " print(\n", + " f\"\\t\\tType: {entity.get('type', '')} \"\n", + " f\"Release Year: {entity.get('release_year', '')} \"\n", + " f\"Rating: {entity.get('rating', '')}\"\n", + " )\n", + " description = entity.get(\"description\", \"\")\n", + " print(textwrap.fill(description, width=88))\n", " print()\n", "\n", - "my_query = ('movie about a fluffly animal', 'release_year < 2019 and rating like \\\"PG%\\\"')\n", + "\n", + "my_query = (\"movie about a fluffly animal\", 'release_year < 2019 and rating like \"PG%\"')\n", "\n", "query(my_query)" ] @@ -421,7 +506,7 @@ ], "metadata": { "kernelspec": { - "display_name": "haystack", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -435,10 +520,1042 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.11.5" + }, + "colab": { + "provenance": [] }, - "orig_nbformat": 4 + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "7364d39079fa40c5b5e5abccd2d006b6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_4f8b5d0ad9f942c59b386d6b5f07242e", + "IPY_MODEL_7db1a3f59362417586d8739cb791207a", + "IPY_MODEL_24517a7598c84e3dbc2864e702fccb86" + ], + "layout": "IPY_MODEL_3b004c8bc2b147b48d2807262322de40" + } + }, + "4f8b5d0ad9f942c59b386d6b5f07242e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d17551bb40fc4a2fbe2a1628e32cab11", + "placeholder": "​", + "style": "IPY_MODEL_aabf31d74d7b425798c68b83d1c025d8", + "value": "README.md: 100%" + } + }, + "7db1a3f59362417586d8739cb791207a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d8f8541809964b2a9b2003092b1e1dad", + "max": 2812, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_4ccca81c7b714e75929822ed57cab5d5", + "value": 2812 + } + }, + "24517a7598c84e3dbc2864e702fccb86": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": 
{ + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2d941555097e4d49a01b4e240d0ab18f", + "placeholder": "​", + "style": "IPY_MODEL_81c4aa500d6449b1ac6dcd5ea6365630", + "value": " 2.81k/2.81k [00:00<00:00, 186kB/s]" + } + }, + "3b004c8bc2b147b48d2807262322de40": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d17551bb40fc4a2fbe2a1628e32cab11": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": 
"1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "aabf31d74d7b425798c68b83d1c025d8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d8f8541809964b2a9b2003092b1e1dad": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + 
"display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4ccca81c7b714e75929822ed57cab5d5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "2d941555097e4d49a01b4e240d0ab18f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "81c4aa500d6449b1ac6dcd5ea6365630": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "074d1bb0263442c082a83e13cc77a264": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9beb8aa11477425498859e544100b109", + "IPY_MODEL_1608f9db979346d9bb294e564b507fce", + "IPY_MODEL_971e5d9d61c44905b1f1704178925db1" + ], + "layout": "IPY_MODEL_7ade9ecc4c2c4bda8ca175dbffedbb12" + } + }, + "9beb8aa11477425498859e544100b109": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0c5a98cdee664622a50c92be5ef4631f", + "placeholder": "​", + "style": "IPY_MODEL_e7069429122b48208c760be1ee5bcb9f", + "value": "netflix_titles.csv: 100%" + } + }, + "1608f9db979346d9bb294e564b507fce": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4e41b396f14941508899975184e07a71", + "max": 3399670, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a8796c17d961465fa141933ff82cf7d8", + "value": 3399670 + } + }, + "971e5d9d61c44905b1f1704178925db1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7a231c8d89b844b295ccb656c17d9464", + "placeholder": "​", + "style": "IPY_MODEL_6ea032bb9ca148b7885bc4cba9c17525", + "value": " 3.40M/3.40M [00:00<00:00, 25.7MB/s]" + } + }, + "7ade9ecc4c2c4bda8ca175dbffedbb12": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0c5a98cdee664622a50c92be5ef4631f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": 
null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e7069429122b48208c760be1ee5bcb9f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4e41b396f14941508899975184e07a71": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a8796c17d961465fa141933ff82cf7d8": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7a231c8d89b844b295ccb656c17d9464": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6ea032bb9ca148b7885bc4cba9c17525": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0d2ccc863fad437f8138fa0b32938f84": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c8cef285e18f4b558ae6a6a4ad5aee92", + "IPY_MODEL_2cc3a2b72f154656b8e2c3acc20146b5", + "IPY_MODEL_620d6ed5d68f48b281b5f2200dfc7d0b" + ], + "layout": "IPY_MODEL_537bb70db0814a43aff05c4b2f712322" + } + }, + "c8cef285e18f4b558ae6a6a4ad5aee92": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5860b6c65c1b4f5aa486e7815e2aecfe", + "placeholder": "​", + "style": "IPY_MODEL_f326231ed25749f1b81fb0b6fca578f5", + "value": "Generating train split: 100%" + } + }, + "2cc3a2b72f154656b8e2c3acc20146b5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + 
"description_tooltip": null, + "layout": "IPY_MODEL_1d992ac5fb2b43f4a9f791730bdea123", + "max": 8807, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_de31aa5b5e4845be980f550a4b2bea91", + "value": 8807 + } + }, + "620d6ed5d68f48b281b5f2200dfc7d0b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_eb0436e824f94281af20d69dfcec3ea4", + "placeholder": "​", + "style": "IPY_MODEL_45a3e27ec99f4685913f2f5f77b846ef", + "value": " 8807/8807 [00:00<00:00, 47917.27 examples/s]" + } + }, + "537bb70db0814a43aff05c4b2f712322": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, 
+ "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5860b6c65c1b4f5aa486e7815e2aecfe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f326231ed25749f1b81fb0b6fca578f5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1d992ac5fb2b43f4a9f791730bdea123": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "de31aa5b5e4845be980f550a4b2bea91": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "eb0436e824f94281af20d69dfcec3ea4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "45a3e27ec99f4685913f2f5f77b846ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 0 } diff --git a/examples/vector_databases/milvus/Getting_started_with_Milvus_and_OpenAI.ipynb b/examples/vector_databases/milvus/Getting_started_with_Milvus_and_OpenAI.ipynb index 9ea14250b3..262a3a67d8 100644 --- a/examples/vector_databases/milvus/Getting_started_with_Milvus_and_OpenAI.ipynb +++ b/examples/vector_databases/milvus/Getting_started_with_Milvus_and_OpenAI.ipynb @@ -1,15 +1,21 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "aXN5dTgZudz4" + }, "source": [ "# Getting Started with Milvus and OpenAI\n", "### Finding your 
next book\n", + "[Milvus](https://milvus.io/) is a popular open-source vector database that powers AI applications with highly performant and scalable vector similarity search.\n", + "\n", "\n", "In this notebook we will be going over generating embeddings of book descriptions with OpenAI and using those embeddings within Milvus to find relevant books. The dataset in this example is sourced from HuggingFace datasets, and contains a little over 1 million title-description pairs.\n", "\n", + "\n", + "For demonstration purposes, we are using a reduced dataset of 10,000 samples from the original HuggingFace dataset containing over a million records. This subset will allow us to effectively illustrate the embedding and retrieval process without the overhead of handling the full dataset size.\n", + "\n", "Lets begin by first downloading the required libraries for this notebook:\n", "- `openai` is used for communicating with the OpenAI embedding service\n", "- `pymilvus` is used for communicating with the Milvus server\n", @@ -19,534 +25,498 @@ }, { "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com\n", - "Requirement already satisfied: openai in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (0.27.2)\n", - "Requirement already satisfied: pymilvus in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (2.2.2)\n", - "Requirement already satisfied: datasets in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (2.10.1)\n", - "Requirement already satisfied: tqdm in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (4.64.1)\n", - "Requirement already satisfied: aiohttp in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from openai) (3.8.4)\n", - "Requirement already satisfied: 
requests>=2.20 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from openai) (2.28.2)\n", - "Requirement already satisfied: pandas>=1.2.4 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (1.5.3)\n", - "Requirement already satisfied: ujson<=5.4.0,>=2.0.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (5.1.0)\n", - "Requirement already satisfied: mmh3<=3.0.0,>=2.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (3.0.0)\n", - "Requirement already satisfied: grpcio<=1.48.0,>=1.47.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (1.47.2)\n", - "Requirement already satisfied: grpcio-tools<=1.48.0,>=1.47.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (1.47.2)\n", - "Requirement already satisfied: huggingface-hub<1.0.0,>=0.2.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.12.1)\n", - "Requirement already satisfied: dill<0.3.7,>=0.3.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.3.6)\n", - "Requirement already satisfied: xxhash in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (3.2.0)\n", - "Requirement already satisfied: pyyaml>=5.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (5.4.1)\n", - "Requirement already satisfied: fsspec[http]>=2021.11.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (2023.1.0)\n", - "Requirement already satisfied: packaging in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (23.0)\n", - "Requirement already satisfied: numpy>=1.17 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages 
(from datasets) (1.23.5)\n", - "Requirement already satisfied: multiprocess in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.70.14)\n", - "Requirement already satisfied: pyarrow>=6.0.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (10.0.1)\n", - "Requirement already satisfied: responses<0.19 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.18.0)\n", - "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (6.0.4)\n", - "Requirement already satisfied: frozenlist>=1.1.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (1.3.3)\n", - "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (4.0.2)\n", - "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (1.8.2)\n", - "Requirement already satisfied: aiosignal>=1.1.2 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (1.3.1)\n", - "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (3.0.1)\n", - "Requirement already satisfied: attrs>=17.3.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (22.2.0)\n", - "Requirement already satisfied: six>=1.5.2 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from grpcio<=1.48.0,>=1.47.0->pymilvus) (1.16.0)\n", - "Requirement already satisfied: protobuf<4.0dev,>=3.12.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from 
grpcio-tools<=1.48.0,>=1.47.0->pymilvus) (3.20.1)\n", - "Requirement already satisfied: setuptools in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from grpcio-tools<=1.48.0,>=1.47.0->pymilvus) (65.6.3)\n", - "Requirement already satisfied: filelock in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.2.0->datasets) (3.9.0)\n", - "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.2.0->datasets) (4.5.0)\n", - "Requirement already satisfied: python-dateutil>=2.8.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pandas>=1.2.4->pymilvus) (2.8.2)\n", - "Requirement already satisfied: pytz>=2020.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pandas>=1.2.4->pymilvus) (2022.7.1)\n", - "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.14)\n", - "Requirement already satisfied: idna<4,>=2.5 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from requests>=2.20->openai) (3.4)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from requests>=2.20->openai) (2022.12.7)\n" - ] - } - ], + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "XKZ4_Dexudz5", + "outputId": "e333ffd7-a039-4630-9f2f-2e656d2c3273" + }, + "outputs": [], "source": [ "! pip install openai pymilvus datasets tqdm" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ - "With the required packages installed we can get started. Lets begin by launching the Milvus service. 
The file being run is the `docker-compose.yaml` found in the folder of this file. This command launches a Milvus standalone instance which we will use for this test. " + "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime** (click on the \"Runtime\" menu at the top of the screen, and select \"Restart session\" from the dropdown menu)." + ], + "metadata": { + "id": "jxtTMqr8yJ7y" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08DQFod0DIv7" + }, + "source": [ + "We will use OpenAI as the LLM in this example. You should prepare the [api key](https://platform.openai.com/docs/quickstart) `OPENAI_API_KEY` as an environment variable." ] }, { "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[1A\u001b[1B\u001b[0G\u001b[?25l[+] Running 0/0\n", - "\u001b[37m ⠋ Network milvus Creating 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 1/1\u001b[0m\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠋ Container milvus-minio Creating 0.1s\n", - "\u001b[0m\u001b[37m ⠋ Container milvus-etcd Creating 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠙ Container milvus-minio Creating 0.2s\n", - "\u001b[0m\u001b[37m ⠙ Container milvus-etcd Creating 0.2s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 1/3\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠹ Container milvus-minio Creating 0.3s\n", - "\u001b[0m\u001b[37m ⠹ Container milvus-etcd Creating 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 3/3\u001b[0m\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ 
Container milvus-minio Created 0.3s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Created 0.3s\n", - "\u001b[0m\u001b[37m ⠋ Container milvus-standalone Creating 0.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Created 0.3s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Created 0.3s\n", - "\u001b[0m\u001b[37m ⠙ Container milvus-standalone Creating 0.2s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 4/4\u001b[0m\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Created 0.3s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Created 0.3s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.7s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.8s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 0.9s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 0.9s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - 
"\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.0s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.0s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.2s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.2s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.3s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.3s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.4s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.4s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 
1.5s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.5s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.6s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.6s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 2/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-etcd Starting 1.7s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-minio Starting 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Created 0.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.6s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.7s\n", - 
"\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.8s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 1.9s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 2.0s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 2.1s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 2.2s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", 
- "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 2.3s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 2.4s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 2.5s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 3/4\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[37m ⠿ Container milvus-standalone Starting 2.6s\n", - "\u001b[0m\u001b[?25h\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 4/4\u001b[0m\n", - "\u001b[34m ⠿ Network milvus Created 0.1s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-minio Started 1.8s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-etcd Started 1.7s\n", - "\u001b[0m\u001b[34m ⠿ Container milvus-standalone Started 2.6s\n", - "\u001b[0m\u001b[?25h" - ] + "execution_count": 3, + "metadata": { + "id": "E8LsgeBzDIv8", + "ExecuteTime": { + "end_time": "2024-11-01T13:43:20.300284Z", + "start_time": "2024-11-01T13:43:20.293867Z" } - ], + }, + "outputs": [], "source": [ - "! 
docker compose up -d" + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = \"sk-***********\"" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "aJcCzQhODIv8" + }, "source": [ - "With Milvus running we can setup our global variables:\n", - "- HOST: The Milvus host address\n", - "- PORT: The Milvus port number\n", - "- COLLECTION_NAME: What to name the collection within Milvus\n", - "- DIMENSION: The dimension of the embeddings\n", - "- OPENAI_ENGINE: Which embedding model to use\n", - "- openai.api_key: Your OpenAI account key\n", - "- INDEX_PARAM: The index settings to use for the collection\n", - "- QUERY_PARAM: The search parameters to use\n", - "- BATCH_SIZE: How many texts to embed and insert at once" + "## Initialize OpenAI client and Milvus\n", + "Initialize the OpenAI client." ] }, { "cell_type": "code", - "execution_count": 27, - "metadata": {}, + "execution_count": 4, + "metadata": { + "id": "aBOi2mvKDIv8", + "ExecuteTime": { + "end_time": "2024-11-01T13:43:22.003864Z", + "start_time": "2024-11-01T13:43:21.686744Z" + } + }, "outputs": [], "source": [ - "import openai\n", + "from openai import OpenAI\n", "\n", - "HOST = 'localhost'\n", - "PORT = 19530\n", - "COLLECTION_NAME = 'book_search'\n", + "openai_client = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9IO_ts7pDIv8" + }, + "source": [ + "Set the collection name and dimension for the embeddings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "4rXlQ29MDIv8", + "ExecuteTime": { + "end_time": "2024-11-01T13:43:23.240858Z", + "start_time": "2024-11-01T13:43:23.218542Z" + } + }, + "outputs": [], + "source": [ + "COLLECTION_NAME = \"book_search\"\n", "DIMENSION = 1536\n", - "OPENAI_ENGINE = 'text-embedding-3-small'\n", - "openai.api_key = 'sk-your_key'\n", - "\n", - "INDEX_PARAM = {\n", - " 'metric_type':'L2',\n", - " 'index_type':\"HNSW\",\n", - " 'params':{'M': 8, 'efConstruction': 64}\n", - "}\n", - "\n", - "QUERY_PARAM = {\n", - " \"metric_type\": \"L2\",\n", - " \"params\": {\"ef\": 64},\n", - "}\n", "\n", "BATCH_SIZE = 1000" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "OHuz5DtVDIv8" + }, "source": [ - "## Milvus\n", - "This segment deals with Milvus and setting up the database for this use case. Within Milvus we need to setup a collection and index the collection. " + "Connect to Milvus." ] }, { "cell_type": "code", - "execution_count": 4, - "metadata": {}, + "execution_count": 6, + "metadata": { + "id": "VUXhaRtQDIv8", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "5c20f4d3-c480-46d9-f9e2-a1879c24872a", + "ExecuteTime": { + "end_time": "2024-11-01T13:43:26.564980Z", + "start_time": "2024-11-01T13:43:24.240975Z" + } + }, "outputs": [], "source": [ - "from pymilvus import connections, utility, FieldSchema, Collection, CollectionSchema, DataType\n", + "from pymilvus import MilvusClient\n", "\n", "# Connect to Milvus Database\n", - "connections.connect(host=HOST, port=PORT)" + "client = MilvusClient(\"./milvus_demo.db\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U8es09T2DIv8" + }, + "source": [ + "> As for the argument of `url` and `token`:\n", + "> - Setting the `uri` as a local file, e.g.`./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store 
all data in this file.\n", + "> - If you have large scale of data, say more than a million vectors, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, please use the server address and port as your uri, e.g.`http://localhost:19530`. If you enable the authentication feature on Milvus, use \":\" as the token, otherwise don't set the token.\n", + "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and Api key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud." ] }, { "cell_type": "code", - "execution_count": 9, - "metadata": {}, + "execution_count": 5, + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-28T02:52:32.123914Z", + "start_time": "2024-10-28T02:52:32.116018Z" + }, + "id": "eaBFyWQPDIv8" + }, "outputs": [], "source": [ "# Remove collection if it already exists\n", - "if utility.has_collection(COLLECTION_NAME):\n", - " utility.drop_collection(COLLECTION_NAME)" + "if client.has_collection(COLLECTION_NAME):\n", + " client.drop_collection(COLLECTION_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rmt5EIn8DIv8" + }, + "source": [ + "Define the fields for the collection, which include the id, title, description, and embedding." 
] }, { "cell_type": "code", - "execution_count": 10, - "metadata": {}, + "execution_count": null, + "metadata": { + "id": "xf5YMsDmDIv8", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d3b327c8-c0ad-44d9-a57a-96d9be765a7f" + }, "outputs": [], "source": [ - "# Create collection which includes the id, title, and embedding.\n", - "fields = [\n", - " FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),\n", - " FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=64000),\n", - " FieldSchema(name='description', dtype=DataType.VARCHAR, max_length=64000),\n", - " FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=DIMENSION)\n", - "]\n", - "schema = CollectionSchema(fields=fields)\n", - "collection = Collection(name=COLLECTION_NAME, schema=schema)" + "from pymilvus import DataType\n", + "\n", + "# Create collection which includes the id, title, description, and embedding.\n", + "\n", + "# 1. Create schema\n", + "schema = MilvusClient.create_schema(\n", + " auto_id=True,\n", + " enable_dynamic_field=False,\n", + ")\n", + "\n", + "# 2. Add fields to schema\n", + "schema.add_field(field_name=\"id\", datatype=DataType.INT64, is_primary=True)\n", + "schema.add_field(field_name=\"title\", datatype=DataType.VARCHAR, max_length=64000)\n", + "schema.add_field(field_name=\"description\", datatype=DataType.VARCHAR, max_length=64000)\n", + "schema.add_field(field_name=\"embedding\", datatype=DataType.FLOAT_VECTOR, dim=DIMENSION)\n", + "\n", + "# 3. Create collection with the schema\n", + "client.create_collection(collection_name=COLLECTION_NAME, schema=schema)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5Ch-2J5CDIv8" + }, + "source": [ + "Create the index on the collection and load it." 
] }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, + "execution_count": null, + "metadata": { + "id": "K9fwi332DIv8", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "9a3cc8f6-e9c0-427f-b820-b545f5b29137" + }, "outputs": [], "source": [ "# Create the index on the collection and load it.\n", - "collection.create_index(field_name=\"embedding\", index_params=INDEX_PARAM)\n", - "collection.load()" + "\n", + "# 1. Prepare index parameters\n", + "index_params = client.prepare_index_params()\n", + "\n", + "\n", + "# 2. Add an index on the embedding field\n", + "index_params.add_index(\n", + " field_name=\"embedding\",\n", + " metric_type=\"L2\",\n", + " index_type=\"HNSW\",\n", + " params={\"M\": 8, \"efConstruction\": 64},\n", + ")\n", + "\n", + "\n", + "# 3. Create index\n", + "client.create_index(collection_name=COLLECTION_NAME, index_params=index_params)\n", + "\n", + "\n", + "# 4. Load collection\n", + "client.load_collection(collection_name=COLLECTION_NAME, replica_number=1)" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "uAlAcvbIudz7" + }, "source": [ "## Dataset\n", - "With Milvus up and running we can begin grabbing our data. Hugging Face Datasets is a hub that holds many different user datasets, and for this example we are using Skelebor's book dataset. This dataset contains title-description pairs for over 1 million books. We are going to embed each description and store it within Milvus along with its title. " + "With Milvus up and running we can begin grabbing our data. Hugging Face Datasets is a hub that holds many different user datasets, and for this example we are using Skelebor's book dataset. This dataset contains title-description pairs for over 1 million books. To keep the demonstration efficient, we will be working with a smaller subset of this dataset. We are going to embed each description and store it within Milvus along with its title." 
] }, { "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n", - "Found cached dataset parquet (/Users/filiphaltmayer/.cache/huggingface/datasets/Skelebor___parquet/Skelebor--book_titles_and_descriptions_en_clean-3596935b1d8a7747/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n" - ] - } - ], + "execution_count": null, + "metadata": { + "id": "Kku-oYcOudz7", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c753d490-7fcc-4d09-ca46-d4deb80057cd" + }, + "outputs": [], "source": [ "import datasets\n", "\n", - "# Download the dataset and only use the `train` portion (file is around 800Mb)\n", - "dataset = datasets.load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='train')" + "# Download the dataset and only use the `train` portion\n", + "dataset = datasets.load_dataset(\n", + " \"Skelebor/book_titles_and_descriptions_en_clean\", split=\"train\"\n", + ")\n", + "\n", + "# Shuffle and select a subset of 10,000 entries\n", + "dataset = dataset.shuffle(seed=42).select(range(10000))" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "00Hjlzgzudz7" + }, "source": [ "## Insert the Data\n", - "Now that we have our data on our machine we can begin embedding it and inserting it into Milvus. The embedding function takes in text and returns the embeddings in a list format. " + "Now that we have our data on our machine we can begin embedding it and inserting it into Milvus. The embedding function takes in text and returns the embeddings in a list format." 
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {},
+   "execution_count": 9,
+   "metadata": {
+    "id": "kioMXVIrudz7"
+   },
    "outputs": [],
    "source": [
     "# Simple function that converts the texts to embeddings\n",
-    "def embed(texts):\n",
-    "    embeddings = openai.Embedding.create(\n",
-    "        input=texts,\n",
-    "        engine=OPENAI_ENGINE\n",
-    "    )\n",
-    "    return [x['embedding'] for x in embeddings['data']]\n"
+    "def emb_texts(texts):\n",
+    "    res = openai_client.embeddings.create(input=texts, model=\"text-embedding-3-small\")\n",
+    "    return [res_data.embedding for res_data in res.data]"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "id": "1JYeTdp7udz7"
+   },
    "source": [
-    "This next step does the actual inserting. Due to having so many datapoints, if you want to immidiately test it out you can stop the inserting cell block early and move along. Doing this will probably decrease the accuracy of the results due to less datapoints, but it should still be good enough. "
+    "This next step does the actual inserting. We iterate through all the entries and create batches that we insert once we hit our set batch size. After the loop is over we insert the last remaining batch if it exists."
] }, { "cell_type": "code", - "execution_count": 18, - "metadata": {}, + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "c2-C0uDQudz7", + "outputId": "60ca7079-2ea6-4a9e-b86b-ef88d638e394" + }, "outputs": [ { - "name": "stderr", "output_type": "stream", + "name": "stderr", "text": [ - " 0%| | 1999/1032335 [00:06<57:22, 299.31it/s] \n" - ] - }, - { - "ename": "KeyboardInterrupt", - "evalue": "", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[18], line 13\u001b[0m\n\u001b[1;32m 11\u001b[0m data[\u001b[39m1\u001b[39m]\u001b[39m.\u001b[39mappend(dataset[i][\u001b[39m'\u001b[39m\u001b[39mdescription\u001b[39m\u001b[39m'\u001b[39m])\n\u001b[1;32m 12\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(data[\u001b[39m0\u001b[39m]) \u001b[39m%\u001b[39m BATCH_SIZE \u001b[39m==\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m---> 13\u001b[0m data\u001b[39m.\u001b[39mappend(embed(data[\u001b[39m1\u001b[39;49m]))\n\u001b[1;32m 14\u001b[0m collection\u001b[39m.\u001b[39minsert(data)\n\u001b[1;32m 15\u001b[0m data \u001b[39m=\u001b[39m [[],[]]\n", - "Cell \u001b[0;32mIn[17], line 3\u001b[0m, in \u001b[0;36membed\u001b[0;34m(texts)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39membed\u001b[39m(texts):\n\u001b[0;32m----> 3\u001b[0m embeddings \u001b[39m=\u001b[39m openai\u001b[39m.\u001b[39;49mEmbedding\u001b[39m.\u001b[39;49mcreate(\n\u001b[1;32m 4\u001b[0m \u001b[39minput\u001b[39;49m\u001b[39m=\u001b[39;49mtexts,\n\u001b[1;32m 5\u001b[0m engine\u001b[39m=\u001b[39;49mOPENAI_ENGINE\n\u001b[1;32m 6\u001b[0m )\n\u001b[1;32m 7\u001b[0m \u001b[39mreturn\u001b[39;00m [x[\u001b[39m'\u001b[39m\u001b[39membedding\u001b[39m\u001b[39m'\u001b[39m] \u001b[39mfor\u001b[39;00m x \u001b[39min\u001b[39;00m 
embeddings[\u001b[39m'\u001b[39m\u001b[39mdata\u001b[39m\u001b[39m'\u001b[39m]]\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_resources/embedding.py:33\u001b[0m, in \u001b[0;36mEmbedding.create\u001b[0;34m(cls, *args, **kwargs)\u001b[0m\n\u001b[1;32m 31\u001b[0m \u001b[39mwhile\u001b[39;00m \u001b[39mTrue\u001b[39;00m:\n\u001b[1;32m 32\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m---> 33\u001b[0m response \u001b[39m=\u001b[39m \u001b[39msuper\u001b[39;49m()\u001b[39m.\u001b[39;49mcreate(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 35\u001b[0m \u001b[39m# If a user specifies base64, we'll just return the encoded string.\u001b[39;00m\n\u001b[1;32m 36\u001b[0m \u001b[39m# This is only for the default case.\u001b[39;00m\n\u001b[1;32m 37\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m user_provided_encoding_format:\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py:153\u001b[0m, in \u001b[0;36mEngineAPIResource.create\u001b[0;34m(cls, api_key, api_base, api_type, request_id, api_version, organization, **params)\u001b[0m\n\u001b[1;32m 127\u001b[0m \u001b[39m@classmethod\u001b[39m\n\u001b[1;32m 128\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mcreate\u001b[39m(\n\u001b[1;32m 129\u001b[0m \u001b[39mcls\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 136\u001b[0m \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mparams,\n\u001b[1;32m 137\u001b[0m ):\n\u001b[1;32m 138\u001b[0m (\n\u001b[1;32m 139\u001b[0m deployment_id,\n\u001b[1;32m 140\u001b[0m engine,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 150\u001b[0m api_key, api_base, api_type, api_version, organization, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mparams\n\u001b[1;32m 151\u001b[0m )\n\u001b[0;32m--> 153\u001b[0m response, _, api_key \u001b[39m=\u001b[39m requestor\u001b[39m.\u001b[39;49mrequest(\n\u001b[1;32m 154\u001b[0m 
\u001b[39m\"\u001b[39;49m\u001b[39mpost\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 155\u001b[0m url,\n\u001b[1;32m 156\u001b[0m params\u001b[39m=\u001b[39;49mparams,\n\u001b[1;32m 157\u001b[0m headers\u001b[39m=\u001b[39;49mheaders,\n\u001b[1;32m 158\u001b[0m stream\u001b[39m=\u001b[39;49mstream,\n\u001b[1;32m 159\u001b[0m request_id\u001b[39m=\u001b[39;49mrequest_id,\n\u001b[1;32m 160\u001b[0m request_timeout\u001b[39m=\u001b[39;49mrequest_timeout,\n\u001b[1;32m 161\u001b[0m )\n\u001b[1;32m 163\u001b[0m \u001b[39mif\u001b[39;00m stream:\n\u001b[1;32m 164\u001b[0m \u001b[39m# must be an iterator\u001b[39;00m\n\u001b[1;32m 165\u001b[0m \u001b[39massert\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39misinstance\u001b[39m(response, OpenAIResponse)\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_requestor.py:216\u001b[0m, in \u001b[0;36mAPIRequestor.request\u001b[0;34m(self, method, url, params, headers, files, stream, request_id, request_timeout)\u001b[0m\n\u001b[1;32m 205\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mrequest\u001b[39m(\n\u001b[1;32m 206\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[1;32m 207\u001b[0m method,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 214\u001b[0m request_timeout: Optional[Union[\u001b[39mfloat\u001b[39m, Tuple[\u001b[39mfloat\u001b[39m, \u001b[39mfloat\u001b[39m]]] \u001b[39m=\u001b[39m \u001b[39mNone\u001b[39;00m,\n\u001b[1;32m 215\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m Tuple[Union[OpenAIResponse, Iterator[OpenAIResponse]], \u001b[39mbool\u001b[39m, \u001b[39mstr\u001b[39m]:\n\u001b[0;32m--> 216\u001b[0m result \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mrequest_raw(\n\u001b[1;32m 217\u001b[0m method\u001b[39m.\u001b[39;49mlower(),\n\u001b[1;32m 218\u001b[0m url,\n\u001b[1;32m 219\u001b[0m params\u001b[39m=\u001b[39;49mparams,\n\u001b[1;32m 220\u001b[0m supplied_headers\u001b[39m=\u001b[39;49mheaders,\n\u001b[1;32m 221\u001b[0m 
files\u001b[39m=\u001b[39;49mfiles,\n\u001b[1;32m 222\u001b[0m stream\u001b[39m=\u001b[39;49mstream,\n\u001b[1;32m 223\u001b[0m request_id\u001b[39m=\u001b[39;49mrequest_id,\n\u001b[1;32m 224\u001b[0m request_timeout\u001b[39m=\u001b[39;49mrequest_timeout,\n\u001b[1;32m 225\u001b[0m )\n\u001b[1;32m 226\u001b[0m resp, got_stream \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_interpret_response(result, stream)\n\u001b[1;32m 227\u001b[0m \u001b[39mreturn\u001b[39;00m resp, got_stream, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mapi_key\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_requestor.py:516\u001b[0m, in \u001b[0;36mAPIRequestor.request_raw\u001b[0;34m(self, method, url, params, supplied_headers, files, stream, request_id, request_timeout)\u001b[0m\n\u001b[1;32m 514\u001b[0m _thread_context\u001b[39m.\u001b[39msession \u001b[39m=\u001b[39m _make_session()\n\u001b[1;32m 515\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 516\u001b[0m result \u001b[39m=\u001b[39m _thread_context\u001b[39m.\u001b[39;49msession\u001b[39m.\u001b[39;49mrequest(\n\u001b[1;32m 517\u001b[0m method,\n\u001b[1;32m 518\u001b[0m abs_url,\n\u001b[1;32m 519\u001b[0m headers\u001b[39m=\u001b[39;49mheaders,\n\u001b[1;32m 520\u001b[0m data\u001b[39m=\u001b[39;49mdata,\n\u001b[1;32m 521\u001b[0m files\u001b[39m=\u001b[39;49mfiles,\n\u001b[1;32m 522\u001b[0m stream\u001b[39m=\u001b[39;49mstream,\n\u001b[1;32m 523\u001b[0m timeout\u001b[39m=\u001b[39;49mrequest_timeout \u001b[39mif\u001b[39;49;00m request_timeout \u001b[39melse\u001b[39;49;00m TIMEOUT_SECS,\n\u001b[1;32m 524\u001b[0m )\n\u001b[1;32m 525\u001b[0m \u001b[39mexcept\u001b[39;00m requests\u001b[39m.\u001b[39mexceptions\u001b[39m.\u001b[39mTimeout \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 526\u001b[0m \u001b[39mraise\u001b[39;00m error\u001b[39m.\u001b[39mTimeout(\u001b[39m\"\u001b[39m\u001b[39mRequest timed out: 
\u001b[39m\u001b[39m{}\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39mformat(e)) \u001b[39mfrom\u001b[39;00m \u001b[39me\u001b[39;00m\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/requests/sessions.py:587\u001b[0m, in \u001b[0;36mSession.request\u001b[0;34m(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)\u001b[0m\n\u001b[1;32m 582\u001b[0m send_kwargs \u001b[39m=\u001b[39m {\n\u001b[1;32m 583\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mtimeout\u001b[39m\u001b[39m\"\u001b[39m: timeout,\n\u001b[1;32m 584\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mallow_redirects\u001b[39m\u001b[39m\"\u001b[39m: allow_redirects,\n\u001b[1;32m 585\u001b[0m }\n\u001b[1;32m 586\u001b[0m send_kwargs\u001b[39m.\u001b[39mupdate(settings)\n\u001b[0;32m--> 587\u001b[0m resp \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49msend(prep, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49msend_kwargs)\n\u001b[1;32m 589\u001b[0m \u001b[39mreturn\u001b[39;00m resp\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/requests/sessions.py:701\u001b[0m, in \u001b[0;36mSession.send\u001b[0;34m(self, request, **kwargs)\u001b[0m\n\u001b[1;32m 698\u001b[0m start \u001b[39m=\u001b[39m preferred_clock()\n\u001b[1;32m 700\u001b[0m \u001b[39m# Send the request\u001b[39;00m\n\u001b[0;32m--> 701\u001b[0m r \u001b[39m=\u001b[39m adapter\u001b[39m.\u001b[39;49msend(request, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 703\u001b[0m \u001b[39m# Total elapsed time of the request (approximately)\u001b[39;00m\n\u001b[1;32m 704\u001b[0m elapsed \u001b[39m=\u001b[39m preferred_clock() \u001b[39m-\u001b[39m start\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/requests/adapters.py:489\u001b[0m, in \u001b[0;36mHTTPAdapter.send\u001b[0;34m(self, request, stream, timeout, verify, cert, 
proxies)\u001b[0m\n\u001b[1;32m 487\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[1;32m 488\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m chunked:\n\u001b[0;32m--> 489\u001b[0m resp \u001b[39m=\u001b[39m conn\u001b[39m.\u001b[39;49murlopen(\n\u001b[1;32m 490\u001b[0m method\u001b[39m=\u001b[39;49mrequest\u001b[39m.\u001b[39;49mmethod,\n\u001b[1;32m 491\u001b[0m url\u001b[39m=\u001b[39;49murl,\n\u001b[1;32m 492\u001b[0m body\u001b[39m=\u001b[39;49mrequest\u001b[39m.\u001b[39;49mbody,\n\u001b[1;32m 493\u001b[0m headers\u001b[39m=\u001b[39;49mrequest\u001b[39m.\u001b[39;49mheaders,\n\u001b[1;32m 494\u001b[0m redirect\u001b[39m=\u001b[39;49m\u001b[39mFalse\u001b[39;49;00m,\n\u001b[1;32m 495\u001b[0m assert_same_host\u001b[39m=\u001b[39;49m\u001b[39mFalse\u001b[39;49;00m,\n\u001b[1;32m 496\u001b[0m preload_content\u001b[39m=\u001b[39;49m\u001b[39mFalse\u001b[39;49;00m,\n\u001b[1;32m 497\u001b[0m decode_content\u001b[39m=\u001b[39;49m\u001b[39mFalse\u001b[39;49;00m,\n\u001b[1;32m 498\u001b[0m retries\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mmax_retries,\n\u001b[1;32m 499\u001b[0m timeout\u001b[39m=\u001b[39;49mtimeout,\n\u001b[1;32m 500\u001b[0m )\n\u001b[1;32m 502\u001b[0m \u001b[39m# Send the request.\u001b[39;00m\n\u001b[1;32m 503\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 504\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mhasattr\u001b[39m(conn, \u001b[39m\"\u001b[39m\u001b[39mproxy_pool\u001b[39m\u001b[39m\"\u001b[39m):\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/urllib3/connectionpool.py:703\u001b[0m, in \u001b[0;36mHTTPConnectionPool.urlopen\u001b[0;34m(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)\u001b[0m\n\u001b[1;32m 700\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_prepare_proxy(conn)\n\u001b[1;32m 702\u001b[0m \u001b[39m# Make the request on the httplib connection 
object.\u001b[39;00m\n\u001b[0;32m--> 703\u001b[0m httplib_response \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_make_request(\n\u001b[1;32m 704\u001b[0m conn,\n\u001b[1;32m 705\u001b[0m method,\n\u001b[1;32m 706\u001b[0m url,\n\u001b[1;32m 707\u001b[0m timeout\u001b[39m=\u001b[39;49mtimeout_obj,\n\u001b[1;32m 708\u001b[0m body\u001b[39m=\u001b[39;49mbody,\n\u001b[1;32m 709\u001b[0m headers\u001b[39m=\u001b[39;49mheaders,\n\u001b[1;32m 710\u001b[0m chunked\u001b[39m=\u001b[39;49mchunked,\n\u001b[1;32m 711\u001b[0m )\n\u001b[1;32m 713\u001b[0m \u001b[39m# If we're going to release the connection in ``finally:``, then\u001b[39;00m\n\u001b[1;32m 714\u001b[0m \u001b[39m# the response doesn't need to know about the connection. Otherwise\u001b[39;00m\n\u001b[1;32m 715\u001b[0m \u001b[39m# it will also try to release it and we'll have a double-release\u001b[39;00m\n\u001b[1;32m 716\u001b[0m \u001b[39m# mess.\u001b[39;00m\n\u001b[1;32m 717\u001b[0m response_conn \u001b[39m=\u001b[39m conn \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m release_conn \u001b[39melse\u001b[39;00m \u001b[39mNone\u001b[39;00m\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/urllib3/connectionpool.py:449\u001b[0m, in \u001b[0;36mHTTPConnectionPool._make_request\u001b[0;34m(self, conn, method, url, timeout, chunked, **httplib_request_kw)\u001b[0m\n\u001b[1;32m 444\u001b[0m httplib_response \u001b[39m=\u001b[39m conn\u001b[39m.\u001b[39mgetresponse()\n\u001b[1;32m 445\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mBaseException\u001b[39;00m \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 446\u001b[0m \u001b[39m# Remove the TypeError from the exception chain in\u001b[39;00m\n\u001b[1;32m 447\u001b[0m \u001b[39m# Python 3 (including for exceptions like SystemExit).\u001b[39;00m\n\u001b[1;32m 448\u001b[0m \u001b[39m# Otherwise it looks like a bug in the code.\u001b[39;00m\n\u001b[0;32m--> 449\u001b[0m six\u001b[39m.\u001b[39;49mraise_from(e, 
\u001b[39mNone\u001b[39;49;00m)\n\u001b[1;32m 450\u001b[0m \u001b[39mexcept\u001b[39;00m (SocketTimeout, BaseSSLError, SocketError) \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 451\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_raise_timeout(err\u001b[39m=\u001b[39me, url\u001b[39m=\u001b[39murl, timeout_value\u001b[39m=\u001b[39mread_timeout)\n", - "File \u001b[0;32m:3\u001b[0m, in \u001b[0;36mraise_from\u001b[0;34m(value, from_value)\u001b[0m\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/urllib3/connectionpool.py:444\u001b[0m, in \u001b[0;36mHTTPConnectionPool._make_request\u001b[0;34m(self, conn, method, url, timeout, chunked, **httplib_request_kw)\u001b[0m\n\u001b[1;32m 441\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mTypeError\u001b[39;00m:\n\u001b[1;32m 442\u001b[0m \u001b[39m# Python 3\u001b[39;00m\n\u001b[1;32m 443\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 444\u001b[0m httplib_response \u001b[39m=\u001b[39m conn\u001b[39m.\u001b[39;49mgetresponse()\n\u001b[1;32m 445\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mBaseException\u001b[39;00m \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 446\u001b[0m \u001b[39m# Remove the TypeError from the exception chain in\u001b[39;00m\n\u001b[1;32m 447\u001b[0m \u001b[39m# Python 3 (including for exceptions like SystemExit).\u001b[39;00m\n\u001b[1;32m 448\u001b[0m \u001b[39m# Otherwise it looks like a bug in the code.\u001b[39;00m\n\u001b[1;32m 449\u001b[0m six\u001b[39m.\u001b[39mraise_from(e, \u001b[39mNone\u001b[39;00m)\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/http/client.py:1377\u001b[0m, in \u001b[0;36mHTTPConnection.getresponse\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1375\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[1;32m 1376\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m-> 1377\u001b[0m response\u001b[39m.\u001b[39;49mbegin()\n\u001b[1;32m 1378\u001b[0m \u001b[39mexcept\u001b[39;00m 
\u001b[39mConnectionError\u001b[39;00m:\n\u001b[1;32m 1379\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mclose()\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/http/client.py:320\u001b[0m, in \u001b[0;36mHTTPResponse.begin\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 318\u001b[0m \u001b[39m# read until we get a non-100 response\u001b[39;00m\n\u001b[1;32m 319\u001b[0m \u001b[39mwhile\u001b[39;00m \u001b[39mTrue\u001b[39;00m:\n\u001b[0;32m--> 320\u001b[0m version, status, reason \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_read_status()\n\u001b[1;32m 321\u001b[0m \u001b[39mif\u001b[39;00m status \u001b[39m!=\u001b[39m CONTINUE:\n\u001b[1;32m 322\u001b[0m \u001b[39mbreak\u001b[39;00m\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/http/client.py:281\u001b[0m, in \u001b[0;36mHTTPResponse._read_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 280\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m_read_status\u001b[39m(\u001b[39mself\u001b[39m):\n\u001b[0;32m--> 281\u001b[0m line \u001b[39m=\u001b[39m \u001b[39mstr\u001b[39m(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mfp\u001b[39m.\u001b[39;49mreadline(_MAXLINE \u001b[39m+\u001b[39;49m \u001b[39m1\u001b[39;49m), \u001b[39m\"\u001b[39m\u001b[39miso-8859-1\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 282\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(line) \u001b[39m>\u001b[39m _MAXLINE:\n\u001b[1;32m 283\u001b[0m \u001b[39mraise\u001b[39;00m LineTooLong(\u001b[39m\"\u001b[39m\u001b[39mstatus line\u001b[39m\u001b[39m\"\u001b[39m)\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/socket.py:704\u001b[0m, in \u001b[0;36mSocketIO.readinto\u001b[0;34m(self, b)\u001b[0m\n\u001b[1;32m 702\u001b[0m \u001b[39mwhile\u001b[39;00m \u001b[39mTrue\u001b[39;00m:\n\u001b[1;32m 703\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 704\u001b[0m \u001b[39mreturn\u001b[39;00m 
\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_sock\u001b[39m.\u001b[39;49mrecv_into(b)\n\u001b[1;32m 705\u001b[0m \u001b[39mexcept\u001b[39;00m timeout:\n\u001b[1;32m 706\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_timeout_occurred \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/ssl.py:1242\u001b[0m, in \u001b[0;36mSSLSocket.recv_into\u001b[0;34m(self, buffer, nbytes, flags)\u001b[0m\n\u001b[1;32m 1238\u001b[0m \u001b[39mif\u001b[39;00m flags \u001b[39m!=\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[1;32m 1239\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[1;32m 1240\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mnon-zero flags not allowed in calls to recv_into() on \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m\"\u001b[39m \u001b[39m%\u001b[39m\n\u001b[1;32m 1241\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m\u001b[39m__class__\u001b[39m)\n\u001b[0;32m-> 1242\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mread(nbytes, buffer)\n\u001b[1;32m 1243\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 1244\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39msuper\u001b[39m()\u001b[39m.\u001b[39mrecv_into(buffer, nbytes, flags)\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/ssl.py:1100\u001b[0m, in \u001b[0;36mSSLSocket.read\u001b[0;34m(self, len, buffer)\u001b[0m\n\u001b[1;32m 1098\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[1;32m 1099\u001b[0m \u001b[39mif\u001b[39;00m buffer \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[0;32m-> 1100\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_sslobj\u001b[39m.\u001b[39;49mread(\u001b[39mlen\u001b[39;49m, buffer)\n\u001b[1;32m 1101\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 1102\u001b[0m \u001b[39mreturn\u001b[39;00m 
\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_sslobj\u001b[39m.\u001b[39mread(\u001b[39mlen\u001b[39m)\n", - "\u001b[0;31mKeyboardInterrupt\u001b[0m: " + "100%|██████████| 10000/10000 [01:09<00:00, 144.77it/s]\n" ] } ], "source": [ "from tqdm import tqdm\n", "\n", - "data = [\n", - " [], # title\n", - " [], # description\n", - "]\n", + "# batch (data to be inserted) is a list of dictionaries\n", + "batch = []\n", "\n", "# Embed and insert in batches\n", "for i in tqdm(range(0, len(dataset))):\n", - " data[0].append(dataset[i]['title'])\n", - " data[1].append(dataset[i]['description'])\n", - " if len(data[0]) % BATCH_SIZE == 0:\n", - " data.append(embed(data[1]))\n", - " collection.insert(data)\n", - " data = [[],[]]\n", - "\n", - "# Embed and insert the remainder \n", - "if len(data[0]) != 0:\n", - " data.append(embed(data[1]))\n", - " collection.insert(data)\n", - " data = [[],[]]\n" + " batch.append(\n", + " {\n", + " \"title\": dataset[i][\"title\"] or \"\",\n", + " \"description\": dataset[i][\"description\"] or \"\",\n", + " }\n", + " )\n", + "\n", + " if len(batch) % BATCH_SIZE == 0 or i == len(dataset) - 1:\n", + " embeddings = emb_texts([item[\"description\"] for item in batch])\n", + "\n", + " for item, emb in zip(batch, embeddings):\n", + " item[\"embedding\"] = emb\n", + "\n", + " client.insert(collection_name=COLLECTION_NAME, data=batch)\n", + " batch = []" ] }, { - "attachments": {}, "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "rqEozGWHudz7" + }, "source": [ "## Query the Database\n", - "With our data safely inserted in Milvus, we can now perform a query. The query takes in a string or a list of strings and searches them. The resuts print out your provided description and the results that include the result score, the result title, and the result book description. " + "With our data safely inserted in Milvus, we can now perform a query. The query takes in a string or a list of strings and searches them. 
The resuts print out your provided description and the results that include the result score, the result title, and the result book description.\n", + "\n" ] }, { "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [], - "source": [ - "import textwrap\n", - "\n", - "def query(queries, top_k = 5):\n", - " if type(queries) != list:\n", - " queries = [queries]\n", - " res = collection.search(embed(queries), anns_field='embedding', param=QUERY_PARAM, limit = top_k, output_fields=['title', 'description'])\n", - " for i, hit in enumerate(res):\n", - " print('Description:', queries[i])\n", - " print('Results:')\n", - " for ii, hits in enumerate(hit):\n", - " print('\\t' + 'Rank:', ii + 1, 'Score:', hits.score, 'Title:', hits.entity.get('title'))\n", - " print(textwrap.fill(hits.entity.get('description'), 88))\n", - " print()" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, + "execution_count": 15, + "metadata": { + "id": "VpILOh_Mudz7", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "cec5056d-6329-4e68-fbf6-ad50f85168e0" + }, "outputs": [ { - "name": "stderr", "output_type": "stream", + "name": "stdout", "text": [ - "RPC error: [search], , \n" - ] - }, - { - "ename": "MilvusException", - "evalue": "", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mMilvusException\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[32], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m query(\u001b[39m'\u001b[39;49m\u001b[39mBook about a k-9 from europe\u001b[39;49m\u001b[39m'\u001b[39;49m)\n", - "Cell \u001b[0;32mIn[31], line 6\u001b[0m, in \u001b[0;36mquery\u001b[0;34m(queries, top_k)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mtype\u001b[39m(queries) \u001b[39m!=\u001b[39m \u001b[39mlist\u001b[39m:\n\u001b[1;32m 5\u001b[0m queries \u001b[39m=\u001b[39m 
[queries]\n\u001b[0;32m----> 6\u001b[0m res \u001b[39m=\u001b[39m collection\u001b[39m.\u001b[39;49msearch(embed(queries), anns_field\u001b[39m=\u001b[39;49m\u001b[39m'\u001b[39;49m\u001b[39membedding\u001b[39;49m\u001b[39m'\u001b[39;49m, param\u001b[39m=\u001b[39;49mQUERY_PARAM, limit \u001b[39m=\u001b[39;49m top_k, output_fields\u001b[39m=\u001b[39;49m[\u001b[39m'\u001b[39;49m\u001b[39mtitle\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mdescription\u001b[39;49m\u001b[39m'\u001b[39;49m])\n\u001b[1;32m 7\u001b[0m \u001b[39mfor\u001b[39;00m i, hit \u001b[39min\u001b[39;00m \u001b[39menumerate\u001b[39m(res):\n\u001b[1;32m 8\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m'\u001b[39m\u001b[39mDescription:\u001b[39m\u001b[39m'\u001b[39m, queries[i])\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/orm/collection.py:614\u001b[0m, in \u001b[0;36mCollection.search\u001b[0;34m(self, data, anns_field, param, limit, expr, partition_names, output_fields, timeout, round_decimal, **kwargs)\u001b[0m\n\u001b[1;32m 611\u001b[0m \u001b[39mraise\u001b[39;00m DataTypeNotMatchException(message\u001b[39m=\u001b[39mExceptionsMessage\u001b[39m.\u001b[39mExprType \u001b[39m%\u001b[39m \u001b[39mtype\u001b[39m(expr))\n\u001b[1;32m 613\u001b[0m conn \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_get_connection()\n\u001b[0;32m--> 614\u001b[0m res \u001b[39m=\u001b[39m conn\u001b[39m.\u001b[39;49msearch(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_name, data, anns_field, param, limit, expr,\n\u001b[1;32m 615\u001b[0m partition_names, output_fields, round_decimal, timeout\u001b[39m=\u001b[39;49mtimeout,\n\u001b[1;32m 616\u001b[0m schema\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_schema_dict, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 617\u001b[0m \u001b[39mif\u001b[39;00m 
kwargs\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39m_async\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39mFalse\u001b[39;00m):\n\u001b[1;32m 618\u001b[0m \u001b[39mreturn\u001b[39;00m SearchFuture(res)\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:109\u001b[0m, in \u001b[0;36merror_handler..wrapper..handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 107\u001b[0m record_dict[\u001b[39m\"\u001b[39m\u001b[39mRPC error\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m \u001b[39mstr\u001b[39m(datetime\u001b[39m.\u001b[39mdatetime\u001b[39m.\u001b[39mnow())\n\u001b[1;32m 108\u001b[0m LOGGER\u001b[39m.\u001b[39merror(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mRPC error: [\u001b[39m\u001b[39m{\u001b[39;00minner_name\u001b[39m}\u001b[39;00m\u001b[39m], \u001b[39m\u001b[39m{\u001b[39;00me\u001b[39m}\u001b[39;00m\u001b[39m, \u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[0;32m--> 109\u001b[0m \u001b[39mraise\u001b[39;00m e\n\u001b[1;32m 110\u001b[0m \u001b[39mexcept\u001b[39;00m grpc\u001b[39m.\u001b[39mFutureTimeoutError \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 111\u001b[0m record_dict[\u001b[39m\"\u001b[39m\u001b[39mgRPC timeout\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m \u001b[39mstr\u001b[39m(datetime\u001b[39m.\u001b[39mdatetime\u001b[39m.\u001b[39mnow())\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:105\u001b[0m, in \u001b[0;36merror_handler..wrapper..handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[1;32m 104\u001b[0m record_dict[\u001b[39m\"\u001b[39m\u001b[39mRPC start\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m \u001b[39mstr\u001b[39m(datetime\u001b[39m.\u001b[39mdatetime\u001b[39m.\u001b[39mnow())\n\u001b[0;32m--> 105\u001b[0m \u001b[39mreturn\u001b[39;00m func(\u001b[39m*\u001b[39;49margs, 
\u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 106\u001b[0m \u001b[39mexcept\u001b[39;00m MilvusException \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 107\u001b[0m record_dict[\u001b[39m\"\u001b[39m\u001b[39mRPC error\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m \u001b[39mstr\u001b[39m(datetime\u001b[39m.\u001b[39mdatetime\u001b[39m.\u001b[39mnow())\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:136\u001b[0m, in \u001b[0;36mtracing_request..wrapper..handler\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 134\u001b[0m \u001b[39mif\u001b[39;00m req_id:\n\u001b[1;32m 135\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mset_onetime_request_id(req_id)\n\u001b[0;32m--> 136\u001b[0m ret \u001b[39m=\u001b[39m func(\u001b[39mself\u001b[39;49m, \u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 137\u001b[0m \u001b[39mreturn\u001b[39;00m ret\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:85\u001b[0m, in \u001b[0;36mretry_on_rpc_failure..wrapper..handler\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 83\u001b[0m back_off \u001b[39m=\u001b[39m \u001b[39mmin\u001b[39m(back_off \u001b[39m*\u001b[39m back_off_multiplier, max_back_off)\n\u001b[1;32m 84\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m---> 85\u001b[0m \u001b[39mraise\u001b[39;00m e\n\u001b[1;32m 86\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mException\u001b[39;00m \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 87\u001b[0m \u001b[39mraise\u001b[39;00m e\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:50\u001b[0m, in \u001b[0;36mretry_on_rpc_failure..wrapper..handler\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[39mwhile\u001b[39;00m \u001b[39mTrue\u001b[39;00m:\n\u001b[1;32m 49\u001b[0m 
\u001b[39mtry\u001b[39;00m:\n\u001b[0;32m---> 50\u001b[0m \u001b[39mreturn\u001b[39;00m func(\u001b[39mself\u001b[39;49m, \u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 51\u001b[0m \u001b[39mexcept\u001b[39;00m grpc\u001b[39m.\u001b[39mRpcError \u001b[39mas\u001b[39;00m e:\n\u001b[1;32m 52\u001b[0m \u001b[39m# DEADLINE_EXCEEDED means that the task wat not completed\u001b[39;00m\n\u001b[1;32m 53\u001b[0m \u001b[39m# UNAVAILABLE means that the service is not reachable currently\u001b[39;00m\n\u001b[1;32m 54\u001b[0m \u001b[39m# Reference: https://grpc.github.io/grpc/python/grpc.html#grpc-status-code\u001b[39;00m\n\u001b[1;32m 55\u001b[0m \u001b[39mif\u001b[39;00m e\u001b[39m.\u001b[39mcode() \u001b[39m!=\u001b[39m grpc\u001b[39m.\u001b[39mStatusCode\u001b[39m.\u001b[39mDEADLINE_EXCEEDED \u001b[39mand\u001b[39;00m e\u001b[39m.\u001b[39mcode() \u001b[39m!=\u001b[39m grpc\u001b[39m.\u001b[39mStatusCode\u001b[39m.\u001b[39mUNAVAILABLE:\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py:472\u001b[0m, in \u001b[0;36mGrpcHandler.search\u001b[0;34m(self, collection_name, data, anns_field, param, limit, expression, partition_names, output_fields, round_decimal, timeout, schema, **kwargs)\u001b[0m\n\u001b[1;32m 467\u001b[0m requests \u001b[39m=\u001b[39m Prepare\u001b[39m.\u001b[39msearch_requests_with_expr(collection_name, data, anns_field, param, limit, schema,\n\u001b[1;32m 468\u001b[0m expression, partition_names, output_fields, round_decimal,\n\u001b[1;32m 469\u001b[0m \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 471\u001b[0m auto_id \u001b[39m=\u001b[39m schema[\u001b[39m\"\u001b[39m\u001b[39mauto_id\u001b[39m\u001b[39m\"\u001b[39m]\n\u001b[0;32m--> 472\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_execute_search_requests(requests, timeout, round_decimal\u001b[39m=\u001b[39;49mround_decimal, 
auto_id\u001b[39m=\u001b[39;49mauto_id, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py:441\u001b[0m, in \u001b[0;36mGrpcHandler._execute_search_requests\u001b[0;34m(self, requests, timeout, **kwargs)\u001b[0m\n\u001b[1;32m 439\u001b[0m \u001b[39mif\u001b[39;00m kwargs\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39m_async\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39mFalse\u001b[39;00m):\n\u001b[1;32m 440\u001b[0m \u001b[39mreturn\u001b[39;00m SearchFuture(\u001b[39mNone\u001b[39;00m, \u001b[39mNone\u001b[39;00m, \u001b[39mTrue\u001b[39;00m, pre_err)\n\u001b[0;32m--> 441\u001b[0m \u001b[39mraise\u001b[39;00m pre_err\n", - "File \u001b[0;32m~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py:432\u001b[0m, in \u001b[0;36mGrpcHandler._execute_search_requests\u001b[0;34m(self, requests, timeout, **kwargs)\u001b[0m\n\u001b[1;32m 429\u001b[0m response \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_stub\u001b[39m.\u001b[39mSearch(request, timeout\u001b[39m=\u001b[39mtimeout)\n\u001b[1;32m 431\u001b[0m \u001b[39mif\u001b[39;00m response\u001b[39m.\u001b[39mstatus\u001b[39m.\u001b[39merror_code \u001b[39m!=\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m--> 432\u001b[0m \u001b[39mraise\u001b[39;00m MilvusException(response\u001b[39m.\u001b[39mstatus\u001b[39m.\u001b[39merror_code, response\u001b[39m.\u001b[39mstatus\u001b[39m.\u001b[39mreason)\n\u001b[1;32m 434\u001b[0m raws\u001b[39m.\u001b[39mappend(response)\n\u001b[1;32m 435\u001b[0m round_decimal \u001b[39m=\u001b[39m kwargs\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mround_decimal\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m-\u001b[39m\u001b[39m1\u001b[39m)\n", - "\u001b[0;31mMilvusException\u001b[0m: " + "Description: Book about a k-9 from europe\n", + "Results:\n", + "\tRank: 1 Score: 1.086188554763794 Title: The Purloined 
Poodle (The Iron Druid Chronicles, #8.5)\n", + "Thanks to his relationship with the ancient Druid Atticus O'Sullivan, Oberon the Irish\n", + "wolfhound knows trouble when he smells it--and furthermore, he knows he can handle it.\n", + "When he discovers that a prizewinning poodle has been abducted in Eugene, Oregon, he\n", + "learns that it's part of a rash of hound abductions all over the Pacific Northwest.\n", + "Since the police aren't too worried about dogs they assume have run away, Oberon knows\n", + "it's up to him to track down those hounds and reunite them with their humans. For\n", + "justice! And gravy! Engaging the services of his faithful Druid, Oberon must travel\n", + "throughout Oregon and Washington to question a man with a huge salami, thwart the plans\n", + "of diabolical squirrels, and avoid, at all costs, a fight with a great big bear. But if\n", + "he's going to solve the case of the Purloined Poodle, Oberon will have to recruit the\n", + "help of a Boston terrier named Starbuck, survive the vegetables in a hipster pot pie,\n", + "and firmly refuse to be distracted by fire hydrants and rabbits hiding in the rose\n", + "bushes. At the end of the day, will it be a sad bowl of dry kibble for the world's\n", + "finest hound detective, or will everything be coming up sirloins? The Purloined Poodle\n", + "is another exciting novella entry in Kevin Hearne's New York Times best-selling Iron\n", + "Druid series.\n", + "\n", + "\tRank: 2 Score: 1.1287219524383545 Title: Thereby Hangs a Tail (Chet and Bernie Series #2)\n", + "9 hours, 27 minutes In the irresistible second Chet and Bernie mystery, Chet gets a\n", + "glimpse of the show dog world turned deadly.What first seems like a walk in the park to\n", + "wise and lovable canine narrator Chet and his human companion Bernie--to investigate\n", + "threats made against a pretty, pampered show dog--turns into a serious case when\n", + "Princess and her owner are abducted. 
To make matters worse, Bernie's on-again, off-again\n", + "girlfriend, reporter Susie Sanchez, disappears too. When Chet is separated from Bernie,\n", + "he's on his own to put the pieces together, find his way home, and save the day. Spencer\n", + "Quinn's \"brilliantly original\" (Richmond Times-Dispatch) and \"masterful\" (Los Angeles\n", + "Times) new series combines genuine suspense and intrigue with humor and insight for a\n", + "tail-wagging good time readers won't soon forget.\n", + "\n", + "\tRank: 3 Score: 1.1367161273956299 Title: The Book of the Dog\n", + "Featuring all kinds of dogs big, small, graceful, cute, funny The Book of the Dog is a\n", + "cool and quirky collection of dog art and illustration by artists around the world.\n", + "Interspersed through the illustrations are short texts about the artists and different\n", + "breeds, paying homage to man's best friend. Beautifully designed and packaged, the book\n", + "will appeal to dog lovers of all ages.\n", + "\n", + "\tRank: 4 Score: 1.1368114948272705 Title: The Badness of King George\n", + "Judith Summers' life is about to change dramatically. Her five-year relationship with\n", + "her on-off boyfriend has finally ended. Her son, Joshua, is off to university, and for\n", + "the first time since her husband died she's living alone. Well, not entirely alone. She\n", + "still has George, her King Charles Spaniel. Judith knows she needs a new challenge. But\n", + "how free can she ever be with George in tow? He is, of course, immensely lovely. But\n", + "he's also spoilt, lazy, and prone to flouncing around the house like a fluffed-up diva.\n", + "But then, during a chance encounter , Judith finds out about Many Tears, a dog rescue\n", + "centre. Before she knows it, she has joined a nationwide network of canine foster\n", + "carers. Far from having Judith all to himself, George suddenly finds he has to share his\n", + "owner with lots of other less fortunate dogs. 
And he's finding adjusting to this new way\n", + "of life a bit of a challenge...\n", + "\n", + "\tRank: 5 Score: 1.1388386487960815 Title: DogTown: Tales of Rescue, Rehabilitation, and Redemption\n", + "From Marley and Meto Temple Grandin's groundbreaking books to Cesar Millan's television\n", + "show, America's many millions of pet owners eagerly seek new insights into animal\n", + "behavior, and one of the most popular sources of compelling stories and practical advice\n", + "is DogTown,the National Geographic Channel's latest hit show. A national rescue\n", + "organization with more than 200,000 members, DogTown is the area where dogs live at the\n", + "nation's largest companion animal sanctuary run by Best Friends Animal Society. This\n", + "informative, inspiring book presents representative stories of dogs considered\n", + "unadoptable by other shelters. They come from many backgrounds: some were abandoned;\n", + "some prowled the streets as strays; others suffer from mysterious illnesses, serious\n", + "injuries, or antisocial behaviors that discourage potential adopters. But good fortune\n", + "led them to Best Friends and the dedicated people devoted to helping them recover and\n", + "find welcoming homes. These compelling, winningly illustrated true stories, each\n", + "uniquely moving and inspirational, draw upon the experience of veterinarians, trainers,\n", + "and volunteers to probe a range of tough, touching cases that evoke both the joy and the\n", + "occasional but inevitable heartbreak that accompanies this work. Each chapter follows a\n", + "dog from the first day at Dogtown until he ultimately finds (or doesn't find) a\n", + "permanent new home, focusing both on the relationship between the dog and the Dogtown\n", + "staff and on the latest discoveries about animal health and behavior. We learn how dogs\n", + "process information, how trauma affects their behavior, and how people can help them\n", + "overcome their problems. 
In the end, we come to see that there are no \"bad dogs\" and\n", + "that with patience, care, and compassion, people can help dogs to heal.\n" ] } ], "source": [ - "query('Book about a k-9 from europe')" + "import textwrap\n", + "\n", + "\n", + "def query(queries, top_k=5):\n", + " res = client.search(\n", + " collection_name=COLLECTION_NAME,\n", + " data=emb_texts(queries),\n", + " limit=top_k,\n", + " output_fields=[\"title\", \"description\"],\n", + " search_params={\n", + " \"metric_type\": \"L2\",\n", + " \"params\": {\"ef\": 64},\n", + " },\n", + " )\n", + " print(\"Description:\", queries)\n", + "\n", + " for hit_group in res:\n", + " print(\"Results:\")\n", + " for rank, hit in enumerate(hit_group, start=1):\n", + " entity = hit[\"entity\"]\n", + "\n", + " print(\n", + " f\"\\tRank: {rank} Score: {hit['distance']:} Title: {entity.get('title', '')}\"\n", + " )\n", + " description = entity.get(\"description\", \"\")\n", + " print(textwrap.fill(description, width=88))\n", + " print()\n", + "\n", + "\n", + "query(\"Book about a k-9 from europe\")" ] } ], @@ -568,8 +538,11 @@ "pygments_lexer": "ipython3", "version": "3.9.16" }, - "orig_nbformat": 4 + "orig_nbformat": 4, + "colab": { + "provenance": [] + } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 0 } diff --git a/registry.yaml b/registry.yaml index 7fd8341c7a..eda8610a65 100644 --- a/registry.yaml +++ b/registry.yaml @@ -1676,12 +1676,20 @@ - evals - completions -- title: Steering Text-to-Speech for more dynamic audio generation - path: examples/voice_solutions/steering_tts.ipynb - date: 2024-10-21 +- title: Filtered Movie Search with Milvus and OpenAI + path: https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/milvus/Filtered_search_with_Milvus_and_OpenAI.ipynb + date: 2024-11-1 authors: - - ericning-o - - gbergengruen + - jinhonglin-ryan + - filip-halt tags: - - completions - - audio + - Embeddings + +- title: Quickstart with Milvus and OpenAI + path: 
https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/milvus/Getting_started_with_Milvus_and_OpenAI.ipynb
+  date: 2024-11-01
+  authors:
+    - jinhonglin-ryan
+    - filip-halt
+  tags:
+    - Embeddings
\ No newline at end of file