In this lab, you will analyze unstructured data files such as contract documents, leases, and user manuals using OCR, the Azure Cognitive Search Semantic Search feature, and Azure OpenAI Large Language Models to summarize key information after converting the documents into more structured index files. For this lab, you will use the dataset provided at Lab3 Sample Data.
- How to leverage a GPT-3 Large Language Model (LLM) to extract a concise summary from a subset of a large document repository using Azure OpenAI and Azure Cognitive Search Semantic Search
- The accelerator is deployed and ready in the resource group
- You have access to sample data to test OpenAI
At this stage, select the model you want to use and the feature you want to leverage. In this case, we will use the Davinci model and the Summarize feature. The playground loads a sample into the editor. Select the content of the 'Conversation' section and replace it with ${document} so that the dynamic content is used at runtime. After that, click 'View Code' at the top right.
In the pop-up, there is a drop-down menu with 'Python' selected by default. Change it to 'json' and copy the code snippet.
Go back to the BPA tab and replace the default text in the Generic OpenAI component opened earlier with the copied text.
That completes the pipeline.
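For reference, the snippet you copy from 'View Code' is a JSON request body for the Completions API. Its exact contents depend on your playground settings, but it will look roughly like the sketch below (the parameter values here are illustrative assumptions; the important part is that `prompt` carries the ${document} placeholder you set earlier):

```json
{
  "prompt": "${document}",
  "max_tokens": 250,
  "temperature": 0.3,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0
}
```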
There are 2 options for ingesting the data for the pipeline:
- Use single-file upload for smaller files (fewer than 4 pages)
- Use the split-document option to split larger documents and upload the individual split files to the pipeline.
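The split-document option handles the splitting for you, but as a rough illustration of the idea, a script along these lines could break a large text document into smaller chunks before upload. The file names and chunk size below are assumptions for illustration, not part of the accelerator:

```python
from pathlib import Path

def split_document(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on paragraph boundaries."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

# Example: write each chunk to its own file for individual upload.
# for i, chunk in enumerate(split_document(Path("lease.txt").read_text())):
#     Path(f"lease_part{i + 1}.txt").write_text(chunk)
```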
Next, get to the Search Service. To view the results, go to portal.azure.com (Azure Portal) again in your browser and navigate to the resource group as you did earlier in Step 1. In the resource group, click the resource of type Search Service.
Provide a name for the data source and click 'Choose an existing connection' for the Connection String. The Azure Cosmos DB resource created as part of the BPA accelerator setup will be one of the sources you can choose from.
Keep the default of None for Managed identity authentication. For Database and Collection, use the dropdowns to select the same names as the Cosmos DB you selected in step 15.
Under Query, use the following query. The pipeline name should match the pipeline name you used in step 3; @HighWaterMark is the indexer's change-tracking parameter, so only documents changed since the last indexer run are picked up:
SELECT * FROM c WHERE c.pipeline = 'YOUR-PIPELINE-NAME' AND c._ts > @HighWaterMark
Click 'Next: Add cognitive skills (Optional)'. This validates the data source and creates the index schema.
On the next screen (Add cognitive skills (Optional)), click 'Skip to: Customize target index'.
Provide a name for the index and click 'Next: Create an indexer'.
Provide a name for the indexer and click Submit.
You will get a notification that the import was successfully configured.
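Once the indexer has run, you can sanity-check that the pipeline's documents landed in the index by querying the Search Service's REST API. This is an optional sketch, not part of the lab steps; the service name, index name, and API key are placeholders you would fill in from your own deployment:

```python
import json
import urllib.request

SERVICE = "your-search-service"    # placeholder: your Search Service name
INDEX = "your-index-name"          # placeholder: the index created above
API_KEY = "your-query-or-admin-key"

def build_search_request(service, index, api_key, search_text):
    """Build an Azure Cognitive Search 'search documents' POST request."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           "/docs/search?api-version=2021-04-30-Preview")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = json.dumps({"search": search_text, "top": 5}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# To actually run the query against your service:
# with urllib.request.urlopen(build_search_request(SERVICE, INDEX, API_KEY, "lease")) as resp:
#     for doc in json.load(resp)["value"]:
#         print(doc)
```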
Go back to your search index and configure the Semantic Configuration.
Select the Semantic Configuration and click on Create new.
In the pop-up, do the following:
- Give a name to the Semantic Search configuration
- For the Title field, select 'filename'
- For Content fields, select the 'content' field and any other relevant fields
- Select Save
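For reference, the equivalent semantic configuration in the index definition (as it would appear if created via the REST API rather than the portal) looks roughly like the fragment below. The configuration name is an assumption; the field names match the 'filename' and 'content' fields used in this lab:

```json
"semantic": {
  "configurations": [
    {
      "name": "my-semantic-config",
      "prioritizedFields": {
        "titleField": { "fieldName": "filename" },
        "prioritizedContentFields": [
          { "fieldName": "content" }
        ]
      }
    }
  ]
}
```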