Oraclevs integration #8007

Open · wants to merge 15 commits into main
535 changes: 535 additions & 0 deletions cookbook/oracleai.mdx

Large diffs are not rendered by default.

288 changes: 288 additions & 0 deletions cookbook/oraclevs.md
@@ -0,0 +1,288 @@
# Oracle AI Vector Search with LangChain.js Integration

## Introduction

Oracle AI Vector Search enables semantic search on unstructured data while simultaneously providing relational search capabilities on business data, all within a unified system. This approach eliminates the need for a separate vector database, reducing data fragmentation and improving efficiency.

By integrating Oracle AI Vector Search with LangChain, you can build a powerful pipeline for Retrieval Augmented Generation (RAG), leveraging Oracle's robust database features.

## Key Advantages of Oracle Database

Oracle AI Vector Search is built on top of the Oracle Database, providing several key features:

* Partitioning Support
* Real Application Clusters (RAC) Scalability
* Exadata Smart Scans
* Geographically Distributed Shard Processing
* Transactional Capabilities
* Parallel SQL
* Disaster Recovery
* Advanced Security
* Oracle Machine Learning
* Oracle Graph Database
* Oracle Spatial and Graph
* Oracle Blockchain
* JSON Support

## Guide Overview

This guide demonstrates how to integrate Oracle AI Vector Search with LangChain to create an end-to-end RAG pipeline. You'll learn how to:

* Load documents from different sources using OracleDocLoader.
* Summarize documents inside or outside the database using OracleSummary.
* Generate embeddings either inside or outside the database using OracleEmbeddings.
* Chunk documents based on specific needs using OracleTextSplitter.
* Store, index, and query data using OracleVS.

## Getting Started

If you're new to Oracle Database, consider using the [free Oracle Database 23ai](https://www.oracle.com/database/free/) to get started.

## Best Practices

* User Management: For better security and access control, create dedicated users for your Oracle Database projects instead of working as the system user (see the sketch after this list, and the end-to-end guide for more details).
* User Privileges: Grant each user only the privileges it needs to maintain database security. You can find more information in the official Oracle Database documentation.
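
As a hedged illustration of the first point, the sketch below uses node-oracledb to create a dedicated application user. The `vector_app` user name, the `FREEPDB1` service, the environment variables, and the `DB_DEVELOPER_ROLE` grant are illustrative choices, not requirements of the integration.

```typescript
import oracledb from "oracledb";

async function createAppUser(): Promise<void> {
  // Connect as an administrative user that is allowed to create users.
  const admin = await oracledb.getConnection({
    user: "ADMIN",
    password: process.env.ORACLE_ADMIN_PASSWORD,
    connectString: "localhost:1521/FREEPDB1", // adjust to your database
  });
  try {
    // Dedicated schema for the RAG project instead of the system user.
    await admin.execute(
      `CREATE USER vector_app IDENTIFIED BY "${process.env.VECTOR_APP_PASSWORD}"`
    );
    // DB_DEVELOPER_ROLE ships with Oracle Database 23ai; grant narrower
    // privileges (CREATE SESSION, CREATE TABLE, ...) if you prefer.
    await admin.execute(`GRANT DB_DEVELOPER_ROLE TO vector_app`);
    await admin.execute(`ALTER USER vector_app QUOTA UNLIMITED ON users`);
  } finally {
    await admin.close();
  }
}
```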

## Prerequisites

To get started, install the [Oracle Database JavaScript client driver](https://node-oracledb.readthedocs.io/en/latest/):

```bash
npm install oracledb
```
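
To confirm the driver is installed and importable, a quick check like the one below can help. Note that node-oracledb 6 and later runs in Thin mode by default, so no Oracle Instant Client is needed for this step.

```typescript
import oracledb from "oracledb";

// A successful import plus a version print confirms the installation.
console.log(`node-oracledb version: ${oracledb.versionString}`);
console.log(`Thin mode: ${oracledb.thin}`);
```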

## Document Preparation

Assuming you have documents stored in a file system that you want to use with Oracle AI Vector Search and LangChain, these documents need to be converted into instances of the `Document` class from `@langchain/core/documents`.

### Example: Ingesting JSON Documents

The following TypeScript example demonstrates how to ingest documents from a JSON file:

```typescript
import { promises as fs } from "node:fs";
import { Document } from "@langchain/core/documents";

// Shape of each record in the source JSON file.
interface DataRow {
  id: string;
  link: string;
  text: string;
}

// The two methods below belong to the test helper class used later in this
// guide; it stores the `docsDir` and `filename` of the JSON source.
private createDocument(row: DataRow): Document {
  const metadata = {
    id: row.id,
    link: row.link,
  };
  return new Document({ pageContent: row.text, metadata });
}

public async ingestJson(): Promise<Document[]> {
  try {
    const filePath = `${this.docsDir}${this.filename}`;
    const fileContent = await fs.readFile(filePath, { encoding: "utf8" });
    const jsonData: DataRow[] = JSON.parse(fileContent);
    return jsonData.map((row) => this.createDocument(row));
  } catch (error) {
    console.error("An error occurred while ingesting JSON:", error);
    throw error; // Rethrow so the calling function can handle it
  }
}
```

## LangChain and Oracle Integration

The Oracle AI Vector Search LangChain library offers a rich set of APIs for document processing, which includes loading, chunking, summarizing, and embedding generation. Here's how to set up a connection and integrate Oracle with LangChain.

## Connecting to Oracle Database

Below is an example of how to connect to an Oracle Database using both a direct connection and a connection pool:

```typescript
import oracledb from "oracledb";

// Standalone connection: fine for one-off scripts and simple tests.
async function dbConnect(): Promise<oracledb.Connection> {
  const connection = await oracledb.getConnection({
    user: '****',
    password: '****',
    connectString: '***.**.***.**:1521/****' // host:port/service_name
  });
  console.log('Connection created...');
  return connection;
}

// Connection pool: preferred for applications that make many requests.
async function dbPool(): Promise<oracledb.Pool> {
  const pool = await oracledb.createPool({
    user: '****',
    password: '****',
    connectString: '***.**.***.**:1521/****' // host:port/service_name
  });
  console.log('Connection pool started...');
  return pool;
}
```
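
Whichever style you use, release what you acquire. The sketch below reuses the `dbPool` helper above, checks out a connection, runs a trivial query, and then closes both the connection and the pool; the `SELECT ... FROM dual` statement is only a placeholder round trip.

```typescript
async function pingDatabase(): Promise<void> {
  const pool = await dbPool();
  const connection = await pool.getConnection();
  try {
    // A trivial round trip to confirm the session works.
    const result = await connection.execute(`SELECT sysdate FROM dual`);
    console.log(result.rows);
  } finally {
    await connection.close(); // returns the connection to the pool
    await pool.close(0);      // drains the pool immediately
  }
}
```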

## Testing the Integration

Here, we create a test class, `TestsOracleVS`, to exercise the main features of the Oracle vector store and its integration with LangChain.

### Example Test Class

```typescript
// Module paths for the OracleVS entrypoint are assumed from this PR's layout
// in @langchain/community; adjust them to match the published package.
// dbConnect() and dbPool() are the connection helpers from the previous section.
import oracledb from "oracledb";
import * as dotenv from "dotenv";
import { Document, DocumentInterface } from "@langchain/core/documents";
import { Embeddings } from "@langchain/core/embeddings";
import { Callbacks } from "@langchain/core/callbacks/manager";
import { MaxMarginalRelevanceSearchOptions } from "@langchain/core/vectorstores";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
import {
  createIndex,
  DistanceStrategy,
  OracleVS,
} from "@langchain/community/vectorstores/oraclevs";

class TestsOracleVS {
  client: any | null = null;
  embeddingFunction: HuggingFaceTransformersEmbeddings;
  dbConfig: Record<string, any> = {};
  oraclevs!: OracleVS;
  docsDir = "./"; // directory holding the JSON source file

  // The createDocument/ingestJson helpers from the Document Preparation
  // section also live on this class and read the filename passed in here.
  constructor(
    public filename: string,
    embeddingFunction: HuggingFaceTransformersEmbeddings
  ) {
    this.embeddingFunction = embeddingFunction;
  }

  async init(): Promise<void> {
    this.client = await dbPool();
    this.dbConfig = {
      "client": this.client,
      "tableName": "some_tablenm",
      "distanceStrategy": DistanceStrategy.DOT_PRODUCT,
      "query": "What are the salient features of OracleDB?"
    };
    this.oraclevs = new OracleVS(this.embeddingFunction, this.dbConfig);
  }

  public async testCreateIndex(): Promise<void> {
    const connection: oracledb.Connection = await dbConnect();
    await createIndex(connection, this.oraclevs, {
      idxName: "IVF",
      idxType: "IVF",
      neighborPart: 64,
      accuracy: 90
    });
    console.log("Index created successfully");
    await connection.close();
  }

  // Test similaritySearchVectorWithScore: it takes an embedding (a number
  // array), a k value, and an optional filter, and returns documents
  // ordered by distance.
  public async testSimilaritySearchByVector(
    embedding: number[],
    k: number,
    filter?: OracleVS["FilterType"],
  ): Promise<[DocumentInterface, number][]> {
    return this.oraclevs.similaritySearchVectorWithScore(
      embedding,
      k,
      filter,
    );
  }

  // This call does the same, except that it returns documents together with
  // their stored embeddings.
  public async testSimilaritySearchByVectorReturningEmbeddings(
    embedding: number[],
    k: number = 4,
    filter?: OracleVS["FilterType"],
  ): Promise<[Document, number, Float32Array | number[]][]> {
    return this.oraclevs.similaritySearchByVectorReturningEmbeddings(
      embedding,
      k,
      filter
    );
  }

  // This call tests maxMarginalRelevanceSearch.
  // The parameters are self-explanatory; the callbacks argument is reserved
  // for future use.
  public async testMaxMarginalRelevanceSearch(
    query: string,
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    // @ts-ignore
    return this.oraclevs.maxMarginalRelevanceSearch(query, options, _callbacks);
  }

  // Same as above, except that it takes an embedding vector instead of a
  // query string.
  public async testMaxMarginalRelevanceSearchByVector(
    embedding: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs.maxMarginalRelevanceSearchByVector(embedding, options, _callbacks);
  }

  // Same as above, except that it returns each document with its score.
  public async testMaxMarginalRelevanceSearchWithScoreByVector(
    embedding: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<Array<{ document: Document; score: number }>> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs.maxMarginalRelevanceSearchWithScoreByVector(embedding, options, _callbacks);
  }

  // This call tests the delete feature.
  testDelete(params: { ids?: string[], deleteAll?: boolean }): Promise<void> {
    return this.oraclevs.delete(params);
  }
}

// runTestsOracleVS is the driver that exercises each of the calls above.
async function runTestsOracleVS() {
  // Initialize dotenv to load environment variables
  dotenv.config();
  const query = "What is the language used by Oracle database";

  // Set up the embedding function (default model: "Xenova/all-MiniLM-L6-v2")
  const embeddingFunction = new HuggingFaceTransformersEmbeddings();

  if (!(embeddingFunction instanceof Embeddings)) {
    console.error("Embedding function is not an instance of Embeddings.");
    return;
  }

  console.log("Embedding function initialized successfully");

  // Initialize the TestsOracleVS class
  const testsOracleVS = new TestsOracleVS(
    "concepts23c_small.json",
    embeddingFunction
  );

  // Initialize the connection pool and the vector store
  await testsOracleVS.init();

  // Ingest JSON data to create documents and load them into the vector store
  const documents = await testsOracleVS.ingestJson();
  await OracleVS.fromDocuments(
    documents,
    testsOracleVS.embeddingFunction,
    testsOracleVS.dbConfig
  );

  // Create an index
  await testsOracleVS.testCreateIndex();

  // Perform a similarity search by vector
  const embedding = await embeddingFunction.embedQuery(query);
  const similaritySearchByVector =
    await testsOracleVS.testSimilaritySearchByVector(embedding, 5);
  console.log("Similarity Search Results:", similaritySearchByVector);

  // Perform a similarity search that also returns the stored embeddings
  const similaritySearchByEmbeddings =
    await testsOracleVS.testSimilaritySearchByVectorReturningEmbeddings(embedding, 5);
  console.log("Similarity Search Results:", similaritySearchByEmbeddings);

  const maxMarginalRelevanceSearch =
    await testsOracleVS.testMaxMarginalRelevanceSearch(query);
  console.log("Max Marginal Relevance Search:", maxMarginalRelevanceSearch);

  const maxMarginalRelevanceSearchByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchByVector(embedding);
  console.log("Max Marginal Relevance Search By Vector:", maxMarginalRelevanceSearchByVector);

  const maxMarginalRelevanceSearchWithScoreByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchWithScoreByVector(embedding);
  console.log("Max Marginal Relevance Search With Score By Vector:", maxMarginalRelevanceSearchWithScoreByVector);
}
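
// Run the driver; surface any failure so the test run is visible.
runTestsOracleVS().catch((error) => {
  console.error("OracleVS test run failed:", error);
});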
```

That is all for now.
@@ -0,0 +1,81 @@
# Oracle AI Vector Search: Document Processing

## Load Documents

You have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).

A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-2EC9C942-0989-4FEE-909A-3575349AC706).

After loading the documents, you may want to split the text for embedding with OracleTextSplitter or generate a summary with OracleSummary.

Below is a sample code snippet that demonstrates how to use OracleDocLoader:

```typescript
import { OracleDocLoader } from "@langchain/community/document_loaders/fs/oracle";

// `conn` is an oracledb connection obtained as shown in the integration guide.

/*
// loading a local file
loader_params = {"file": "<file>"};

// loading from a local directory
loader_params = {"dir": "<directory>"};
*/

// loading from an Oracle Database table
// make sure you have a table with this specification
const loader_params = {
  "owner": "testuser",
  "tablename": "demo_tab",
  "colname": "data",
};

// load the docs
const loader = new OracleDocLoader(conn, loader_params);
const docs = await loader.load();

// verify
console.log(`Number of docs loaded: ${docs.length}`);
//console.log(`Document-0: ${docs[0].pageContent}`);
```

## Split Documents

The documents may vary in size, ranging from small to very large. You may need to chunk the documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-4E145629-7098-4C7C-804F-FC85D1F24240).

Below is a sample code illustrating how to implement this:

```typescript
import { OracleTextSplitter } from "@langchain/textsplitters/oracle";

/*
// Some examples
// split by chars, max 500 chars
splitter_params = {"split": "chars", "max": 500, "normalize": "all"};

// split by words, max 100 words
splitter_params = {"split": "words", "max": 100, "normalize": "all"};

// split by sentence, max 20 sentences
splitter_params = {"split": "sentence", "max": 20, "normalize": "all"};
*/

// split with default parameters
const splitter_params = {"normalize": "all"};

// get the splitter instance (`conn` is the same oracledb connection as above)
const splitter = new OracleTextSplitter(conn, splitter_params);

// collect every chunk from every loaded document into a single flat list
const list_chunks: string[] = [];
for (const doc of docs) {
  const chunks = await splitter.splitText(doc.pageContent);
  list_chunks.push(...chunks);
}

// verify
console.log(`Number of Chunks: ${list_chunks.length}`);
//console.log(`Chunk-0: ${list_chunks[0]}`); // content
```
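
With the chunks in hand, they can be wrapped in `Document` instances and stored with `OracleVS.fromDocuments`, mirroring the configuration used in the integration guide above. A minimal sketch follows; the module path for OracleVS (assumed from this PR's layout), the `demo_chunks` table name, and the metadata layout are illustrative assumptions.

```typescript
import { Document } from "@langchain/core/documents";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
// Entrypoint assumed from this PR's layout in @langchain/community.
import { DistanceStrategy, OracleVS } from "@langchain/community/vectorstores/oraclevs";

// Wrap each chunk in a Document, keeping a pointer back to its position.
const chunkDocs = list_chunks.map(
  (chunk, index) => new Document({ pageContent: chunk, metadata: { index } })
);

// Embed and store the chunks; the config mirrors the one used earlier
// (database client, target table, and distance strategy).
const embeddings = new HuggingFaceTransformersEmbeddings();
const vectorStore = await OracleVS.fromDocuments(chunkDocs, embeddings, {
  client: conn,             // the same oracledb connection as above
  tableName: "demo_chunks", // illustrative table name
  distanceStrategy: DistanceStrategy.DOT_PRODUCT,
});
```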

## End-to-End Demo

Please refer to the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchainjs/tree/main/cookbook/oracleai.mdx) to build a complete end-to-end RAG pipeline with the help of Oracle AI Vector Search.