Oraclevs integration #8007

Open · wants to merge 15 commits into main
535 changes: 535 additions & 0 deletions cookbook/oracleai.mdx

Large diffs are not rendered by default.

288 changes: 288 additions & 0 deletions cookbook/oraclevs.md
@@ -0,0 +1,288 @@
# Oracle AI Vector Search with LangChain.js Integration

## Introduction

Oracle AI Vector Search enables semantic search on unstructured data while simultaneously providing relational search capabilities on business data, all within a unified system. This approach eliminates the need for a separate vector database, reducing data fragmentation and improving efficiency.

By integrating Oracle AI Vector Search with LangChain, you can build a powerful pipeline for Retrieval Augmented Generation (RAG), leveraging Oracle's robust database features.

## Key Advantages of Oracle Database

Oracle AI Vector Search is built on top of the Oracle Database, providing several key features:

* Partitioning Support
* Real Application Clusters (RAC) Scalability
* Exadata Smart Scans
* Geographically Distributed Shard Processing
* Transactional Capabilities
* Parallel SQL
* Disaster Recovery
* Advanced Security
* Oracle Machine Learning
* Oracle Graph Database
* Oracle Spatial and Graph
* Oracle Blockchain
* JSON Support

## Guide Overview

This guide demonstrates how to integrate Oracle AI Vector Search with LangChain to create an end-to-end RAG pipeline. You'll learn how to:

* Load documents from different sources using OracleDocLoader.
* Summarize documents inside or outside the database using OracleSummary.
* Generate embeddings either inside or outside the database using OracleEmbeddings.
* Chunk documents based on specific needs using OracleTextSplitter.
* Store, index, and query data using OracleVS.

## Getting Started

If you're new to Oracle Database, consider using the [free Oracle Database 23ai](https://www.oracle.com/database/free/) to get started.

## Best Practices

* User Management: For better security and access control, create dedicated users for your Oracle Database projects instead of working as the system user (see the sketch after this list, and the end-to-end guide for more details).
* User Privileges: Grant each user only the privileges it needs to maintain database security. You can find more information in the official Oracle Database documentation.
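
As a hedged illustration of the first point, the sketch below uses node-oracledb to create a dedicated application user. The `vector_app` user name, the `FREEPDB1` service, the environment variables, and the `DB_DEVELOPER_ROLE` grant are illustrative choices, not requirements of the integration.

```typescript
import oracledb from "oracledb";

async function createAppUser(): Promise<void> {
  // Connect as an administrative user that is allowed to create users.
  const admin = await oracledb.getConnection({
    user: "ADMIN",
    password: process.env.ORACLE_ADMIN_PASSWORD,
    connectString: "localhost:1521/FREEPDB1", // adjust to your database
  });
  try {
    // Dedicated schema for the RAG project instead of the system user.
    await admin.execute(
      `CREATE USER vector_app IDENTIFIED BY "${process.env.VECTOR_APP_PASSWORD}"`
    );
    // DB_DEVELOPER_ROLE ships with Oracle Database 23ai; grant narrower
    // privileges (CREATE SESSION, CREATE TABLE, ...) if you prefer.
    await admin.execute(`GRANT DB_DEVELOPER_ROLE TO vector_app`);
    await admin.execute(`ALTER USER vector_app QUOTA UNLIMITED ON users`);
  } finally {
    await admin.close();
  }
}
```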

## Prerequisites

To get started, install the [Oracle Database JavaScript client driver](https://node-oracledb.readthedocs.io/en/latest/):

```bash
npm install oracledb
```
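
To confirm the driver is installed and importable, a quick check like the one below can help. Note that node-oracledb 6 and later runs in Thin mode by default, so no Oracle Instant Client is needed for this step.

```typescript
import oracledb from "oracledb";

// A successful import plus a version print confirms the installation.
console.log(`node-oracledb version: ${oracledb.versionString}`);
console.log(`Thin mode: ${oracledb.thin}`);
```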

## Document Preparation

Assuming you have documents stored in a file system that you want to use with Oracle AI Vector Search and LangChain, these documents need to be converted into instances of the `Document` class from `@langchain/core/documents`.

### Example: Ingesting JSON Documents

The following TypeScript example demonstrates how to ingest documents from a JSON file:

```typescript
import { promises as fs } from "node:fs";
import { Document } from "@langchain/core/documents";

// Shape of each record in the source JSON file.
interface DataRow {
  id: string;
  link: string;
  text: string;
}

// The two methods below belong to the test helper class used later in this
// guide; it stores the `docsDir` and `filename` of the JSON source.
private createDocument(row: DataRow): Document {
  const metadata = {
    id: row.id,
    link: row.link,
  };
  return new Document({ pageContent: row.text, metadata });
}

public async ingestJson(): Promise<Document[]> {
  try {
    const filePath = `${this.docsDir}${this.filename}`;
    const fileContent = await fs.readFile(filePath, { encoding: "utf8" });
    const jsonData: DataRow[] = JSON.parse(fileContent);
    return jsonData.map((row) => this.createDocument(row));
  } catch (error) {
    console.error("An error occurred while ingesting JSON:", error);
    throw error; // Rethrow so the calling function can handle it
  }
}
```

## LangChain and Oracle Integration

The Oracle AI Vector Search LangChain library offers a rich set of APIs for document processing, which includes loading, chunking, summarizing, and embedding generation. Here's how to set up a connection and integrate Oracle with LangChain.

## Connecting to Oracle Database

Below is an example of how to connect to an Oracle Database using both a direct connection and a connection pool:

```typescript
import oracledb from "oracledb";

// Standalone connection: fine for one-off scripts and simple tests.
async function dbConnect(): Promise<oracledb.Connection> {
  const connection = await oracledb.getConnection({
    user: '****',
    password: '****',
    connectString: '***.**.***.**:1521/****' // host:port/service_name
  });
  console.log('Connection created...');
  return connection;
}

// Connection pool: preferred for applications that make many requests.
async function dbPool(): Promise<oracledb.Pool> {
  const pool = await oracledb.createPool({
    user: '****',
    password: '****',
    connectString: '***.**.***.**:1521/****' // host:port/service_name
  });
  console.log('Connection pool started...');
  return pool;
}
```
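
Whichever style you use, release what you acquire. The sketch below reuses the `dbPool` helper above, checks out a connection, runs a trivial query, and then closes both the connection and the pool; the `SELECT ... FROM dual` statement is only a placeholder round trip.

```typescript
async function pingDatabase(): Promise<void> {
  const pool = await dbPool();
  const connection = await pool.getConnection();
  try {
    // A trivial round trip to confirm the session works.
    const result = await connection.execute(`SELECT sysdate FROM dual`);
    console.log(result.rows);
  } finally {
    await connection.close(); // returns the connection to the pool
    await pool.close(0);      // drains the pool immediately
  }
}
```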

## Testing the Integration

Here, we create a test class, `TestsOracleVS`, to exercise the main features of the Oracle vector store and its integration with LangChain.

### Example Test Class

```typescript
// Module paths for the OracleVS entrypoint are assumed from this PR's layout
// in @langchain/community; adjust them to match the published package.
// dbConnect() and dbPool() are the connection helpers from the previous section.
import oracledb from "oracledb";
import * as dotenv from "dotenv";
import { Document, DocumentInterface } from "@langchain/core/documents";
import { Embeddings } from "@langchain/core/embeddings";
import { Callbacks } from "@langchain/core/callbacks/manager";
import { MaxMarginalRelevanceSearchOptions } from "@langchain/core/vectorstores";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
import {
  createIndex,
  DistanceStrategy,
  OracleVS,
} from "@langchain/community/vectorstores/oraclevs";

class TestsOracleVS {
  client: any | null = null;
  embeddingFunction: HuggingFaceTransformersEmbeddings;
  dbConfig: Record<string, any> = {};
  oraclevs!: OracleVS;
  docsDir = "./"; // directory holding the JSON source file

  // The createDocument/ingestJson helpers from the Document Preparation
  // section also live on this class and read the filename passed in here.
  constructor(
    public filename: string,
    embeddingFunction: HuggingFaceTransformersEmbeddings
  ) {
    this.embeddingFunction = embeddingFunction;
  }

  async init(): Promise<void> {
    this.client = await dbPool();
    this.dbConfig = {
      "client": this.client,
      "tableName": "some_tablenm",
      "distanceStrategy": DistanceStrategy.DOT_PRODUCT,
      "query": "What are the salient features of OracleDB?"
    };
    this.oraclevs = new OracleVS(this.embeddingFunction, this.dbConfig);
  }

  public async testCreateIndex(): Promise<void> {
    const connection: oracledb.Connection = await dbConnect();
    await createIndex(connection, this.oraclevs, {
      idxName: "IVF",
      idxType: "IVF",
      neighborPart: 64,
      accuracy: 90
    });
    console.log("Index created successfully");
    await connection.close();
  }

  // Test similaritySearchVectorWithScore: it takes an embedding (a number
  // array), a k value, and an optional filter, and returns documents
  // ordered by distance.
  public async testSimilaritySearchByVector(
    embedding: number[],
    k: number,
    filter?: OracleVS["FilterType"],
  ): Promise<[DocumentInterface, number][]> {
    return this.oraclevs.similaritySearchVectorWithScore(
      embedding,
      k,
      filter,
    );
  }

  // This call does the same, except that it returns documents together with
  // their stored embeddings.
  public async testSimilaritySearchByVectorReturningEmbeddings(
    embedding: number[],
    k: number = 4,
    filter?: OracleVS["FilterType"],
  ): Promise<[Document, number, Float32Array | number[]][]> {
    return this.oraclevs.similaritySearchByVectorReturningEmbeddings(
      embedding,
      k,
      filter
    );
  }

  // This call tests maxMarginalRelevanceSearch.
  // The parameters are self-explanatory; the callbacks argument is reserved
  // for future use.
  public async testMaxMarginalRelevanceSearch(
    query: string,
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    // @ts-ignore
    return this.oraclevs.maxMarginalRelevanceSearch(query, options, _callbacks);
  }

  // Same as above, except that it takes an embedding vector instead of a
  // query string.
  public async testMaxMarginalRelevanceSearchByVector(
    embedding: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs.maxMarginalRelevanceSearchByVector(embedding, options, _callbacks);
  }

  // Same as above, except that it returns each document with its score.
  public async testMaxMarginalRelevanceSearchWithScoreByVector(
    embedding: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<Array<{ document: Document; score: number }>> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs.maxMarginalRelevanceSearchWithScoreByVector(embedding, options, _callbacks);
  }

  // This call tests the delete feature.
  testDelete(params: { ids?: string[], deleteAll?: boolean }): Promise<void> {
    return this.oraclevs.delete(params);
  }
}

// runTestsOracleVS is the driver that exercises each of the calls above.
async function runTestsOracleVS() {
  // Initialize dotenv to load environment variables
  dotenv.config();
  const query = "What is the language used by Oracle database";

  // Set up the embedding function (default model: "Xenova/all-MiniLM-L6-v2")
  const embeddingFunction = new HuggingFaceTransformersEmbeddings();

  if (!(embeddingFunction instanceof Embeddings)) {
    console.error("Embedding function is not an instance of Embeddings.");
    return;
  }

  console.log("Embedding function initialized successfully");

  // Initialize the TestsOracleVS class
  const testsOracleVS = new TestsOracleVS(
    "concepts23c_small.json",
    embeddingFunction
  );

  // Initialize the connection pool and the vector store
  await testsOracleVS.init();

  // Ingest JSON data to create documents and load them into the vector store
  const documents = await testsOracleVS.ingestJson();
  await OracleVS.fromDocuments(
    documents,
    testsOracleVS.embeddingFunction,
    testsOracleVS.dbConfig
  );

  // Create an index
  await testsOracleVS.testCreateIndex();

  // Perform a similarity search by vector
  const embedding = await embeddingFunction.embedQuery(query);
  const similaritySearchByVector =
    await testsOracleVS.testSimilaritySearchByVector(embedding, 5);
  console.log("Similarity Search Results:", similaritySearchByVector);

  // Perform a similarity search that also returns the stored embeddings
  const similaritySearchByEmbeddings =
    await testsOracleVS.testSimilaritySearchByVectorReturningEmbeddings(embedding, 5);
  console.log("Similarity Search Results:", similaritySearchByEmbeddings);

  const maxMarginalRelevanceSearch =
    await testsOracleVS.testMaxMarginalRelevanceSearch(query);
  console.log("Max Marginal Relevance Search:", maxMarginalRelevanceSearch);

  const maxMarginalRelevanceSearchByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchByVector(embedding);
  console.log("Max Marginal Relevance Search By Vector:", maxMarginalRelevanceSearchByVector);

  const maxMarginalRelevanceSearchWithScoreByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchWithScoreByVector(embedding);
  console.log("Max Marginal Relevance Search With Score By Vector:", maxMarginalRelevanceSearchWithScoreByVector);
}
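
// Run the driver; surface any failure so the test run is visible.
runTestsOracleVS().catch((error) => {
  console.error("OracleVS test run failed:", error);
});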
```

That is all for now.
@@ -0,0 +1,81 @@
# Oracle AI Vector Search: Document Processing

## Load Documents

You have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).

A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-2EC9C942-0989-4FEE-909A-3575349AC706).

After loading the documents, you may want to split the text for embedding with OracleTextSplitter or generate a summary with OracleSummary.

Below is a sample code snippet that demonstrates how to use OracleDocLoader:

```typescript
import { OracleDocLoader } from "@langchain/community/document_loaders/fs/oracle";

// `conn` is an oracledb connection obtained as shown in the integration guide.

/*
// loading a local file
loader_params = {"file": "<file>"};

// loading from a local directory
loader_params = {"dir": "<directory>"};
*/

// loading from an Oracle Database table
// make sure you have a table with this specification
const loader_params = {
  "owner": "testuser",
  "tablename": "demo_tab",
  "colname": "data",
};

// load the docs
const loader = new OracleDocLoader(conn, loader_params);
const docs = await loader.load();

// verify
console.log(`Number of docs loaded: ${docs.length}`);
//console.log(`Document-0: ${docs[0].pageContent}`);
```

## Split Documents

The documents may vary in size, ranging from small to very large. You may need to chunk the documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-4E145629-7098-4C7C-804F-FC85D1F24240).

Below is a sample code illustrating how to implement this:

```typescript
import { OracleTextSplitter } from "@langchain/textsplitters/oracle";

/*
// Some examples
// split by chars, max 500 chars
splitter_params = {"split": "chars", "max": 500, "normalize": "all"};

// split by words, max 100 words
splitter_params = {"split": "words", "max": 100, "normalize": "all"};

// split by sentence, max 20 sentences
splitter_params = {"split": "sentence", "max": 20, "normalize": "all"};
*/

// split with default parameters
const splitter_params = {"normalize": "all"};

// get the splitter instance (`conn` is the same oracledb connection as above)
const splitter = new OracleTextSplitter(conn, splitter_params);

// collect every chunk from every loaded document into a single flat list
const list_chunks: string[] = [];
for (const doc of docs) {
  const chunks = await splitter.splitText(doc.pageContent);
  list_chunks.push(...chunks);
}

// verify
console.log(`Number of Chunks: ${list_chunks.length}`);
//console.log(`Chunk-0: ${list_chunks[0]}`); // content
```
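
With the chunks in hand, they can be wrapped in `Document` instances and stored with `OracleVS.fromDocuments`, mirroring the configuration used in the integration guide above. A minimal sketch follows; the module path for OracleVS (assumed from this PR's layout), the `demo_chunks` table name, and the metadata layout are illustrative assumptions.

```typescript
import { Document } from "@langchain/core/documents";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
// Entrypoint assumed from this PR's layout in @langchain/community.
import { DistanceStrategy, OracleVS } from "@langchain/community/vectorstores/oraclevs";

// Wrap each chunk in a Document, keeping a pointer back to its position.
const chunkDocs = list_chunks.map(
  (chunk, index) => new Document({ pageContent: chunk, metadata: { index } })
);

// Embed and store the chunks; the config mirrors the one used earlier
// (database client, target table, and distance strategy).
const embeddings = new HuggingFaceTransformersEmbeddings();
const vectorStore = await OracleVS.fromDocuments(chunkDocs, embeddings, {
  client: conn,             // the same oracledb connection as above
  tableName: "demo_chunks", // illustrative table name
  distanceStrategy: DistanceStrategy.DOT_PRODUCT,
});
```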

## End-to-End Demo

Please refer to the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchainjs/tree/main/cookbook/oracleai.mdx) to build a complete end-to-end RAG pipeline with the help of Oracle AI Vector Search.