Oraclevs integration #8007
Open
hackerdave wants to merge 15 commits into langchain-ai:main from skmishraoracle:oraclevs_integration_new
669a4b4 Add docs (hackerdave)
2df1304 Add doc loader and vector store (hackerdave)
5b997e3 Add dependencies (hackerdave)
ec0654e Add entry points (hackerdave)
0c2168d Fix markdown (hackerdave)
f7ab72b Doc changes (hackerdave)
d80b384 Update docs (hackerdave)
f441ebd Use DB_TYPE_JSON (hackerdave)
ff68e9d Update doc (hackerdave)
6686397 Update docs (hackerdave)
344ab07 Update env vars (hackerdave)
e5cca33 Check SQL names (hackerdave)
213ae2f Update env vars (hackerdave)
69e3f80 Add more tests (hackerdave)
ea6f729 bind vars, fix long lines (hackerdave)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,288 @@ | ||
# Oracle AI Vector Search with LangChain.js Integration | ||
|
||
## Introduction | ||
|
||
Oracle AI Vector Search enables semantic search on unstructured data while simultaneously providing relational search capabilities on business data, all within a unified system. This approach eliminates the need for a separate vector database, reducing data fragmentation and improving efficiency. | ||
|
||
By integrating Oracle AI Vector Search with LangChain, you can build a powerful pipeline for Retrieval Augmented Generation (RAG), leveraging Oracle's robust database features. | ||
|
||
### Key Advantages of Oracle Database | ||
Oracle AI Vector Search is built on top of the Oracle Database, providing several key features: | ||
|
||
* Partitioning Support | ||
* Real Application Clusters (RAC) Scalability | ||
* Exadata Smart Scans | ||
* Geographically Distributed Shard Processing | ||
* Transactional Capabilities | ||
* Parallel SQL | ||
* Disaster Recovery | ||
* Advanced Security | ||
* Oracle Machine Learning | ||
* Oracle Graph Database | ||
* Oracle Spatial and Graph | ||
* Oracle Blockchain | ||
* JSON Support | ||
|
||
## Guide Overview | ||
|
||
This guide demonstrates how to integrate Oracle AI Vector Search with LangChain to create an end-to-end RAG pipeline. You'll learn how to: | ||
|
||
* Load documents from different sources using OracleDocLoader. | ||
* Summarize documents inside or outside the database using OracleSummary. | ||
* Generate embeddings either inside or outside the database using OracleEmbeddings. | ||
* Chunk documents based on specific needs using OracleTextSplitter. | ||
* Store, index, and query data using OracleVS. | ||
|
||
## Getting Started | ||
|
||
If you're new to Oracle Database, consider using the [free Oracle Database 23ai](https://www.oracle.com/database/free/) to get started. | ||
|
||
## Best Practices | ||
|
||
* User Management: For security and control, create dedicated users for your Oracle Database projects instead of using the system user. See the end-to-end guide for more details. | ||
* User Privileges: Manage user privileges carefully to maintain database security. You can find more information in the official Oracle Database documentation. | ||
|
||
## Prerequisites | ||
|
||
To get started, install the [Oracle Database JavaScript client driver](https://node-oracledb.readthedocs.io/en/latest/): | ||
|
||
```bash | ||
npm install oracledb | ||
``` | ||
|
||
## Document Preparation | ||
|
||
Assuming you have documents stored in a file system that you want to use with Oracle AI Vector Search and LangChain, these documents need to be instances of the `Document` class from `@langchain/core/documents`. | ||
|
||
### Example: Ingesting JSON Documents | ||
|
||
In the following TypeScript example, we demonstrate how to ingest documents from JSON files: | ||
|
||
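```typescript | ||
import fs from "node:fs/promises"; | ||
import { Document } from "@langchain/core/documents"; | ||
|
||
// The two methods below belong to a class that defines `docsDir` and `filename`. | ||
// `DataRow` is the shape of one JSON row, with `id`, `link`, and `text` fields. | ||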
private createDocument(row: DataRow): Document { | ||
const metadata = { | ||
id: row.id, | ||
link: row.link, | ||
}; | ||
return new Document({ pageContent: row.text, metadata: metadata }); | ||
} | ||
|
||
public async ingestJson(): Promise<Document[]> { | ||
try { | ||
const filePath = `${this.docsDir}${this.filename}`; | ||
const fileContent = await fs.readFile(filePath, {encoding: 'utf8'}); | ||
const jsonData: DataRow[] = JSON.parse(fileContent); | ||
return jsonData.map((row) => this.createDocument(row)); | ||
} catch (error) { | ||
console.error('An error occurred while ingesting JSON:', error); | ||
throw error; // Rethrow for the calling function to handle | ||
} | ||
} | ||
``` | ||
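The `JSON.parse` call above trusts that every row carries the `id`, `link`, and `text` fields that `createDocument` reads. A minimal sketch of checking rows before building documents is shown below. The `DataRow` field types and the `isDataRow` and `parseRows` helpers are assumptions for illustration and are not part of the integration.

```typescript
// Assumed row shape: the guide reads `id`, `link`, and `text` from each row
// but does not state their types, so strings are assumed here.
interface DataRow {
  id: string;
  link: string;
  text: string;
}

// Type guard: true only when the value has all three string fields.
function isDataRow(value: unknown): value is DataRow {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.id === "string" &&
    typeof row.link === "string" &&
    typeof row.text === "string"
  );
}

// Hypothetical helper: parse a JSON string and reject malformed input early,
// so a bad row fails here rather than later in the pipeline.
function parseRows(fileContent: string): DataRow[] {
  const parsed: unknown = JSON.parse(fileContent);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of rows");
  }
  parsed.forEach((row, index) => {
    if (!isDataRow(row)) {
      throw new Error(`Row ${index} is missing id, link, or text`);
    }
  });
  return parsed as DataRow[];
}
```

In `ingestJson`, such a helper could stand in for the bare `JSON.parse(fileContent)` call.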
|
||
## LangChain and Oracle Integration | ||
|
||
The Oracle AI Vector Search LangChain library offers a rich set of APIs for document processing, which includes loading, chunking, summarizing, and embedding generation. Here's how to set up a connection and integrate Oracle with LangChain. | ||
|
||
## Connecting to Oracle Database | ||
|
||
Below is an example of how to connect to an Oracle Database using both a direct connection and a connection pool: | ||
|
||
|
||
```typescript | ||
import oracledb from "oracledb"; | ||
|
||
async function dbConnect(): Promise<oracledb.Connection> { | ||
const connection = await oracledb.getConnection({ | ||
user: '****', | ||
password: '****', | ||
connectString: '***.**.***.**:1521/****' | ||
}); | ||
console.log('Connection created...'); | ||
return connection; | ||
} | ||
|
||
async function dbPool(): Promise<oracledb.Pool> { | ||
const pool = await oracledb.createPool({ | ||
user: '****', | ||
password: '****', | ||
connectString: '***.**.***.**:1521/****' | ||
}); | ||
console.log('Connection pool started...'); | ||
return pool; | ||
} | ||
``` | ||
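A connection opened with `dbConnect()` has to be closed by the caller, and a manual `close()` call is skipped if the work in between throws. One way to guard against that is a small `try`/`finally` wrapper, sketched below. `withConnection` and the `Closable` interface are hypothetical names for illustration; any object with an async `close()` method fits.

```typescript
// Minimal shape needed from a connection: an async close() method.
interface Closable {
  close(): Promise<void>;
}

// Hypothetical helper: open a resource, run `work`, and always close the
// resource afterwards, even when `work` throws.
async function withConnection<C extends Closable, T>(
  open: () => Promise<C>,
  work: (connection: C) => Promise<T>,
): Promise<T> {
  const connection = await open();
  try {
    return await work(connection);
  } finally {
    await connection.close();
  }
}
```

With the functions above, a call might look like `await withConnection(dbConnect, async (connection) => { /* run queries */ });`.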
|
||
## Testing the Integration | ||
|
||
Here, we demonstrate how to create a test class TestsOracleVS to explore various features of Oracle Vector Store and its integration with LangChain. | ||
|
||
### Example Test Class | ||
|
||
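```typescript | ||
import fs from "node:fs/promises"; | ||
import dotenv from "dotenv"; | ||
import oracledb from "oracledb"; | ||
import { Callbacks } from "@langchain/core/callbacks/manager"; | ||
import { Document, DocumentInterface } from "@langchain/core/documents"; | ||
import { Embeddings } from "@langchain/core/embeddings"; | ||
import { MaxMarginalRelevanceSearchOptions } from "@langchain/core/vectorstores"; | ||
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers"; | ||
// OracleVS, DistanceStrategy, and createIndex are exported by the vector store | ||
// module added in this PR; import them from that entry point as well. | ||
|
||
// Shape of one row in the JSON file (field types are assumed to be strings). | ||
interface DataRow { id: string; link: string; text: string; } | ||
|
||
class TestsOracleVS { | ||
client: any | null = null; | ||
embeddingFunction: HuggingFaceTransformersEmbeddings; | ||
dbConfig: Record<string, any> = {}; | ||
oraclevs!: OracleVS; | ||
docsDir = "./"; // Assumption: directory holding the JSON file; must end with "/" | ||
filename: string; | ||
|
||
constructor(filename: string, embeddingFunction: HuggingFaceTransformersEmbeddings) { | ||
this.filename = filename; | ||
this.embeddingFunction = embeddingFunction; | ||
} | ||
|
||
// Reads the JSON file and turns each row into a Document, as in Document Preparation. | ||
public async testIngestJson(): Promise<Document[]> { | ||
const filePath = `${this.docsDir}${this.filename}`; | ||
const fileContent = await fs.readFile(filePath, { encoding: "utf8" }); | ||
const jsonData: DataRow[] = JSON.parse(fileContent); | ||
return jsonData.map( | ||
(row) => new Document({ pageContent: row.text, metadata: { id: row.id, link: row.link } }) | ||
); | ||
} | ||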
|
||
async init(): Promise<void> { | ||
this.client = await dbPool(); | ||
this.dbConfig = { | ||
"client": this.client, | ||
"tableName": "some_tablenm", | ||
"distanceStrategy": DistanceStrategy.DOT_PRODUCT, | ||
"query": "What are the salient features of OracleDB?" | ||
}; | ||
this.oraclevs = new OracleVS(this.embeddingFunction, this.dbConfig); | ||
} | ||
|
||
public async testCreateIndex(): Promise<void> { | ||
const connection: oracledb.Connection = await dbConnect(); | ||
await createIndex(connection, this.oraclevs, { | ||
idxName: "IVF", | ||
idxType: "IVF", | ||
neighborPart: 64, | ||
accuracy: 90 | ||
}); | ||
console.log("Index created successfully"); | ||
await connection.close(); | ||
} | ||
|
||
// We are ready to test SimilaritySearchByVector | ||
// This one passes an embedding which is a number array, a k value, | ||
// and a filter. This call returns documents ordered by distance. | ||
public async testSimilaritySearchByVector( | ||
embedding: number[], | ||
k: number, | ||
filter?: OracleVS["FilterType"], | ||
): Promise<[DocumentInterface, number][]> { | ||
return this.oraclevs.similaritySearchVectorWithScore( | ||
embedding, | ||
k, | ||
filter, | ||
); | ||
} | ||
|
||
// This call does the same except that it returns Documents and embeddings. | ||
public async testSimilaritySearchByVectorReturningEmbeddings( | ||
embedding: number[], | ||
k: number = 4, | ||
filter?: OracleVS["FilterType"], | ||
): Promise<[Document, number, Float32Array | number[]][]> { | ||
return await this.oraclevs.similaritySearchByVectorReturningEmbeddings( embedding, k, filter); | ||
} | ||
|
||
// This call tests out the MaxMarginalRelevanceSearch | ||
// the parameters are self explanatory. | ||
// The Callback is reserved for future use. | ||
public async testMaxMarginalRelevanceSearch( | ||
query: string, | ||
options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>, | ||
_callbacks?: Callbacks | ||
): Promise<DocumentInterface[]> { | ||
if (!options) { | ||
options = { k: 10, fetchK: 20 }; // Default values for the options | ||
} | ||
// @ts-ignore | ||
return this.oraclevs.maxMarginalRelevanceSearch(query, options, _callbacks); | ||
} | ||
|
||
// This call is the same as above except that it takes a vector | ||
// instead of a query as an argument. | ||
public async testMaxMarginalRelevanceSearchByVector( | ||
query: number[], | ||
options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>, | ||
_callbacks?: Callbacks | undefined | ||
): Promise<DocumentInterface[]> { | ||
if (!options) { | ||
options = { k: 10, fetchK: 20 }; // Default values for the options | ||
} | ||
return this.oraclevs!.maxMarginalRelevanceSearchByVector(query, options, _callbacks); | ||
} | ||
|
||
// This too is the same as above except that it returns document and the score. | ||
public async testMaxMarginalRelevanceSearchWithScoreByVector( | ||
embedding: number[], | ||
options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>, | ||
_callbacks?: Callbacks | undefined | ||
): Promise<Array<{ document: Document; score: number }>> { | ||
if (!options) { | ||
options = { k: 10, fetchK: 20 }; // Default values for the options | ||
} | ||
return this.oraclevs.maxMarginalRelevanceSearchWithScoreByVector(embedding, options, _callbacks) | ||
} | ||
|
||
// This call tests out the delete feature. | ||
testDelete( params: { ids?: string[], deleteAll?: boolean } ): Promise<void> { | ||
return this.oraclevs.delete(params); | ||
} | ||
} | ||
|
||
// runTestsOracleVS is the driver that exercises each of the calls. | ||
async function runTestsOracleVS() { | ||
// Initialize dotenv to load environment variables | ||
dotenv.config(); | ||
const query = "What is the language used by Oracle database"; | ||
|
||
// Set up the embedding function model: "Xenova/all-MiniLM-L6-v2" | ||
const embeddingFunction = new HuggingFaceTransformersEmbeddings(); | ||
if (!embeddingFunction) { | ||
console.error("Failed to initialize the embedding function."); | ||
return; | ||
} | ||
|
||
if (!(embeddingFunction instanceof Embeddings)) { | ||
console.error("Embedding function is not an instance of Embeddings."); | ||
return; | ||
} | ||
|
||
console.log("Embedding function initialized successfully"); | ||
|
||
// Initialize the TestsOracleVS class | ||
const testsOracleVS = new TestsOracleVS("concepts23c_small.json", | ||
embeddingFunction); | ||
|
||
// Initialize connection and other setup | ||
await testsOracleVS.init(); | ||
|
||
// Ingest JSON data to create documents | ||
const documents = await testsOracleVS.testIngestJson(); | ||
await OracleVS.fromDocuments( | ||
documents, | ||
testsOracleVS.embeddingFunction, | ||
testsOracleVS.dbConfig | ||
) | ||
|
||
// Create an index | ||
await testsOracleVS.testCreateIndex(); | ||
|
||
// Assume some dummy embedding vector for demonstration | ||
// const embedding: number[] = [0.1, 0.2, 0.3, 0.4]; // Example embedding | ||
|
||
// Perform a similarity search by vector | ||
const embedding = await embeddingFunction.embedQuery(query); | ||
const similaritySearchByVector = await testsOracleVS.testSimilaritySearchByVector(embedding, 5); | ||
console.log("Similarity Search Results:", similaritySearchByVector); | ||
|
||
// Perform a similarity search by vector that also returns the stored embeddings | ||
const similaritySearchByEmbeddings = | ||
await testsOracleVS.testSimilaritySearchByVectorReturningEmbeddings(embedding, 5) | ||
console.log("Similarity Search Results:", similaritySearchByEmbeddings); | ||
|
||
const maxMarginalRelevanceSearch = | ||
await testsOracleVS.testMaxMarginalRelevanceSearch(query) | ||
console.log("Max Marginal Relevance Search:", maxMarginalRelevanceSearch); | ||
|
||
const maxMarginalRelevanceSearchByVector = | ||
await testsOracleVS.testMaxMarginalRelevanceSearchByVector(embedding) | ||
console.log("Max Marginal Relevance Search By Vector:", maxMarginalRelevanceSearchByVector); | ||
|
||
const maxMarginalRelevanceSearchWithScoreByVector = | ||
await testsOracleVS.testMaxMarginalRelevanceSearchWithScoreByVector(embedding) | ||
console.log("Max Marginal Relevance Search With Score By Vector:", maxMarginalRelevanceSearchWithScoreByVector); | ||
|
||
} | ||
``` | ||
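The `k` and `fetchK` options used above may be less self-explanatory than the comments suggest. In maximal marginal relevance (MMR) search, `fetchK` candidates are first retrieved by similarity, and `k` of them are then picked one at a time, trading relevance to the query against redundancy with the results already picked. The sketch below illustrates the general technique in plain TypeScript using cosine similarity. It is not the OracleVS implementation, and the `lambda` weight and the function names are assumptions for illustration.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative MMR: returns the indices of the selected candidates in pick order.
// `candidates` plays the role of the fetchK nearest vectors; `k` is how many to keep.
// lambda = 1 ranks purely by relevance; smaller values favor diversity.
function mmrSelect(
  query: number[],
  candidates: number[][],
  k: number,
  lambda = 0.5,
): number[] {
  const selected: number[] = [];
  const remaining = candidates.map((_, index) => index);
  while (selected.length < k && remaining.length > 0) {
    let bestPosition = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidateIndex, position) => {
      const relevance = cosineSimilarity(query, candidates[candidateIndex]);
      // Redundancy: highest similarity to anything already selected
      // (zero while nothing has been selected yet).
      let redundancy = selected.length === 0 ? 0 : -Infinity;
      for (const chosen of selected) {
        redundancy = Math.max(
          redundancy,
          cosineSimilarity(candidates[candidateIndex], candidates[chosen]),
        );
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPosition = position;
      }
    });
    selected.push(remaining[bestPosition]);
    remaining.splice(bestPosition, 1);
  }
  return selected;
}
```

With a low `lambda`, a near-duplicate of an already selected result loses out to a less similar but more diverse candidate, which is the behavior that distinguishes MMR from a plain similarity search.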
|
||
That is all for now. |
81 changes: 81 additions & 0 deletions
docs/core_docs/docs/integrations/document_loaders/file_loaders/oracleai.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# Oracle AI Vector Search: Document Processing | ||
|
||
## Load Documents | ||
|
||
You have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F). | ||
|
||
A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-2EC9C942-0989-4FEE-909A-3575349AC706). | ||
|
||
After loading the documents, you may want to split the text for embedding with OracleTextSplitter or generate a summary with OracleSummary. | ||
|
||
Below is a sample code snippet that demonstrates how to use OracleDocLoader: | ||
|
||
```typescript | ||
import {OracleDocLoader} from "@langchain/community/document_loaders/fs/oracle"; | ||
|
||
/* | ||
// loading a local file | ||
loader_params = {"file": "<file>"}; | ||
|
||
// loading from a local directory | ||
loader_params = {"dir": "<directory>"}; | ||
*/ | ||
|
||
// loading from Oracle Database table | ||
// make sure you have the table with this specification | ||
const loader_params = { | ||
"owner": "testuser", | ||
"tablename": "demo_tab", | ||
"colname": "data", | ||
}; | ||
|
||
// load the docs (conn is an already open oracledb connection) | ||
const loader = new OracleDocLoader(conn, loader_params); | ||
const docs = await loader.load(); | ||
|
||
// verify | ||
console.log(`Number of docs loaded: ${docs.length}`); | ||
//console.log(`Document-0: ${docs[0].pageContent}`); | ||
``` | ||
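The snippet above shows three parameter shapes: a single file, a directory, and a database table. A TypeScript union type can document those shapes and let the compiler check them. The type and function names below are hypothetical and are not exported by the loader; only the key names `file`, `dir`, `owner`, `tablename`, and `colname` come from the snippet above.

```typescript
// Hypothetical types mirroring the three parameter shapes shown above.
type FileParams = { file: string };
type DirParams = { dir: string };
type TableParams = { owner: string; tablename: string; colname: string };
type LoaderParams = FileParams | DirParams | TableParams;

// Describe where a given parameter object would load from.
function describeSource(params: LoaderParams): string {
  if ("file" in params) return `file ${params.file}`;
  if ("dir" in params) return `directory ${params.dir}`;
  return `column ${params.colname} of table ${params.owner}.${params.tablename}`;
}
```

Logging `describeSource(loader_params)` before calling `load()` is one way to confirm which source a run is about to read.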
|
||
## Split Documents | ||
|
||
The documents may vary in size, ranging from small to very large. You may need to chunk the documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-4E145629-7098-4C7C-804F-FC85D1F24240). | ||
|
||
Below is sample code illustrating how to implement this: | ||
|
||
```typescript | ||
import {OracleTextSplitter} from "@langchain/textsplitters/oracle"; | ||
|
||
/* | ||
// Some examples | ||
// split by chars, max 500 chars | ||
splitter_params = {"split": "chars", "max": 500, "normalize": "all"}; | ||
|
||
// split by words, max 100 words | ||
splitter_params = {"split": "words", "max": 100, "normalize": "all"}; | ||
|
||
// split by sentence, max 20 sentences | ||
splitter_params = {"split": "sentence", "max": 20, "normalize": "all"}; | ||
*/ | ||
|
||
// split by default parameters | ||
const splitter_params = {"normalize": "all"}; | ||
|
||
// get the splitter instance | ||
const splitter = new OracleTextSplitter(conn, splitter_params); | ||
|
||
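const list_chunks: string[] = []; | ||
for (const doc of docs) { | ||
const chunks = await splitter.splitText(doc.pageContent); | ||
// spread the result so list_chunks holds individual chunks, not one array per document | ||
list_chunks.push(...chunks); | ||
} | ||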
|
||
// verify | ||
console.log(`Number of Chunks: ${list_chunks.length}`); | ||
//console.log(`Chunk-0: ${list_chunks[0]}`); // content | ||
``` | ||
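To make the `split` and `max` options more concrete, the sketch below chunks a string by words with a maximum number of words per chunk. It is a plain TypeScript illustration of the idea only. OracleTextSplitter works through the database connection, and its actual behavior, including normalization and boundary handling, is described in the Oracle guide linked above. `chunkByWords` is a hypothetical helper.

```typescript
// Hypothetical helper: split text on whitespace and group the words into
// chunks of at most `maxWords` words each.
function chunkByWords(text: string, maxWords: number): string[] {
  if (!Number.isInteger(maxWords) || maxWords <= 0) {
    throw new Error("maxWords must be a positive integer");
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += maxWords) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
  }
  return chunks;
}
```

Smaller `max` values give more, shorter chunks, which usually means more embeddings to generate and store per document.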
|
||
## End to End Demo | ||
|
||
Please refer to the complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchainjs/tree/main/cookbook/oracleai.mdx) to build an end to end RAG pipeline with the help of Oracle AI Vector Search. |