
feat(community): Enhance SAP HANA Vector integration with metadata columns, keyword filtering, and internal embeddings #8001


Open
wants to merge 22 commits into
base: main
Commits
5758820
feat: store specific metadata columns in dedicated table columns
yberber-sap Feb 24, 2025
fd1369d
feat: add keyword search functionality
yberber-sap Feb 24, 2025
3c5a85d
add tests for keyword filtering
yberber-sap Feb 24, 2025
db5e2db
add example about keyword filtering for docs
yberber-sap Feb 24, 2025
7da7eb3
fix linting issues
yberber-sap Feb 24, 2025
cc0ff6a
add internal embedding functionality
yberber-sap Feb 25, 2025
caa0a2f
improve integration tests for internal embeddings
yberber-sap Feb 25, 2025
28554b0
add documentation about internal embedding functionality
yberber-sap Feb 25, 2025
652fb4c
fix: correct minor typo in comment
yberber-sap Mar 3, 2025
529994c
refactor: rename f to filterObj for better readability
yberber-sap Mar 3, 2025
3c55d65
docs: mention contains operator in documentation
yberber-sap Mar 3, 2025
3dc1b55
refactor(types): improve type definition for sqlParams
yberber-sap Mar 3, 2025
48c9bef
add documentation link for VECTOR_EMBEDDING function
yberber-sap Mar 10, 2025
b6941c8
Code cleanup: add missing semicolon in queryTuple assignment
yberber-sap Mar 10, 2025
8f14070
Code cleanup: add missing semicolon in queryEmbedding assignment
yberber-sap Mar 10, 2025
c072a00
Cleanup: remove unused stm.execBatch call in hanavector.ts
yberber-sap Mar 10, 2025
629c353
Make setEmbeddings method private and rename it to _setEmbeddings
yberber-sap Mar 10, 2025
b727d14
Improve validation efficiency by wrapping TO_NVARCHAR in COUNT
yberber-sap Mar 10, 2025
4d0377a
Refactor extractKeywordSearchColumns to avoid redundant function crea…
yberber-sap Mar 10, 2025
bd0edad
sanitize metadata column in createMetadataProjection
yberber-sap Mar 10, 2025
2bee88f
Set a default internal embedding model ID to SAP_NEB.20240715 in inte…
yberber-sap Mar 10, 2025
785502b
use &lt; instead of < to resolve markdown issue
yberber-sap Mar 10, 2025
41 changes: 27 additions & 14 deletions docs/core_docs/docs/integrations/vectorstores/hanavector.mdx
@@ -57,20 +57,21 @@ import { Table, Tr, Th, Td } from "@mdx-js/react";

In addition to the basic value-based filtering capabilities, it is possible to use more advanced filtering. The table below shows the available filter operators.

| Operator | Semantic |
| ---------- | -------------------------------------------------------------------------- |
| `$eq` | Equality (==) |
| `$ne` | Inequality (!=) |
| `$lt` | Less than (<) |
| `$lte` | Less than or equal (<=) |
| `$gt` | Greater than (>) |
| `$gte` | Greater than or equal (>=) |
| `$in` | Contained in a set of given values (in) |
| `$nin` | Not contained in a set of given values (not in) |
| `$between` | Between the range of two boundary values |
| `$like` | Text equality based on the "LIKE" semantics in SQL (using "%" as wildcard) |
| `$and` | Logical "and", supporting 2 or more operands |
| `$or` | Logical "or", supporting 2 or more operands |
| Operator | Semantic |
| ----------- | -------------------------------------------------------------------------- |
| `$eq` | Equality (==) |
| `$ne` | Inequality (!=) |
| `$lt` | Less than (&lt;) |
| `$lte` | Less than or equal (&lt;=) |
| `$gt` | Greater than (>) |
| `$gte` | Greater than or equal (>=) |
| `$in` | Contained in a set of given values (in) |
| `$nin` | Not contained in a set of given values (not in) |
| `$between` | Between the range of two boundary values |
| `$like` | Text equality based on the "LIKE" semantics in SQL (using "%" as wildcard) |
| `$contains` | Filters documents containing a specific keyword |
| `$and` | Logical "and", supporting 2 or more operands |
| `$or` | Logical "or", supporting 2 or more operands |
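
The operator semantics in the table can be illustrated with a small in-memory evaluator. This is only a sketch: `matches`, `matchesClause`, and `select` are hypothetical helpers, not part of `HanaDB`, which translates filters into SQL instead. It assumes that `$contains` matches whole, case-insensitive words (consistent with the example where `"bob"` matches `Bob Johnson` but `"bo"` matches nothing), it restricts ordering operators to numbers, and it omits `$like`.

```typescript
// Hypothetical in-memory evaluator for the filter operators listed above.
// It is NOT the HanaDB implementation; it only mirrors the documented semantics.
type Scalar = string | number | boolean;
type Metadata = Record<string, Scalar>;

interface OperatorClause {
  $eq?: Scalar;
  $ne?: Scalar;
  $lt?: number;
  $lte?: number;
  $gt?: number;
  $gte?: number;
  $in?: Scalar[];
  $nin?: Scalar[];
  $between?: [number, number];
  $contains?: string;
}

interface Filter {
  $and?: Filter[];
  $or?: Filter[];
  [field: string]: Scalar | OperatorClause | Filter[] | undefined;
}

function matchesClause(
  value: Scalar | undefined,
  clause: Scalar | OperatorClause
): boolean {
  // A plain value is shorthand for equality.
  if (typeof clause !== "object") return value === clause;
  if (value === undefined) return false;
  // Ordering operators only apply to numeric values in this sketch.
  const n = typeof value === "number" ? value : NaN;
  if (clause.$eq !== undefined && value !== clause.$eq) return false;
  if (clause.$ne !== undefined && value === clause.$ne) return false;
  if (clause.$lt !== undefined && !(n < clause.$lt)) return false;
  if (clause.$lte !== undefined && !(n <= clause.$lte)) return false;
  if (clause.$gt !== undefined && !(n > clause.$gt)) return false;
  if (clause.$gte !== undefined && !(n >= clause.$gte)) return false;
  if (clause.$in !== undefined && !clause.$in.includes(value)) return false;
  if (clause.$nin !== undefined && clause.$nin.includes(value)) return false;
  if (clause.$between !== undefined) {
    const [lo, hi] = clause.$between;
    if (!(n >= lo && n <= hi)) return false; // both boundaries inclusive
  }
  if (clause.$contains !== undefined) {
    // Assumption: keyword search matches whole words, ignoring case.
    const words = String(value).toLowerCase().split(/\s+/);
    if (!words.includes(clause.$contains.toLowerCase())) return false;
  }
  return true;
}

function matches(metadata: Metadata, filter: Filter): boolean {
  return Object.entries(filter).every(([key, clause]) => {
    if (clause === undefined) return true;
    if (key === "$and") {
      return (clause as Filter[]).every((f) => matches(metadata, f));
    }
    if (key === "$or") {
      return (clause as Filter[]).some((f) => matches(metadata, f));
    }
    return matchesClause(metadata[key], clause as Scalar | OperatorClause);
  });
}

const people: Metadata[] = [
  { name: "Adam Smith", is_active: true, id: 1, height: 10.0 },
  { name: "Bob Johnson", is_active: false, id: 2, height: 5.7 },
  { name: "Jane Doe", is_active: true, id: 3, height: 2.4 },
];

// Returns the ids of all sample records that satisfy the filter.
const select = (filter: Filter): Scalar[] =>
  people.filter((m) => matches(m, filter)).map((m) => m.id);

console.log(select({ name: { $contains: "bob" } })); // [ 2 ]
console.log(select({ name: { $contains: "bo" } })); // []
console.log(
  select({ $and: [{ $or: [{ id: 1 }, { id: 2 }] }, { height: { $gte: 5.0 } }] })
); // [ 1, 2 ]
```

Several operators inside one clause are combined with a logical "and" in this sketch; that is a design choice of the illustration, not a documented guarantee of the vector store.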

import ExampleAdvancedFilter from "@examples/indexes/vector_stores/hana_vector/advancedFiltering.ts";

@@ -82,6 +83,18 @@ import ExampleChain from "@examples/indexes/vector_stores/hana_vector/chains.ts"

<CodeBlock language="typescript">{ExampleChain}</CodeBlock>

## Internal Embedding Functionality

The SAP HANA Cloud Vector Engine can compute embeddings directly in the database using its native `VECTOR_EMBEDDING` function. This removes the need for an external embedding service: text is embedded where it is stored, which avoids a separate call to an embedding provider and keeps the data inside the database.

To enable this functionality, instantiate a `HanaInternalEmbeddings` object with the internal embedding model ID and pass this instance to your `HanaDB` vector store.

For more details on the `VECTOR_EMBEDDING` function, refer to the official [SAP HANA Cloud documentation](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-vector-engine-guide/vector-embedding-function-vector?locale=en-US).

import ExampleInternalEmbeddings from "@examples/indexes/vector_stores/hana_vector/internalEmbeddings.ts";

<CodeBlock language="typescript">{ExampleInternalEmbeddings}</CodeBlock>
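
To make the in-database embedding step more concrete, the sketch below builds the kind of SQL statement a similarity search could issue when the query text is embedded by `VECTOR_EMBEDDING` instead of by an external service. Everything here is an assumption for illustration: `buildInternalEmbeddingSearch` and `quoteIdentifier` are hypothetical helpers, the column names `VEC_TEXT`, `VEC_META`, and `VEC_VECTOR` and the use of `COSINE_SIMILARITY` are assumed, and the argument order `VECTOR_EMBEDDING(text, text_type, model_id)` should be checked against the linked SAP documentation. It is not the statement that `HanaDB` actually generates.

```typescript
// Hypothetical sketch: builds a similarity-search statement in which the
// database embeds the query text itself via VECTOR_EMBEDDING.
// Column names and the similarity function are assumptions, not HanaDB internals.

interface InternalSearchQuery {
  sql: string;
  params: string[];
}

// Quote an SQL identifier and escape embedded double quotes.
function quoteIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function buildInternalEmbeddingSearch(
  tableName: string,
  query: string,
  k: number,
  modelId = "SAP_NEB.20240715" // default model ID used in the example file
): InternalSearchQuery {
  if (!Number.isInteger(k) || k <= 0) {
    throw new Error("k must be a positive integer");
  }
  // The query text and model ID are bound as parameters, never concatenated.
  const sql =
    `SELECT TOP ${k} "VEC_TEXT", "VEC_META", ` +
    `COSINE_SIMILARITY("VEC_VECTOR", VECTOR_EMBEDDING(?, 'QUERY', ?)) AS "SCORE" ` +
    `FROM ${quoteIdentifier(tableName)} ORDER BY "SCORE" DESC`;
  return { sql, params: [query, modelId] };
}

const example = buildInternalEmbeddingSearch("testInternalEmbeddings", "bitcoin", 1);
console.log(example.sql);
console.log(example.params);
```

Binding the text and model ID as parameters keeps user input out of the SQL string; only the validated integer `k` and the quoted table name are interpolated.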

## Related

- Vector store [conceptual guide](/docs/concepts/#vectorstores)
89 changes: 57 additions & 32 deletions examples/src/indexes/vector_stores/hana_vector/advancedFiltering.ts
@@ -29,15 +29,15 @@ await new Promise<void>((resolve, reject) => {
const docs: Document[] = [
{
pageContent: "First",
metadata: { name: "adam", is_active: true, id: 1, height: 10.0 },
metadata: { name: "Adam Smith", is_active: true, id: 1, height: 10.0 },
},
{
pageContent: "Second",
metadata: { name: "bob", is_active: false, id: 2, height: 5.7 },
metadata: { name: "Bob Johnson", is_active: false, id: 2, height: 5.7 },
},
{
pageContent: "Third",
metadata: { name: "jane", is_active: true, id: 3, height: 2.4 },
metadata: { name: "Jane Doe", is_active: true, id: 3, height: 2.4 },
},
];

@@ -75,8 +75,8 @@ printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$ne":1}}
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 }
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 }
{ name: 'Jane Doe', is_active: true, id: 3, height: 2.4 }
*/

// Between range
@@ -86,27 +86,27 @@ printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$between":[1,2]}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 }
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 } */

// In list
advancedFilter = { name: { $in: ["adam", "bob"] } };
advancedFilter = { name: { $in: ["Adam Smith", "Bob Johnson"] } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$in":["adam","bob"]}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */
/* Filter: {"name":{"$in":["Adam Smith","Bob Johnson"]}}
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 }
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 } */

// Not in list
advancedFilter = { name: { $nin: ["adam", "bob"] } };
advancedFilter = { name: { $nin: ["Adam Smith", "Bob Johnson"] } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$nin":["adam","bob"]}}
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */
/* Filter: {"name":{"$nin":["Adam Smith","Bob Johnson"]}}
{ name: 'Jane Doe', is_active: true, id: 3, height: 2.4 } */

// Greater than
advancedFilter = { id: { $gt: 1 } };
@@ -115,8 +115,8 @@ printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$gt":1}}
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 }
{ name: 'Jane Doe', is_active: true, id: 3, height: 2.4 } */

// Greater than or equal to
advancedFilter = { id: { $gte: 1 } };
@@ -125,9 +125,9 @@ printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$gte":1}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 }
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 }
{ name: 'Jane Doe', is_active: true, id: 3, height: 2.4 } */

// Less than
advancedFilter = { id: { $lt: 1 } };
@@ -145,7 +145,7 @@ printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"id":{"$lte":1}}
{ name: 'adam', is_active: true, id: 1, height: 10 } */
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 } */

// Text filtering with $like
advancedFilter = { name: { $like: "a%" } };
@@ -154,26 +154,43 @@ printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$like":"a%"}}
{ name: 'adam', is_active: true, id: 1, height: 10 } */
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 } */

advancedFilter = { name: { $like: "%a%" } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$like":"%a%"}}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 }
{ name: 'Jane Doe', is_active: true, id: 3, height: 2.4 } */

// Text filtering with $contains
advancedFilter = { name: { $contains: "bob" } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$contains":"bob"}}
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 } */

advancedFilter = { name: { $contains: "bo" } };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"name":{"$contains":"bo"}}
<empty result> */

// Combined filtering with $or
advancedFilter = { $or: [{ id: 1 }, { name: "bob" }] };
advancedFilter = { $or: [{ id: 1 }, { name: "Bob Johnson" }] };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$or":[{"id":1},{"name":"bob"}]}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */
/* Filter: {"$or":[{"id":1},{"name":"Bob Johnson"}]}
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 }
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 } */

// Combined filtering with $and
advancedFilter = { $and: [{ id: 1 }, { id: 2 }] };
@@ -184,15 +201,23 @@ printFilterResult(
/* Filter: {"$and":[{"id":1},{"id":2}]}
<empty result> */

advancedFilter = { $and: [{ name: { $contains: "bob" } }, { id: 2 }] };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$and":[{"name":{"$contains":"bob"}},{"id":2}]}
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 } */

advancedFilter = { $or: [{ id: 1 }, { id: 2 }, { id: 3 }] };
console.log(`Filter: ${JSON.stringify(advancedFilter)}`);
printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$or":[{"id":1},{"id":2},{"id":3}]}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 }
{ name: 'jane', is_active: true, id: 3, height: 2.4 } */
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 }
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 }
{ name: 'Jane Doe', is_active: true, id: 3, height: 2.4 } */

// You can also define a nested filter with $and and $or.
advancedFilter = {
@@ -203,8 +228,8 @@ printFilterResult(
await vectorStore.similaritySearch("just testing", 5, advancedFilter)
);
/* Filter: {"$and":[{"$or":[{"id":1},{"id":2}]},{"height":{"$gte":5.0}}]}
{ name: 'adam', is_active: true, id: 1, height: 10 }
{ name: 'bob', is_active: false, id: 2, height: 5.7 } */
{ name: 'Adam Smith', is_active: true, id: 1, height: 10 }
{ name: 'Bob Johnson', is_active: false, id: 2, height: 5.7 } */

// Disconnect from SAP HANA after the operations
client.disconnect();
81 changes: 81 additions & 0 deletions examples/src/indexes/vector_stores/hana_vector/internalEmbeddings.ts
@@ -0,0 +1,81 @@
import { Document } from "@langchain/core/documents";
import hanaClient from "hdb";
import { HanaInternalEmbeddings } from "@langchain/community/embeddings/hana_internal";
import { HanaDB, HanaDBArgs } from "@langchain/community/vectorstores/hanavector";

// Initialize the internal embeddings instance with the internal model ID.
// This instance uses SAP HANA's built-in VECTOR_EMBEDDING function.
const internalEmbeddings = new HanaInternalEmbeddings({
  internalEmbeddingModelId: process.env.HANA_DB_EMBEDDING_MODEL_ID || "SAP_NEB.20240715",
});

// Set up connection parameters from environment variables.
const connectionParams = {
  host: process.env.HANA_HOST,
  port: process.env.HANA_PORT,
  user: process.env.HANA_UID,
  password: process.env.HANA_PWD,
};

// Create a HANA client.
const client = hanaClient.createClient(connectionParams);

// Connect to SAP HANA.
await new Promise<void>((resolve, reject) => {
  client.connect((err: Error) => {
    if (err) {
      reject(err);
    } else {
      console.log("Connected to SAP HANA successfully.");
      resolve();
    }
  });
});

// Define the arguments for the vector store instance.
const args: HanaDBArgs = {
  connection: client,
  tableName: "testInternalEmbeddings",
};

// Create a new HanaDB vector store using the internal embeddings instance.
// This vector store leverages the internal VECTOR_EMBEDDING function of HanaDB.
const vectorStore = new HanaDB(internalEmbeddings, args);
// Initialize the vector store (creates the table and verifies its columns).
await vectorStore.initialize();

// Example documents to index.
const docs: Document[] = [
  new Document({
    pageContent: "Charlie is a data scientist who specializes in AI research.",
    metadata: { name: "Charlie Brown" },
  }),
  new Document({
    pageContent: "David is a teacher with a passion for history and literature.",
    metadata: { name: "David Williams" },
  }),
  new Document({
    pageContent: "Eve is an entrepreneur focusing on blockchain and cryptocurrency.",
    metadata: { name: "Eve Adams" },
  }),
];

// Clean up any existing documents in the table.
await vectorStore.delete({ filter: {} });
// Add the example documents.
await vectorStore.addDocuments(docs);

// Perform a similarity search. In this example, we search for documents related to "bitcoin".
const results = await vectorStore.similaritySearch("bitcoin", 1);
console.log("Similarity search results:", results);
/*
  [
    {
      pageContent: 'Eve is an entrepreneur focusing on blockchain and cryptocurrency.',
      metadata: { name: 'Eve Adams' }
    }
  ]
*/

// Disconnect from SAP HANA after operations.
client.disconnect();
4 changes: 4 additions & 0 deletions libs/langchain-community/.gitignore
@@ -174,6 +174,10 @@ embeddings/gradient_ai.cjs
embeddings/gradient_ai.js
embeddings/gradient_ai.d.ts
embeddings/gradient_ai.d.cts
embeddings/hana_internal.cjs
embeddings/hana_internal.js
embeddings/hana_internal.d.ts
embeddings/hana_internal.d.cts
embeddings/hf.cjs
embeddings/hf.js
embeddings/hf.d.ts
1 change: 1 addition & 0 deletions libs/langchain-community/langchain.config.js
@@ -79,6 +79,7 @@ export const config = {
"embeddings/deepinfra": "embeddings/deepinfra",
"embeddings/fireworks": "embeddings/fireworks",
"embeddings/gradient_ai": "embeddings/gradient_ai",
"embeddings/hana_internal": "embeddings/hana_internal",
"embeddings/hf": "embeddings/hf",
"embeddings/hf_transformers": "embeddings/hf_transformers",
"embeddings/huggingface_transformers":
13 changes: 13 additions & 0 deletions libs/langchain-community/package.json
@@ -1129,6 +1129,15 @@
"import": "./embeddings/gradient_ai.js",
"require": "./embeddings/gradient_ai.cjs"
},
"./embeddings/hana_internal": {
"types": {
"import": "./embeddings/hana_internal.d.ts",
"require": "./embeddings/hana_internal.d.cts",
"default": "./embeddings/hana_internal.d.ts"
},
"import": "./embeddings/hana_internal.js",
"require": "./embeddings/hana_internal.cjs"
},
"./embeddings/hf": {
"types": {
"import": "./embeddings/hf.d.ts",
@@ -3469,6 +3478,10 @@
"embeddings/gradient_ai.js",
"embeddings/gradient_ai.d.ts",
"embeddings/gradient_ai.d.cts",
"embeddings/hana_internal.cjs",
"embeddings/hana_internal.js",
"embeddings/hana_internal.d.ts",
"embeddings/hana_internal.d.cts",
"embeddings/hf.cjs",
"embeddings/hf.js",
"embeddings/hf.d.ts",