
Commit 6a51c19

Doc updates: LanceDB, HuggingFace and Marqo (#60)
docs and a couple small fixes
1 parent 90b60fc commit 6a51c19


5 files changed, +107 -2 lines changed

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+---
+title: 'HuggingFaceEmbed'
+description: 'The HuggingFaceEmbed connector generates embeddings for text data using embedding models hosted on Hugging Face.'
+---
+
+Access the many open-source models hosted on Hugging Face using the `HuggingFaceEmbed` connector.
+
+<Note>Looking for embedding models on Hugging Face? Here is a great place to start: [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)</Note>
+
+## Properties
+
+Required properties:
+- model: Hugging Face model ID or a URL to a deployed Inference Endpoint.
+- token: Hugging Face access token.
+
+<CodeGroup>
+```python Local Development
+from neumai.EmbedConnectors import HuggingFaceEmbed
+
+huggingface_embed = HuggingFaceEmbed(
+    model = "model_id_or_url",
+    token = "huggingface_token"
+)
+
+```
+</CodeGroup>
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+---
+
+title: 'LanceDBSink'
+description: 'LanceDBSink enables seamless integration with the LanceDB vector database, supporting vector storage and similarity search for advanced data retrieval.'
+
+---
+
+The `LanceDBSink` class is a connector for LanceDB, an open-source, serverless vector database built for seamless integration and scale.
+
+## Properties
+
+Required properties:
+
+- uri: URI for the LanceDB database.
+- table_name: Name of the LanceDB table to be used.
+
+Optional properties:
+
+- api_key: If provided, connect to LanceDB Cloud; otherwise, connect to a database on the file system or cloud storage.
+- region: Region to use for LanceDB Cloud.
+- create_index: Boolean that controls whether to create an index for ANN search or use flat search.
+- metric: The distance metric to use (default is 'cosine').
+- num_partitions: The number of partitions of the index.
+- num_sub_vectors: The number of sub-vectors created during Product Quantization (PQ).
+- accelerator: Specifies the accelerator to use for the index creation process (e.g., GPU or MPS).
+
+<Note>Index creation is only needed when working with 100k+ vectors. Below that threshold, set `create_index` to `False`. For more information on index creation and on configuring partitions and sub-vectors, see the [LanceDB documentation](https://lancedb.github.io/lancedb/ann_indexes/#creating-an-ivf_pq-index).</Note>
+
+<CodeGroup>
+```python Local Development
+from neumai.SinkConnectors import LanceDBSink
+
+# Set up the LanceDBSink with the required credentials and index information
+lancedb_sink = LanceDBSink(
+    uri = "lancedb_uri",
+    table_name = "test_table",
+    # If using LanceDB Cloud, add:
+    # api_key = "lancedb_cloud_api_key",
+    # If storing more than 100k vectors, add:
+    # create_index = True,
+    # Ensure that the vector dimension (e.g., 1536 for OpenAI text-embedding-ada-002) is divisible by num_sub_vectors (default 96)
+    # Ensure that num_partitions (default 256) is less than the number of vectors you are adding
+)
+```
+</CodeGroup>
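The index constraints described in the LanceDBSink note and comments can be checked before a sink is configured. The following is a minimal sketch, not part of neumai or LanceDB: the function name `check_index_params`, its return shape, and the exact 100,000 cut-off are assumptions made for illustration, with the defaults (256 partitions, 96 sub-vectors) taken from the comments above.

```python
from typing import Dict, List, Union


def check_index_params(
    num_vectors: int,
    vector_dim: int,
    num_partitions: int = 256,
    num_sub_vectors: int = 96,
    index_threshold: int = 100_000,  # assumed reading of "100k+ vectors"
) -> Dict[str, Union[bool, List[str]]]:
    """Hypothetical pre-flight check for the index settings described above."""
    problems: List[str] = []

    # The vector dimension must split evenly into PQ sub-vectors.
    if vector_dim % num_sub_vectors != 0:
        problems.append(
            f"vector_dim {vector_dim} is not divisible by "
            f"num_sub_vectors {num_sub_vectors}"
        )

    # The number of partitions must stay below the number of vectors.
    if num_partitions >= num_vectors:
        problems.append(
            f"num_partitions {num_partitions} is not less than "
            f"num_vectors {num_vectors}"
        )

    return {
        # Suggest an ANN index only at or above the threshold.
        "create_index": num_vectors >= index_threshold,
        "problems": problems,
    }


if __name__ == "__main__":
    # 1536-dimensional vectors divide evenly into 96 sub-vectors.
    print(check_index_params(num_vectors=200_000, vector_dim=1536))
```

A non-empty `problems` list suggests adjusting `num_sub_vectors` or `num_partitions` before setting `create_index = True`.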
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+
+title: 'MarqoSink'
+description: "MarqoSink enables seamless integration with Marqo's vector database, supporting vector storage and similarity search for advanced data retrieval."
+
+---
+
+The `MarqoSink` class is a connector for Marqo, a vector database provider that enables multimodal vector search.
+
+## Properties
+
+Required properties:
+
+- url: URL for accessing the Marqo service.
+- index_name: Name of the index in Marqo where the data will be stored.
+
+Optional properties:
+
+- api_key: API key for authenticating with the Marqo cloud service (only needed when using Marqo cloud).
+
+<CodeGroup>
+```python Local Development
+from neumai.SinkConnectors import MarqoSink
+
+# Set up the MarqoSink with the required credentials and index information
+marqo_sink = MarqoSink(
+    url = "marqo_url",
+    index_name = "test_index",
+    # If using Marqo cloud, add:
+    # api_key = "marqo_cloud_api_key"
+)
+```
+</CodeGroup>

neumai/neumai/EmbedConnectors/HuggingfaceEmbedder.py renamed to neumai/neumai/EmbedConnectors/HuggingFaceEmbed.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ def embed_name(self) -> str:
 
     @property
     def required_properties(self) -> List[str]:
-        return ["token"]
+        return ["model","token"]
 
     @property
     def optional_properties(self) -> List[str]:
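The hunk above adds `model` to the list returned by `required_properties`. This commit does not show how neumai consumes that list, so the following is only a sketch under the assumption that a connector compares the list against the settings it was given. `SketchEmbed` and `missing_properties` are hypothetical names, not neumai APIs.

```python
from typing import Dict, List


class SketchEmbed:
    """Stand-in class for illustration only; not the neumai connector."""

    def __init__(self, **settings: str) -> None:
        # Keep whatever keyword settings the caller supplied.
        self.settings: Dict[str, str] = settings

    @property
    def required_properties(self) -> List[str]:
        # Mirrors the list returned after this commit.
        return ["model", "token"]

    def missing_properties(self) -> List[str]:
        # A required property counts as missing if it is absent or empty.
        return [
            name
            for name in self.required_properties
            if not self.settings.get(name)
        ]


if __name__ == "__main__":
    # Only the token is supplied, so "model" is reported as missing.
    print(SketchEmbed(token="huggingface_token").missing_properties())
```

Under this assumption, a configuration that supplies only `token` would have passed the old check and would now be flagged for the missing `model`.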

neumai/neumai/EmbedConnectors/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@
 from .AzureOpenAIEmbed import AzureOpenAIEmbed
 from .EmbedConnector import EmbedConnector
 from .OpenAIEmbed import OpenAIEmbed
-from .EmbedConnector import EmbedConnector
+from .EmbedConnector import EmbedConnector
+from .HuggingFaceEmbed import HuggingFaceEmbed
