Add a way to tokenize and segment words outside of the database #159

@zhcn000000

Description

Problem Statement

Can I use a Hugging Face model's tokenizer, or another splitter, to segment text manually (outside or inside the database) and insert the result as the sparse vectors required for BM25 search? This would improve support for CJK and multilingual documents and allow a more modern vocabulary than the built-in dictionaries.

Proposed Solution

When building an index, provide an option to supply an additional, precomputed sparse-vector column: for example, segment the text externally, convert it to a sparse vector, and insert it into a tsvector column, or into a bm25vector column as in the vchord_bm25 extension. This mirrors how dense vectors are generated externally with an embedding model and then inserted into a vector column, without relying on an internal index dictionary.

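A minimal sketch of the external-segmentation step, assuming token ids come from some outside tokenizer (e.g. a Hugging Face one); the ids below are made up for illustration, and the exact sparse-vector column format would depend on the extension:

```python
from collections import Counter

def to_sparse_vector(token_ids):
    """Collapse a token-id sequence into a sparse {token_id: count} mapping,
    i.e. the term-frequency vector a BM25 index would consume."""
    return dict(Counter(token_ids))

# Token ids as an external tokenizer might emit them (illustrative values).
ids = [101, 2023, 2003, 2023, 102]
vec = to_sparse_vector(ids)
print(vec)  # term frequencies keyed by token id
```

The resulting mapping could then be serialized into whatever literal format the sparse-vector column expects and inserted with an ordinary parameterized `INSERT`.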

Alternatives Considered

Using zhparser requires installing an additional extension and creating a text search configuration inside the database, and its dictionary is not modern enough.

Metadata

Labels: enhancement (New feature or request)