Configuration

← Back to README

Environment Variables

Note: All variables below are optional. If not set, the server will use the default values shown.

| Variable | Description | Default Value |
| --- | --- | --- |
| MCP_KEYPHRASES_EMBEDDINGS_MODEL | BERT embeddings model | paraphrase-multilingual-MiniLM-L12-v2 |
| MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL | spaCy tokenizer model | en_core_web_trf |
| MCP_KEYPHRASES_LOG_LEVEL | Log level | INFO |
| MCP_KEYPHRASES_MAX_TEXT_LEN | Maximum length of the input text, in characters | 6000 |
| MCP_KEYPHRASES_MAX_KEYPHRASES_COUNT | Maximum number of keyphrases to extract | elastic |
| PORT | HTTP port for the MCP server | 8000 |
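
The "unset means default" rule from the note above can be checked from a shell before starting the server. The following is only a sketch: it copies four defaults from the table and resolves them with POSIX parameter expansion. The local names (EMBEDDINGS_MODEL, SPACY_MODEL, LOG_LEVEL, MAX_TEXT_LEN) are hypothetical and are not read by the server, and the server's own lookup logic is not shown in this document.

```shell
#!/bin/sh
# Sketch only: resolve each variable to its exported value, or to the
# default copied from the table when it is unset.
# Note: ${VAR:-default} also falls back when VAR is set but empty; whether
# the server treats an empty value the same way is an assumption.
EMBEDDINGS_MODEL="${MCP_KEYPHRASES_EMBEDDINGS_MODEL:-paraphrase-multilingual-MiniLM-L12-v2}"
SPACY_MODEL="${MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL:-en_core_web_trf}"
LOG_LEVEL="${MCP_KEYPHRASES_LOG_LEVEL:-INFO}"
MAX_TEXT_LEN="${MCP_KEYPHRASES_MAX_TEXT_LEN:-6000}"

# Print the values that would be in effect for this shell session.
echo "embeddings model: $EMBEDDINGS_MODEL"
echo "spaCy model:      $SPACY_MODEL"
echo "log level:        $LOG_LEVEL"
echo "max text length:  $MAX_TEXT_LEN"
```

Running it with nothing exported prints the table defaults. Exporting a variable first changes the corresponding line.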

Various pretrained embedding models are available for BERT. The default, "paraphrase-multilingual-MiniLM-L12-v2", is intended for multilingual documents and for documents in languages other than English. For English-only documents, you can specify the "all-MiniLM-L6-v2" model instead.
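
For example, switching to the English model outside Docker might look like the sketch below. It assumes the server reads MCP_KEYPHRASES_EMBEDDINGS_MODEL from its environment at startup, as the note at the top of this section states. The command that starts the server is not shown because it is not covered in this section.

```shell
#!/bin/sh
# Sketch only: select the English embeddings model for this shell session.
# The variable must be exported before the server process is started.
export MCP_KEYPHRASES_EMBEDDINGS_MODEL="all-MiniLM-L6-v2"

# Confirm the value that child processes will inherit.
echo "embeddings model: $MCP_KEYPHRASES_EMBEDDINGS_MODEL"
```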

There are also various pretrained spaCy models.

To build a Docker image with custom models, pass both MCP_KEYPHRASES_EMBEDDINGS_MODEL and MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL as build arguments:

docker build \
    --build-arg MCP_KEYPHRASES_EMBEDDINGS_MODEL="<selected_embeddings_model>" \
    --build-arg MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL="<selected_spacy_model>" \
    -t mcp-keyphrases:latest .

Then pass the same values as environment variables when running the container:

docker run \
    --rm \
    --name keyphrases-mcp-server \
    -i \
    -v <path_to_documents>:/app/documents \
    -p 8000:8000 \
    --gpus all \
    -e MCP_KEYPHRASES_EMBEDDINGS_MODEL="<selected_embeddings_model>" \
    -e MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL="<selected_spacy_model>" \
    mcp-keyphrases:latest