Note: All variables below are optional. If not set, the server will use the default values shown.
| Variable | Description | Default Value |
|---|---|---|
| MCP_KEYPHRASES_EMBEDDINGS_MODEL | BERT embeddings model | paraphrase-multilingual-MiniLM-L12-v2 |
| MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL | spaCy tokenizer model | en_core_web_trf |
| MCP_KEYPHRASES_LOG_LEVEL | Log level | INFO |
| MCP_KEYPHRASES_MAX_TEXT_LEN | Maximum length of the input text, in characters | 6000 |
| MCP_KEYPHRASES_MAX_KEYPHRASES_COUNT | Maximum number of keyphrases to extract | |
| PORT | HTTP port for the MCP server | 8000 |
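As a sketch, the variables above can be overridden in the shell before starting the server; the values below are illustrative examples, not the defaults:

```shell
# Illustrative overrides for the variables in the table above;
# the values here are examples, not the server defaults.
export MCP_KEYPHRASES_EMBEDDINGS_MODEL="all-MiniLM-L6-v2"
export MCP_KEYPHRASES_LOG_LEVEL="DEBUG"
export MCP_KEYPHRASES_MAX_TEXT_LEN="12000"
export PORT="8080"
```

When running in Docker, the same variables are passed into the container with `-e` flags, as in the `docker run` command below.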
There are various pretrained embedding models for BERT. The default, `paraphrase-multilingual-MiniLM-L12-v2`, is suitable for multilingual documents. For English-only documents you can specify the `all-MiniLM-L6-v2` model instead. Likewise, there are various pretrained spaCy models to choose from for tokenization.
If you want to build a Docker image with custom models, pass both the MCP_KEYPHRASES_EMBEDDINGS_MODEL and MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL variables as build arguments:

```shell
docker build \
  --build-arg MCP_KEYPHRASES_EMBEDDINGS_MODEL="<selected_embeddings_model>" \
  --build-arg MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL="<selected_spacy_model>" \
  -t mcp-keyphrases:latest .
```

and as environment variables when running the container:
```shell
docker run \
  --rm \
  --name keyphrases-mcp-server \
  -i \
  -v <path_to_documents>:/app/documents \
  -p 8000:8000 \
  --gpus all \
  -e MCP_KEYPHRASES_EMBEDDINGS_MODEL="<selected_embeddings_model>" \
  -e MCP_KEYPHRASES_SPACY_TOKENIZER_MODEL="<selected_spacy_model>" \
  mcp-keyphrases:latest
```