Skip to content

[EIS] Dense Text Embedding task type integration #129847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Jun 24, 2025

Conversation

timgrein
Copy link
Contributor

@timgrein timgrein commented Jun 23, 2025

This PR adds the text_embedding task type to the elastic inference provider.

Testing flow:

  • Prerequisites:
    • EIS is setup locally to allow multilingual-embed-v1
    • EIS returns a dummy response for dense text embeddings

Verifying, that the default endpoint exists:

curl --location 'http://localhost:9200/_inference?pretty=null' \
--header 'Authorization: {BASIC_AUTH}'

{
    "endpoints": [
        ...
        {
            "inference_id": ".multilingual-embed-v1-elastic",
            "task_type": "text_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "multilingual-embed-v1",
                "rate_limit": {
                    "requests_per_minute": 10000
                }
            }
        },
        ...

Generating embeddings using the default endpoint:

curl --location 'http://localhost:9200/_inference/text_embedding/.multilingual-embed-v1-elastic' \
--header 'Content-Type: application/json' \
--header 'Authorization: {BASIC_AUTH}' \
--data '{
    "input": "A blue sky"
}'

{
    "text_embedding": [
        {
            "embedding": [
                2.1259406,
                1.7073475,
                ...
            ]
        }
    ]
}

timgrein added 4 commits June 23, 2025 13:19
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceGetServicesIT.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/action/ElasticInferenceServiceActionCreator.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/action/ElasticInferenceServiceActionVisitor.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceTests.java
Copy link
Contributor

@jonathan-buttner jonathan-buttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, I left a few suggestions.

public static TextEmbeddingFloatResults fromResponse(Request request, HttpResult response) throws IOException {
var parserConfig = XContentParserConfiguration.EMPTY.withDeprecationHandler(LoggingDeprecationHandler.INSTANCE);

try (XContentParser jsonParser = XContentFactory.xContent(XContentType.JSON).createParser(parserConfig, response.body())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ElasticInferenceServiceRateLimitServiceSettings {

public static final String NAME = "elastic_inference_service_dense_embeddings_service_settings";
static final String DIMENSIONS_SET_BY_USER = "dimensions_set_by_user";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this field is important if the request to EIS will include a field called dimensions or someway to telling EIS to number of dimensions to return in the response. I don't see a field being sent to EIS. Or did I miss it?

The reason we rely on a "set by user" is because it helps determining whether we automatically figured out the number of dimensions or if we took the user's value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not there yet as we're not 100% sure, which model we're going to host, so this includes some guesswork. I thought it would be better to have it or would you suggest to remove it and add it in a patch version, if we really need it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. I think I'd remove it for now because we could get in a weird state where if we release the code as it is right now and the user specifies the dimensions field, we'll set dimensions_set_by_user to true but behind the scenes we're going to auto set it after the fact. So the actual state will not be accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// optional field
if ((usageContext == ElasticInferenceServiceUsageContext.UNSPECIFIED) == false) {
builder.field(USAGE_CONTEXT, usageContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If usageContext is null I believe this if-block will return true. Is that what we want? I think we could also do != right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, adjusted with Do not set usage context, if it's null

* 0.9020516
* ],
* (...)
* ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remembered Tim's thread on this a couple weeks ago, but should we revisit the response format? Looking at OpenAI, Alibaba, and Mixedbread as quick references, it looks like they return a list of objects. I don't have a strong preference, but just wanted to bring this up since we might be differing from others here and wanted to confirm that this is what we want.
Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered in the thread

@timgrein timgrein marked this pull request as ready for review June 24, 2025 07:42
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jun 24, 2025
@timgrein timgrein added >non-issue :ml Machine learning Team:ML Meta label for the ML team v8.19.0 auto-backport Automatically create backport pull requests when merged and removed needs:triage Requires assignment of a team area label labels Jun 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

Copy link
Contributor

@brendan-jugan-elastic brendan-jugan-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One small question

ElasticInferenceService.NAME,
ElasticInferenceService.DENSE_TEXT_EMBEDDINGS_DIMENSIONS,
ElasticInferenceService.defaultDenseTextEmbeddingsSimilarity(),
DenseVectorFieldMapper.ElementType.FLOAT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but can you explain why the MinimalServiceSettings differ from other task types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just about the different purposes models/tasks:

  • Dense vector embeddings can have different element types (typically float, but they can also be quantized to bit vectors or int vectors for example) , therefore we need to specify the ElementType. Some models also allow you to specify a target number of dimensions (f.e. when using Matryoshka embeddings, therefore we need to specify the number of dimensions. Also vector embeddings can be compared using different similarity measures, therefore we need to specify the similarity measure.
  • A reranking model simply returns an ordered list of ranked documents, so it doesn't make sense to specify dimensions, an element type or a similarity measure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Thanks for the background :)

@timgrein timgrein merged commit 3b51dd5 into elastic:main Jun 24, 2025
32 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 129847

timgrein added a commit to timgrein/elasticsearch that referenced this pull request Jun 24, 2025
(cherry picked from commit 3b51dd5)

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/InferenceGetModelsWithElasticInferenceServiceIT.java
#	x-pack/plugin/inference/qa/inference-service-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/MockElasticInferenceServiceAuthorizationServer.java
#	x-pack/plugin/inference/src/internalClusterTest/java/org/elasticsearch/xpack/inference/integration/InferenceRevokeDefaultEndpointsIT.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/action/ElasticInferenceServiceActionVisitor.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/response/ElasticInferenceServiceAuthorizationResponseEntity.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceTests.java
@timgrein
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged backport pending :ml Machine learning >non-issue Team:ML Meta label for the ML team v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants