[EIS] Dense Text Embedding task type integration #129847
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
/* | ||
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
* or more contributor license agreements. Licensed under the Elastic License | ||
* 2.0; you may not use this file except in compliance with the Elastic License | ||
* 2.0. | ||
*/ | ||
|
||
package org.elasticsearch.xpack.inference.external.response.elastic; | ||
|
||
import org.elasticsearch.common.xcontent.XContentParserUtils; | ||
import org.elasticsearch.xcontent.ConstructingObjectParser; | ||
import org.elasticsearch.xcontent.ParseField; | ||
import org.elasticsearch.xcontent.XContentFactory; | ||
import org.elasticsearch.xcontent.XContentParserConfiguration; | ||
import org.elasticsearch.xcontent.XContentType; | ||
import org.elasticsearch.xpack.core.inference.results.TextEmbeddingFloatResults; | ||
import org.elasticsearch.xpack.inference.external.http.HttpResult; | ||
import org.elasticsearch.xpack.inference.external.request.Request; | ||
|
||
import java.io.IOException; | ||
import java.util.List; | ||
|
||
import static org.elasticsearch.xcontent.ConstructingObjectParser.constructorArg; | ||
|
||
public class ElasticInferenceServiceDenseTextEmbeddingsResponseEntity { | ||
|
||
/** | ||
* Parses the Elastic Inference Service Dense Text Embeddings response. | ||
* | ||
* For a request like: | ||
* | ||
* <pre> | ||
* <code> | ||
* { | ||
* "inputs": ["Embed this text", "Embed this text, too"] | ||
* } | ||
* </code> | ||
* </pre> | ||
* | ||
* The response would look like: | ||
* | ||
* <pre> | ||
* <code> | ||
* { | ||
* "data": [ | ||
* [ | ||
* 2.1259406, | ||
* 1.7073475, | ||
* 0.9020516 | ||
* ], | ||
* (...) | ||
* ], | ||
I vaguely remember Tim's thread on this from a couple of weeks ago, but should we revisit the response format? Looking at OpenAI, Alibaba, and Mixedbread as quick references, it looks like they return a list of objects. I don't have a strong preference; I just wanted to bring this up since we might be differing from others here, and to confirm that this is what we want. |
Answered in the thread. |
||
* "meta": { | ||
* "usage": {...} | ||
* } | ||
* } | ||
* </code> | ||
* </pre> | ||
*/ | ||
public static TextEmbeddingFloatResults fromResponse(Request request, HttpResult response) throws IOException { | ||
try (var p = XContentFactory.xContent(XContentType.JSON).createParser(XContentParserConfiguration.EMPTY, response.body())) { | ||
return EmbeddingFloatResult.PARSER.apply(p, null).toTextEmbeddingFloatResults(); | ||
} | ||
} | ||
|
||
public record EmbeddingFloatResult(List<EmbeddingFloatResultEntry> embeddingResults) { | ||
@SuppressWarnings("unchecked") | ||
public static final ConstructingObjectParser<EmbeddingFloatResult, Void> PARSER = new ConstructingObjectParser<>( | ||
EmbeddingFloatResult.class.getSimpleName(), | ||
true, | ||
args -> new EmbeddingFloatResult((List<EmbeddingFloatResultEntry>) args[0]) | ||
); | ||
|
||
static { | ||
// Custom field declaration to handle array of arrays format | ||
PARSER.declareField(constructorArg(), (parser, context) -> { | ||
return XContentParserUtils.parseList(parser, (p, index) -> { | ||
List<Float> embedding = XContentParserUtils.parseList(p, (innerParser, innerIndex) -> innerParser.floatValue()); | ||
return EmbeddingFloatResultEntry.fromFloatArray(embedding); | ||
}); | ||
}, new ParseField("data"), org.elasticsearch.xcontent.ObjectParser.ValueType.OBJECT_ARRAY); | ||
} | ||
|
||
public TextEmbeddingFloatResults toTextEmbeddingFloatResults() { | ||
return new TextEmbeddingFloatResults( | ||
embeddingResults.stream().map(entry -> TextEmbeddingFloatResults.Embedding.of(entry.embedding)).toList() | ||
); | ||
} | ||
} | ||
|
||
/** | ||
* Represents a single embedding entry in the response. | ||
* For the Elastic Inference Service, each entry is just an array of floats (no wrapper object). | ||
* This is a simpler wrapper that just holds the float array. | ||
*/ | ||
public record EmbeddingFloatResultEntry(List<Float> embedding) { | ||
public static EmbeddingFloatResultEntry fromFloatArray(List<Float> floats) { | ||
return new EmbeddingFloatResultEntry(floats); | ||
} | ||
} | ||
|
||
private ElasticInferenceServiceDenseTextEmbeddingsResponseEntity() {} | ||
} |
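The response-format question raised in the review (a bare array of arrays versus a list of objects) can be made concrete with a small sketch. This is plain Java with no Elasticsearch dependencies. `ResponseShapeSketch`, `EmbeddingObject`, and both helper methods are hypothetical names introduced for illustration, and the object shape (an `index` plus an `embedding` list) is an assumption about what a "list of objects" response might carry, not the format of any specific provider.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative only: none of these names exist in the PR or in Elasticsearch.
final class ResponseShapeSketch {

    // Hypothetical "list of objects" entry: each vector is wrapped in an object
    // that also records which input it belongs to.
    record EmbeddingObject(int index, List<Float> embedding) {}

    // Converts the object form into the bare array-of-arrays form used in the
    // Javadoc example above, ordering vectors by their index.
    static List<List<Float>> toNestedLists(List<EmbeddingObject> objects) {
        return objects.stream()
            .sorted(Comparator.comparingInt(EmbeddingObject::index))
            .map(EmbeddingObject::embedding)
            .toList();
    }

    // Unboxes each inner list into a primitive float[] (one row per input).
    static float[][] toFloatArrays(List<List<Float>> data) {
        float[][] result = new float[data.size()][];
        for (int i = 0; i < data.size(); i++) {
            List<Float> row = data.get(i);
            result[i] = new float[row.size()];
            for (int j = 0; j < row.size(); j++) {
                result[i][j] = row.get(j);
            }
        }
        return result;
    }
}
```

With the array-of-arrays form, the position of each vector in `data` is the only link back to its input, so the order of the response has to match the order of the request; the object form can carry that link explicitly.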
Not a blocker, but can you explain why the `MinimalServiceSettings` differ from other task types?
I think it's just about the different purposes of the models/tasks: `ElementType`. Some models also allow you to specify a target number of dimensions (e.g. when using Matryoshka embeddings), so we need to specify the number of dimensions. Also, vector embeddings can be compared using different similarity measures, so we need to specify the similarity measure.
Makes sense! Thanks for the background :)
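To illustrate why dimensions and similarity are part of the settings for dense embeddings, here is a minimal sketch in plain Java. `EmbeddingSettingsSketch` and its methods are hypothetical and not part of this PR. It assumes the common Matryoshka-style usage of keeping a prefix of the vector and rescaling it to unit length, and it compares two similarity measures (dot product and cosine).

```java
import java.util.Arrays;

// Illustrative only: these helpers are assumptions, not Elasticsearch APIs.
final class EmbeddingSettingsSketch {

    // Keeps the first targetDims components and rescales them to unit length.
    static float[] truncateAndNormalize(float[] embedding, int targetDims) {
        if (targetDims <= 0 || targetDims > embedding.length) {
            throw new IllegalArgumentException("targetDims must be in [1, " + embedding.length + "]");
        }
        float[] out = Arrays.copyOf(embedding, targetDims);
        double norm = Math.sqrt(dot(out, out));
        if (norm == 0.0) {
            return out; // an all-zero vector cannot be normalized
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) (out[i] / norm);
        }
        return out;
    }

    // Dot product: sensitive to vector magnitude.
    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of dimensions");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product of the vectors after scaling to unit length.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }
}
```

For the vectors `{2, 0}` and `{5, 0}`, the dot product is 10 while the cosine similarity is 1, so the two measures can rank the same pair of vectors differently. Both also require the vectors to have the same number of dimensions, which is why the dimension count and the similarity measure need to be known alongside the embeddings.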