Handle legacy mappings with placeholder fields #85059

Merged: 32 commits, Apr 26, 2022

@ywelsch ywelsch commented Mar 17, 2022

As part of #81210 we would like to add support for handling legacy (Elasticsearch 5 and 6) mappings in newer Elasticsearch versions. The idea is to import old mappings "as-is" into Elasticsearch 8, and adapt the mapper parsers so that they can handle those old mappings. Only a select subset of the legacy mapping will actually be parsed. Fields that are neither known to newer ES versions nor supported for search will be mapped as "placeholder fields", i.e., they are still represented as fields in the system so that they can give proper error messages when queried by a user.

Fields that are supported:

  • field data types that support doc values only fields
    • normalizers on keyword fields and date formats on date fields are only supported insofar as they behave similarly across versions. In case they do not, these fields are now updateable on legacy indices so that they can be "fixed" by the user.
  • object fields
  • nested fields in limited form (not supporting nested queries)
    • add tests / checks in follow-up PR
  • multi fields
  • field aliases
  • metadata fields
  • runtime fields (auto-import to be added for future versions)
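
To make the placeholder idea concrete, here is a minimal, self-contained sketch. It does not use the real Elasticsearch classes: `PlaceholderSketch`, `LegacyField`, `SupportedField`, `PlaceholderField`, and the contents of `SUPPORTED_TYPES` are all hypothetical names and values chosen for illustration. It only shows the general shape: unsupported types stay in the mapping but fail with a clear message when queried.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Elasticsearch implementation.
final class PlaceholderSketch {

    // Assumed set of legacy field types that remain searchable (illustrative only).
    static final Set<String> SUPPORTED_TYPES = Set.of("keyword", "date", "long");

    // A field as seen by the query layer.
    interface LegacyField {
        String termQuery(String value);
    }

    // Field whose type is supported: queries work.
    static final class SupportedField implements LegacyField {
        private final String name;

        SupportedField(String name) {
            this.name = name;
        }

        @Override
        public String termQuery(String value) {
            return name + ":" + value;
        }
    }

    // Field kept only so the mapping is preserved and queries fail clearly.
    static final class PlaceholderField implements LegacyField {
        private final String name;
        private final String type;

        PlaceholderField(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String termQuery(String value) {
            throw new IllegalArgumentException(
                "can't run query on field [" + name + "] of legacy type [" + type + "]"
            );
        }
    }

    // Turn each (field name -> type) entry of a legacy mapping into a field object.
    static Map<String, LegacyField> importMapping(Map<String, String> legacyMapping) {
        Map<String, LegacyField> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : legacyMapping.entrySet()) {
            String name = entry.getKey();
            String type = entry.getValue();
            LegacyField field = SUPPORTED_TYPES.contains(type)
                ? new SupportedField(name)
                : new PlaceholderField(name, type);
            fields.put(name, field);
        }
        return fields;
    }
}
```

In this sketch every field of the old mapping survives the import, so nothing is silently dropped; only the query behaviour differs between supported and placeholder fields.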

5.x indices with mappings that have multiple mapping types are collapsed together on a best-effort basis before they are imported.
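
The description does not spell out how multiple mapping types are collapsed, so the following is only a rough sketch under a stated assumption: field definitions from all types are merged into one map, and on a name clash the first definition seen is kept. `TypeCollapseSketch` and `collapse` are hypothetical names, not part of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: collapse the per-type field maps of a 5.x index into one.
final class TypeCollapseSketch {

    // typeToFields: mapping type name -> (field name -> field definition).
    // Assumption: on a name clash, the first definition seen is kept.
    static Map<String, Object> collapse(Map<String, Map<String, Object>> typeToFields) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> fields : typeToFields.values()) {
            for (Map.Entry<String, Object> field : fields.entrySet()) {
                merged.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return merged;
    }
}
```

A real best-effort merge would likely need to reconcile conflicting definitions rather than simply keep the first one; the sketch only shows the overall flattening step.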

Relates #81210

@ywelsch ywelsch added >non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types labels Mar 28, 2022
@ywelsch ywelsch marked this pull request as ready for review March 28, 2022 08:20
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 28, 2022
@elasticmachine

Pinging @elastic/es-search (Team:Search)

@romseygeek romseygeek left a comment

I'm happy with the Mapper changes now, I think. Can we have some tests checking serialization of placeholder mappers with unknown fields?

@@ -227,7 +227,8 @@ public String toString() {

public static final TypeParser PARSER = new TypeParser(
    (n, c) -> new Builder(n, c.indexVersionCreated()),
-   notInMultiFields(CONTENT_TYPE)
+   notInMultiFields(CONTENT_TYPE),
+   Version.CURRENT.minimumCompatibilityVersion()

Contributor

Given that there are a few of these multi-parameter calls, maybe we should add a super constructor that takes the Builder and verifier functions and passes Version.CURRENT.minimumCompatibilityVersion() as a default value?

Contributor Author

ok, fixed in 07234b1
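
The suggestion above amounts to constructor delegation with a default version. A minimal sketch of that pattern follows; `TypeParserSketch` is a hypothetical stand-in, versions are reduced to plain major-version integers, the constant `CURRENT_MIN_COMPATIBLE_VERSION` has an assumed value, and the `supportsVersion` comparison is an assumption about the intended semantics rather than a copy of the real code.

```java
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical sketch; builder, context and version types are simplified stand-ins.
final class TypeParserSketch {

    // Assumed default for illustration; the real code asks Version.CURRENT for it.
    static final int CURRENT_MIN_COMPATIBLE_VERSION = 7;

    final BiFunction<String, String, String> builderFunction;
    final BiConsumer<String, String> contextValidator;
    final int minimumCompatibilityVersion;

    // Convenience constructor: mappers that are not supported on legacy
    // indices do not have to pass a version at all.
    TypeParserSketch(BiFunction<String, String, String> builderFunction, BiConsumer<String, String> contextValidator) {
        this(builderFunction, contextValidator, CURRENT_MIN_COMPATIBLE_VERSION);
    }

    // Full constructor: used by mappers that opt in to older index versions.
    TypeParserSketch(
        BiFunction<String, String, String> builderFunction,
        BiConsumer<String, String> contextValidator,
        int minimumCompatibilityVersion
    ) {
        this.builderFunction = builderFunction;
        this.contextValidator = contextValidator;
        this.minimumCompatibilityVersion = minimumCompatibilityVersion;
    }

    // Assumed semantics: a parser supports indices created on or after its minimum version.
    boolean supportsVersion(int indexCreatedMajorVersion) {
        return indexCreatedMajorVersion >= minimumCompatibilityVersion;
    }
}
```

With this shape, only the mappers that should work on legacy indices need to mention a version, and all other call sites stay unchanged.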

mappingsBuilder.startObject("date").field("type", "date").field("format", "yyyy/MM/dd").endObject();
mappingsBuilder.endObject().endObject();
putMappingsRequest.setJsonEntity(Strings.toString(mappingsBuilder));
assertOK(client().performRequest(putMappingsRequest));

Contributor

Can you clarify why this needs to be removed?

Contributor Author

This is now automated by the mapping conversion. Prior to this PR, you had to manually define the mapping after importing a legacy index in order to access the fields in that index (the original mapping would only be imported to _meta/legacy-mappings section).

ywelsch commented Apr 25, 2022

> I'm happy with the Mapper changes now, I think. Can we have some tests checking serialization of placeholder mappers with unknown fields?

I've added such a test in ad99173 in addition to the ones in OldMappingsIT

@ywelsch ywelsch requested a review from romseygeek April 25, 2022 09:20
@romseygeek romseygeek left a comment

I left a couple of nits, but LGTM otherwise. Thanks for your patience on this @ywelsch!

"include_in_all",
"[include_in_all] is deprecated, the _all field have been removed in this version"
);
if (parserContext.indexVersionCreated().isLegacyIndexVersion() == false) {

Contributor

Do we need to guard this with a version check? I think it will still be true even for older mappings, in that we won't generate an _all field type that you can search against even if the underlying lucene field exists in the index?

Contributor Author

Good spot. In an earlier iteration, I was considering adding _all support for legacy indices, hence the version check. I've removed it now.

ywelsch commented Apr 25, 2022

@elasticmachine run elasticsearch-ci/part-1 (unrelated failure)

@javanna javanna left a comment

I left some nits and some questions, no blockers on my end. LGTM, thanks for all the iterations.

*/
- private void checkMappingsCompatibility(IndexMetadata indexMetadata) {
+ @Nullable
+ public Mapping checkMappingsCompatibility(IndexMetadata indexMetadata) {

Member

nit: I find it slightly confusing that this check method is now returning the mappings. I can see how it is convenient to return the mappings obtained from the merge operation. Would it make sense to rename the method to reflect the updated behaviour?

Contributor Author

changed in 07a2602

Map<String, MetadataFieldMapper.TypeParser> metadata5x = new LinkedHashMap<>(metadata7x);
metadata5x.put(LegacyTypeFieldMapper.NAME, LegacyTypeFieldMapper.PARSER);
this.metadataMapperParsers5x = metadata5x;
this.fieldFilter = fieldFilter;
}

/**
- * Return a map of the mappers that have been registered. The
+ * Return a map of the non-legacy mappers that have been registered. The
* returned map uses the type of the field as a key.
*/
public Map<String, Mapper.TypeParser> getMapperParsers() {

Member

I think that this method is only used in tests now: is that ok or do we need to adapt tests as a follow-up?

Contributor Author

I've adapted the tests and removed it in 4c86d3c

* so that the original mapping can be preserved and proper exception messages can
* be provided when accessing these fields.
*/
public class PlaceHolderFieldMapper extends FieldMapper {

Member

I tend to get confused with naming here, and maybe it is just me: would a different name like UnknownLegacyFieldMapper better represent what this mapper is used for? Shall we make it final also?

Contributor Author

I prefer the PlaceHolder terminology, as we might know the field type (e.g. it's a completion field, or a search_as_you_type field, which the current version/implementation knows about), but we just chose not to support it for legacy indices. I'm open to changing it if we find a better name, but also happy with the current one.
Regarding the "final" modifier, it's more of a personal preference, but I like to avoid adding too many restrictive modifiers in our codebase, as we're not developing a library (unlike Lucene), and it feels unnecessary to do so (putting extra focus on things that should not matter).

this(builderFunction, contextValidator, Version.CURRENT.minimumIndexCompatibilityVersion());
}

public TypeParser(

Member

I believe this one could be made private

Contributor Author

Changed in f163412

this(builderFunction, (n, c) -> {}, Version.CURRENT.minimumIndexCompatibilityVersion());
}

public TypeParser(BiFunction<String, MappingParserContext, Builder> builderFunction, Version minimumCompatibilityVersion) {

Member

would it help to add javadocs here to clarify when to use which constructor? as far as I understand, the constructor that takes the version is the one called by all the mappers that are supported on legacy indices, all the rest remains the same?

Contributor Author

addressed in f163412

@@ -1352,6 +1358,10 @@ public final void parse(String name, MappingParserContext parserContext, Map<Str
validate();
}

protected void handleUnknownParam(String propName, Object propNode) {

Member

should we rename it to something that removes any doubt that this is only called for legacy indices? handleLegacyindexUnknownParam ?

Contributor Author

I've renamed the method to handleUnknownParamOnLegacyIndex in f163412
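
As a rough illustration of what such a hook can look like, the sketch below routes unknown mapping parameters to `handleUnknownParamOnLegacyIndex` only when the index is a legacy one, and rejects them otherwise. Apart from that method name, everything here (`UnknownParamSketch`, the `knownParams` set, the `ignored` list, the error text) is hypothetical and simplified; it is not the actual `FieldMapper` parsing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a parse loop with a legacy-only hook for unknown parameters.
class UnknownParamSketch {

    private final Set<String> knownParams;
    private final boolean legacyIndex;

    // Names of the unknown parameters that were tolerated (for illustration).
    final List<String> ignored = new ArrayList<>();

    UnknownParamSketch(Set<String> knownParams, boolean legacyIndex) {
        this.knownParams = knownParams;
        this.legacyIndex = legacyIndex;
    }

    void parse(String fieldName, Map<String, Object> fieldNode) {
        for (Map.Entry<String, Object> entry : fieldNode.entrySet()) {
            String propName = entry.getKey();
            if (knownParams.contains(propName)) {
                continue; // a real parser would apply the parameter here
            }
            if (legacyIndex) {
                // Legacy mappings may carry parameters that no longer exist.
                handleUnknownParamOnLegacyIndex(propName, entry.getValue());
            } else {
                throw new IllegalArgumentException(
                    "unknown parameter [" + propName + "] on mapper [" + fieldName + "]"
                );
            }
        }
    }

    // Hook that subclasses could override, for example to keep the original value.
    protected void handleUnknownParamOnLegacyIndex(String propName, Object propNode) {
        ignored.add(propName);
    }
}
```

Putting "OnLegacyIndex" in the name makes it visible at the call site and in overrides that the lenient path is never taken for current indices.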


@Override
public boolean supportsVersion(Version indexCreatedVersion) {
return true;

Member

I find this a bit surprising, especially given that field aliases were added with 6.4. I wonder if this is never called, in which case maybe we should throw an exception instead, or if it could rely on the default impl instead.

Contributor Author

Even though field aliases were only added in ES 6.4, they could be used with older indices as well (i.e. you had a 5.x index for example that you kept using in 6.4 and now added the field alias to the mapping of that 5.x index). As field aliases are more of a runtime property (e.g. just like runtime fields, which you could also use on indices created before the release of runtime fields) and have no connection to the actual data, I see no reason to limit their use.


@Override
public boolean supportsVersion(Version indexCreatedVersion) {
return true;

Member

I wonder if overriding these two supportsVersion methods is needed. Maybe it is because metadata fields are always supported and they don't take a version in like FieldMapper does?

Contributor Author

This was never called, it mainly resulted from the fact that MetadataFieldMapper.TypeParser was extending Mapper.TypeParser (with no good reason). I separated the two in b25369e, which means we no longer have to have this method here.


@Override
public boolean supportsVersion(Version indexCreatedVersion) {
return true;

Member

here too, I wonder what this means. Does this get called, or does this simply mean "objects are always supported regardless of the version"?

Contributor Author

The latter (objects are always supported regardless of the version). This gets called indeed, e.g. when you have an object below an object.

@ywelsch ywelsch merged commit 4e41c5f into elastic:master Apr 26, 2022
ywelsch commented Apr 26, 2022

Thank you for the reviews @romseygeek and @javanna!

Labels
>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.3.0
6 participants