@Dustinturner44

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

DaveCTurner and others added 30 commits July 28, 2022 01:05
…c#88867)

This fixes the version substitution in a couple of response examples in
the put DFA docs.
…astic#88877)

The value is `_now` and there was a previous metadata
value `_timestamp` (see test removal in elastic#88733) so the
name is confusing.

Also renames the method `getTimestamp()` to `getNow()`
to reflect the change.
Adds some docs giving more detailed background about what data
corruption really means and some suggestions about how to narrow down
the root cause.

Co-authored-by: Henning Andersen <[email protected]>
This PR documents the impact of domain splitting on API keys. API key
ownership is determined via username and user realm information,
including the user's security domain. API key ownership is shared
across users with the same username that are part of the same security
domain. A user loses ownership over an API key if their realm is
removed from the security domain that previously enabled ownership
through cross-realm resource sharing.
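
The ownership rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the Elasticsearch implementation: the `Identity` type, the `owns_api_key` function, and the representation of a security domain as a set of realm names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Identity:
    """Hypothetical stand-in for a user's username and realm."""
    username: str
    realm: str


def owns_api_key(user: Identity, creator: Identity,
                 domains: Iterable[FrozenSet[str]]) -> bool:
    """Return True if `user` shares ownership of a key created by `creator`.

    Assumption: a security domain is modelled as a frozenset of realm names.
    """
    if user.username != creator.username:
        return False
    if user.realm == creator.realm:
        return True
    # Different realms: ownership is shared only if one domain contains both.
    return any(user.realm in d and creator.realm in d for d in domains)
```

With a domain containing `realm1` and `realm2`, `alice` in `realm2` owns a key created by `alice` in `realm1`. Once `realm2` is removed from the domain, the same check returns `False`.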
# Conflicts:
#	build-tools-internal/version.properties
elastic#88907) (elastic#88939)

When handling Unicode accents, the BERT tokenizer could remove the wrong characters. This produced very strange results and possibly an error.

closes elastic#88900
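
For context, accent stripping in BERT-style tokenizers is commonly done by decomposing the text and dropping combining marks. The sketch below shows that general technique in Python. It is not the Elasticsearch tokenizer code, and the function name `strip_accents` is chosen here for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove accents by dropping combining marks after NFD decomposition."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Only the combining marks are removed, so the base characters and their order are preserved.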
… after being allocated to node (elastic#88945) (elastic#88992)

When a model is starting, it has occasionally been observed to lock up while trying to restore the model objects to the native process.

This would manifest as a trained model being stuck in "starting" while also being assigned to a node. So, there is a native process started and task available on the assigned nodes, but the model state never gets out of "starting".
…lastic#88995)

This commit fixes the situation where a user wants to use CCR to replicate indices that are part of
a data stream while renaming the data stream. For example, assume a user has an auto-follow request
that looks like this:

```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster" : "other-cluster",
  "leader_index_patterns" : ["logs-*"],
  "follow_index_pattern" : "{{leader_index}}_copy"
}
```

And then the data stream `logs-mysql-error` was created, creating the backing index
`.ds-logs-mysql-error-2022-07-29-000001`.

Prior to this commit, replicating this data stream meant that the backing index would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` and the data stream would *not* be renamed. This
caused a check to trip in `TransportPutLifecycleAction` asserting that a backing index was not
renamed for a data stream during following.

After this commit, there are a couple of changes:

First, the data stream will also be renamed. This means that `logs-mysql-error` becomes
`logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works,
this means we need to support renaming a data stream for a regular "create follower" request, so a
new parameter has been added: `data_stream_name`. It works like this:

```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```

Second, the backing index for a data stream must be renamed in a way that does not break the parsing
of a data stream backing pattern. Previously, the index
`.ds-logs-mysql-error-2022-07-29-000001` would be renamed to
`.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name since it doesn't end with the
rollover digits). After this commit, it will be renamed to
`.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream. This means that for
the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:

| Leader Cluster | Follower Cluster |
|----------------|------------------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001` | `.ds-logs-mysql-error_copy-2022-07-29-000001` |

Internally, this means the auto-follow request is turned into the following create follower request:

```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```
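
The renaming rule can be sketched in Python as follows. This is a minimal illustration, not the CCR implementation: the function names are hypothetical, and the regular expression for the backing index layout is an assumption inferred from the example names in this description.

```python
import re

# Assumed layout, inferred from the examples above:
# .ds-<data stream name>-<yyyy-MM-dd>-<six-digit generation>
BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+)-(?P<date>\d{4}-\d{2}-\d{2})-(?P<gen>\d{6})$"
)


def apply_follow_pattern(pattern: str, leader_name: str) -> str:
    """Substitute the leader name into a follow_index_pattern."""
    return pattern.replace("{{leader_index}}", leader_name)


def follower_index_name(pattern: str, leader_index: str) -> str:
    """Compute the follower name, keeping the date and generation suffix last."""
    match = BACKING_INDEX.match(leader_index)
    if match is None:
        # Regular index: apply the pattern to the whole name.
        return apply_follow_pattern(pattern, leader_index)
    # Backing index: rename only the embedded data stream name.
    stream = apply_follow_pattern(pattern, match["stream"])
    return f".ds-{stream}-{match['date']}-{match['gen']}"
```

Applying the pattern only to the embedded data stream name keeps the date and generation at the end of the follower index name, which is what lets it still parse as a backing index.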

Relates to elastic#84940 (cherry-picked the commit for a test)
Relates to elastic#61993 (where data stream support was first introduced for CCR)
Resolves elastic#81751
Clean up network setting docs

- Add types for all params
- Remove mention of JDKs before 11
- Clarify some wording

Co-authored-by: Stef Nestor <[email protected]>
* backporting elastic#88874

* Eliminating initial delay of CoordinationDiagnosticsService#beginPollingClusterFormationInfo for integration tests (elastic#89001)
…89011) (elastic#89042)

* [ML] fix NLP inference_config bwc serialization tests (elastic#89011)

The tests were failing because of span not being nulled out for question_answering and text_similarity tasks.

This change also attempts to make the tests more future-proof: if the NLP task or tokenization configurations change, the tests should fail sooner and force the BWC handling to be updated.

closes: elastic#89008
(cherry picked from commit 480479d)

* fixing backport
…lastic#89030) (elastic#89060)

* [DOCS] Added note about using _size in Kibana. Closes elastic#88322

* Use correct attributes
…on (elastic#88855) (elastic#89068)

When for some reason ML nodes are replaced (cluster resize, upgrade, etc.),
it is possible that some models cannot be allocated at all. Then, while
the cluster is temporarily undersized, all cores are given for allocations
of the models that have survived. If those ML nodes return later, there may
be model deployments that were previously allocated that now do not get any
allocations. The reason is that our planner will try to preserve all current
allocations.

Operationally, this is not what serves our users best. Instead, as we are
already in a cluster that does not have enough resources to fully allocate
all model deployments, we should try to give at least one allocation to each
model that has previously been allocated.

In order to know a model has previously been allocated, this commit adds a field
to `TrainedModelAssignment` called `max_assigned_allocations` which records the
max number of allocations a deployment has received in its life. We can then use
this to establish whether a deployment has ever been allocated.

Finally, we modify the `AssignmentPlanner` so that after computing a plan we
check whether the plan gives at least one allocation to all previously allocated models.
If not, we then compute a plan that tries to give at least one allocation to each
previously allocated model. We can solve this using bin-packing alone. With that
plan in hand, we can invoke the planner one more time to optimize the rest of the allocations
whilst preserving the single allocations for previously allocated models.
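
The fallback step can be illustrated with a small first-fit-decreasing bin-packing sketch in Python. This is not the `AssignmentPlanner` code: the function name, the per-node free core counts, and the per-allocation core requirements are hypothetical inputs chosen for illustration. Only `max_assigned_allocations` comes from this commit.

```python
from typing import Dict, Optional


def one_allocation_each(
    node_free_cores: Dict[str, int],
    cores_per_allocation: Dict[str, int],
    max_assigned_allocations: Dict[str, int],
) -> Dict[str, Optional[str]]:
    """Try to place one allocation of each previously allocated model.

    Returns a mapping from model to the chosen node, or None if nothing fits.
    """
    free = dict(node_free_cores)
    # A model counts as previously allocated if it ever had an allocation.
    candidates = [m for m, n in max_assigned_allocations.items() if n > 0]
    # First-fit decreasing: place the most demanding models first.
    candidates.sort(key=lambda m: cores_per_allocation[m], reverse=True)
    placement: Dict[str, Optional[str]] = {}
    for model in candidates:
        need = cores_per_allocation[model]
        placement[model] = None
        for node in sorted(free):
            if free[node] >= need:
                free[node] -= need
                placement[model] = node
                break
    return placement
```

In this sketch, models that were never allocated are left out of the result, so a later optimization pass would be free to place them with whatever capacity remains.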

Backport of elastic#88855
…9055 (elastic#89077)

This makes sure that the test cluster is stable in CoordinationDiagnosticsServiceIT::testBlockClusterStateProcessingOnOneNode before proceeding with the rest of the test.
DaveCTurner and others added 22 commits September 13, 2022 00:24
…89978)

In elastic#62275 we refactored this code a bit and inadvertently reversed the
sense of this conditional when running in debug mode. This commit fixes
the mistake.

Co-authored-by: Elastic Machine <[email protected]>
elastic#89056) (elastic#90018)

This PR adds diagnosis logic to the shards availability health indicator that detects when a shard allocation 
is delayed. This usually happens when a node that a shard is allocated to disappears. It is often better to 
delay the recovery of a shard in case the node that hosts it comes back. Shards that are delayed in this 
manner have special flags set on their unassigned info that denote a delayed allocation.

This change adds a diagnosis to the indicator that identifies these delayed shards and provides guidance 
stating that they will eventually allocate on their own once the delay elapses, but if allocation is required 
immediately, an index setting can be updated to perform the allocation.

This PR also includes some light integration testing to ensure that more unassigned cases are covered by 
the indicator.
…) (elastic#90040)

Fixing the conditions that the health API uses to determine when to check with a master node for its view
of master history if the master appears to have gone null repeatedly.
…tic#90044)

* [DOCS] Update FIPS verbiage for the bundled JVM

* Fix links (this isn't Markdown)

Co-authored-by: Elastic Machine <[email protected]>
…stic#90079)

The file name should be role_mapping.yml instead of role_mappings.yml,
i.e. NOT plural.
…st (elastic#90075) (elastic#90106)

Make sure we don't accidentally create searchable snapshots while cleaning them.
Update docs for v8.4.2 release
This PR expands the approximate kNN docs to clarify the filter is applied during
the kNN search, not after. It explains the downsides of postfiltering.
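
The difference between filtering during the search and postfiltering can be shown with a brute-force sketch in Python. This only illustrates the concept. It is not how Elasticsearch performs approximate kNN, and all names in it are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Doc = Tuple[str, Sequence[float]]  # (document id, vector)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(docs: List[Doc], query: Sequence[float], k: int) -> List[Doc]:
    """Exact nearest neighbours by brute force."""
    return sorted(docs, key=lambda d: squared_distance(d[1], query))[:k]


def knn_filter_during_search(docs: List[Doc], query: Sequence[float], k: int,
                             keep: Callable[[Doc], bool]) -> List[Doc]:
    # Only matching documents compete, so k results come back when k matches exist.
    return knn([d for d in docs if keep(d)], query, k)


def knn_post_filter(docs: List[Doc], query: Sequence[float], k: int,
                    keep: Callable[[Doc], bool]) -> List[Doc]:
    # The top k are chosen first and filtered afterwards, so fewer than k may remain.
    return [d for d in knn(docs, query, k) if keep(d)]
```

With four documents at distances 0, 1, 2 and 3 from the query, `k = 2`, and a filter matching only the second and fourth, filtering during the search returns both matches while postfiltering returns only the second.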
Add dynamic changes for debug logging
@Dustinturner44 Dustinturner44 requested review from a team as code owners November 9, 2025 23:06
@cla-checker-service

❌ Author of the following commits did not sign a Contributor Agreement:
f866662, 606dcbb, 947aaa9, 3922ff2, , 2bd229c, , c770958, 0b969c6, dc8abe1, , 76b93b0, 00d79a5

Please, read and sign the above mentioned agreement if you want to contribute to this project

@elasticsearchmachine elasticsearchmachine added v9.3.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 9, 2025
@breskeby
Contributor

@Dustinturner44 can you elaborate what this PR is about? Please provide a meaningful description and rebase against latest main branch.
