Document a dynamically-scalable HA deployment #150

cmgrote · 2021-05-21T10:30:38Z

New connector OMAG pods that can come online (or be dropped) at any time
Some quorum mechanism across the pods, so that one of the pods can be elected to periodically create an index checkpoint and store it in some out-of-cluster location (e.g. S3)
Initial index store of each new OMAG pod taken from the latest such external checkpoint (see: https://opencrux.com/reference/21.04-1.16.0/checkpointing.html)
A readiness probe that would ideally succeed only once the pod's local index is up-to-date (not sure this is feasible: what would indicate that the index is up-to-date, given that there is always some activity happening via other pods?)
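
To make the election point above a little more concrete, here is a minimal sketch of one possible rule: a pod is elected only while a strict majority of the expected pods is visible, and the lowest pod name wins. The function name `elect_checkpointer`, the `expected_pods` parameter and the `omag-N` pod names are all hypothetical, and the sketch assumes each pod can somehow list the currently live pods. In a real cluster something like a Kubernetes Lease would probably play this role instead.

```python
from typing import Iterable, Optional


def elect_checkpointer(live_pods: Iterable[str], expected_pods: int) -> Optional[str]:
    """Return the pod that should write the next checkpoint, or None without quorum.

    live_pods     -- names of the pods this pod can currently see (hypothetical input)
    expected_pods -- number of pods the deployment is currently scaled to (assumption)
    """
    live = sorted(set(live_pods))
    quorum = expected_pods // 2 + 1  # strict majority of the expected pod count
    if len(live) < quorum:
        # Too few pods visible: do not elect anyone, to avoid two partitions
        # both writing checkpoints.
        return None
    # Deterministic choice: lowest name by plain string ordering.
    return live[0]
```

Each pod would evaluate this against its own view and only create the checkpoint when the result equals its own name. The string ordering is arbitrary; it only matters that every pod with the same view picks the same winner.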

This relies on having a configuration mechanism for the OMAG platform itself that does not require configuration and/or startup via REST. Otherwise the readiness probe would have to succeed just so that the platform could be configured and started, in which case the pod would already be receiving other traffic via a load-balancing service, all of which would fail until the connector is configured and started (this takes at least 20-30 seconds for an empty system, and could take several minutes or longer if the pod is also bootstrapping its index). Having several minutes of "random" failures for requests that the load-balancer just happens to send to the bootstrapping pod would be unacceptable. Hence the dependency on a non-REST mechanism to start the pods, so that the readiness probe can indicate that the pod is truly ready to start receiving and (correctly) responding to requests.
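
As a rough illustration of what "truly ready" could mean for the probe, the sketch below gates readiness on three things: the platform is configured, the connector is started, and the local index is within a tolerated lag of the latest known transaction. Since other pods keep writing, it treats "up-to-date" as "close enough" rather than "exactly caught up". The class `ReadinessGate`, its fields and the `max_lag` threshold are all hypothetical; how a pod would actually learn `indexed_tx` and `latest_tx` is not settled here.

```python
from dataclasses import dataclass


@dataclass
class ReadinessGate:
    """Tracks startup progress and decides what the readiness endpoint returns."""

    max_lag: int = 0          # tolerated number of transactions behind (assumption)
    configured: bool = False  # platform configuration loaded (without REST)
    started: bool = False     # connector started
    indexed_tx: int = -1      # last transaction applied to the local index
    latest_tx: int = 0        # last transaction known to have been submitted

    def lag(self) -> int:
        """Number of transactions the local index is behind, never negative."""
        return max(self.latest_tx - self.indexed_tx, 0)

    def is_ready(self) -> bool:
        return self.configured and self.started and self.lag() <= self.max_lag

    def status_code(self) -> int:
        # Kubernetes HTTP probes treat 200-399 as success, so 503 keeps the
        # pod out of the load-balanced service until it is ready.
        return 200 if self.is_ready() else 503
```

The readiness endpoint would simply return `status_code()`. Until the pod has been configured and started through the non-REST mechanism, and its index has caught up, the load-balancer would not route any requests to it.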

cmgrote added the enhancement (New feature or request) label May 21, 2021

cmgrote self-assigned this May 21, 2021

This was referenced May 21, 2021

Create sample HA deployment #127

Closed

File-based OMAG Server (Platform) configuration odpi/egeria#5219

Closed
