We need to revisit the liveness/readiness probe logic of our pods #2992

t83714 · 2020-10-02T04:29:24Z

We need to revisit the liveness/readiness probe logic of our pods.

We did some work on liveness/readiness probes for zero-downtime deployment in #1471.

We need to revisit it to make sure that, during a k8s rolling update (particularly for the registry API), the liveness/readiness probes correctly report a failure if the DB is not accessible.

t83714 · 2020-10-02T04:47:02Z

The registry does not seem to check the database at all; it simply replies OK (for both liveness and readiness):

magda/magda-registry-api/src/main/scala/au/csiro/data61/magda/registry/Api.scala

Line 139 in f52fcc4

path("ready") { complete(ReadyStatus(true)) }
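
For comparison, here is a minimal sketch of a readiness check that fails when its database dependency is unreachable. It is written in TypeScript for brevity (the registry itself is Scala), and every name in it (`DbPing`, `ProbeResult`, `withTimeout`, `checkReadiness`) is hypothetical rather than taken from the Magda codebase. It assumes the caller supplies a `pingDb` function that runs a cheap query such as `SELECT 1`. It relies on the fact that Kubernetes HTTP probes treat any status from 200 to 399 as success, so returning 503 marks the pod as not ready.

```typescript
// Hypothetical names throughout; this is a sketch, not Magda's implementation.
type DbPing = () => Promise<unknown>;

interface ProbeResult {
    ready: boolean;
    httpStatus: number; // Kubernetes HTTP probes treat 200-399 as success
    error?: string;
}

// Rejects if `task` does not settle within `timeoutMs`, so a hung DB
// connection cannot make the probe handler hang as well.
function withTimeout<T>(task: Promise<T>, timeoutMs: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const timer = setTimeout(
            () => reject(new Error(`timed out after ${timeoutMs}ms`)),
            timeoutMs
        );
        task.then(
            (value) => {
                clearTimeout(timer);
                resolve(value);
            },
            (err) => {
                clearTimeout(timer);
                reject(err);
            }
        );
    });
}

// Runs the supplied DB ping and maps the outcome to a probe response.
async function checkReadiness(
    pingDb: DbPing,
    timeoutMs = 1000
): Promise<ProbeResult> {
    try {
        await withTimeout(pingDb(), timeoutMs);
        return { ready: true, httpStatus: 200 };
    } catch (e) {
        const message = e instanceof Error ? e.message : String(e);
        return { ready: false, httpStatus: 503, error: message };
    }
}
```

A route handler for the readiness path would call `checkReadiness` and send back `httpStatus`. The liveness path can reasonably stay dependency-free, so that a DB outage takes pods out of rotation without restarting them.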

soyarsauce · 2020-10-02T04:52:05Z

:feature:

soyarsauce · 2020-10-02T04:53:56Z

At a minimum, we should resolve this for the services that use the DB, namely authorization-api, content-api and registry-api.

soyarsauce · 2020-10-02T04:56:21Z

correspondence-api handles this well for its SMTP dependency (rather than a DB dependency, as in the services above):

https://github.com/magda-io/magda/blob/master/magda-correspondence-api/src/createApiRouter.ts#L56-L67
https://github.com/magda-io/magda/blob/master/magda-correspondence-api/src/test/createApiRouter.spec.ts#L113-L141

soyarsauce · 2020-10-07T00:40:24Z

turns out authorization & content are OK
registry PR at #2997

soyarsauce · 2020-10-27T07:24:18Z

Same problem in storage-api

https://github.com/magda-io/magda/blob/v0.0.58-rc.3/magda-storage-api/src/createApiRouter.ts#L31-L39

t83714 · 2020-11-03T00:17:42Z

Adding #3024 as a blocker: currently there is a performance bottleneck, because only the registry-full pod serves the /api/v0/registry endpoint and we can't scale it up.

Our UI always sends read requests to /api/v0/registry-read-only, but we can't guarantee that third-party software will do the same, especially metadata crawlers.

Querying the DB in the readiness probe may add extra load when registry-full is already running at full capacity.
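
One possible way to limit that extra load, sketched here purely as an assumption and not as anything already in Magda, is to cache the result of the DB check for a short time so that frequent probe requests share a single query. The helper name `cacheReadiness`, the TTL value and the injectable `now` clock are all hypothetical.

```typescript
// Hypothetical helper: wraps an expensive readiness check so that it runs
// at most once per `ttlMs`, no matter how often the probe endpoint is hit.
function cacheReadiness(
    check: () => Promise<boolean>,
    ttlMs: number,
    now: () => number = Date.now // injectable clock, mainly for testing
): () => Promise<boolean> {
    let cached: Promise<boolean> | undefined;
    let expiresAt = 0;

    return () => {
        const t = now();
        if (cached === undefined || t >= expiresAt) {
            // Cache the promise itself so that concurrent probe requests
            // share one in-flight query; a rejected check counts as not ready.
            cached = check().catch(() => false);
            expiresAt = t + ttlMs;
        }
        return cached;
    };
}
```

The trade-off is staleness: with a TTL of, say, 5 seconds, a pod may keep reporting ready for up to 5 seconds after the DB becomes unreachable, so the TTL should stay well below the probe's failure window (`periodSeconds` times `failureThreshold`).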

t83714 added the bug label Oct 2, 2020

soyarsauce self-assigned this Oct 6, 2020

soyarsauce linked a pull request Oct 6, 2020 that will close this issue

Added basic readiness check for registry API #2997

Draft

3 tasks
