
Change directory structure to a directory per container manager #1

Closed
wants to merge 1 commit

Conversation

iemejia (Member) commented Jun 4, 2017

I hadn't seen that you had created a new repo for the examples. Nice.
Here is my first contribution; feel free to accept it or not.

iemejia (Member, Author) commented Jun 4, 2017

R: @patricklucas take a look.

iemejia changed the title from "Change directory strategy for each given container manager" to "Change directory structure to a directory per container manager" on Jun 4, 2017
iemejia (Member, Author) commented Jun 4, 2017

Patrick, if you agree, we can now work on making Flink an official Kubernetes chart as well :)
https://github.com/kubernetes/charts
You have done almost everything, so the credit is entirely yours, but don't hesitate to tell me if I can help with something.

patricklucas (Member) left a comment


Sorry for the long delay here; I forgot to follow up on this after my vacation.

@@ -8,7 +8,7 @@ Examples for how to use the Flink Docker images in a variety of ways.
 Docker Compose
 --------------
 
-Use the [Docker Compose config](docker-compose.yml) in this repo to create a local Flink cluster.
+Use the [Docker Compose config](docker/docker-compose.yml) in this repo to create a local Flink cluster.
patricklucas (Member) commented:

Unfortunately there is a reference to this file in Flink's docs, so we need to update that and probably maintain a symlink here until the next release.

Vince-Cercury commented:
Any update on "making Flink also an official chart of Kubernetes"?

Is there someone maintaining this? I've been using Flink with our own Kubernetes resource files, based on the documentation, but I would prefer to use a Helm chart. I'm trying this chart to set it up in HA mode, with no luck yet.

iemejia (Member, Author) commented Apr 16, 2018

Excellent question @Vincemd. I would really love to work on this now. @patricklucas any chance we can do this with your code?

Vince-Cercury commented Apr 17, 2018

@iemejia I now understand the HA setup better. I could not make it work with this Helm chart (see #4), but I think I got it working with my own manifest files and Docker images.

Also consider this document, https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077, which outlines future changes to Flink to let it leverage the power of containers by provisioning job managers and task managers only when needed.

With current versions, there are two main ways to set up HA with Kubernetes and ZooKeeper:

  1. Two or more active job managers and a ZooKeeper cluster.
    This replicates what you would do on more traditional infrastructure, as shown in the diagram at https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/jobmanager_high_availability.html.
    If the first job manager is not responsive, a new leader is elected. The Flink client and task managers talk to ZooKeeper to find out the details of the new leader (they don't use the job manager's RPC address directly; they first ask ZooKeeper, which tells them the RPC address of the current leader). You need two or more Kubernetes services, one for each job manager. We don't want a single service with load balancing; that won't work with Flink.

  2. A single job manager and a ZooKeeper cluster.
    The job manager is stateless, so if Kubernetes detects quickly that it is not healthy (the liveness probe is critical here), it will kill it and start a new one. The new one connects to ZooKeeper and complains for a while that there is no leader, and that even though the job manager has the same hostname, the leader ID stored in ZooKeeper does not match its newly generated ID. Eventually it becomes the new leader and tries to recover the jobs.

Benefits of option 1: you don't need to wait for a container to be restarted; it should quickly pivot to the other job manager.
Cons of option 1: two job managers consume more resources, and one of them is doing nothing. You have to set up two Kubernetes services, one for each job manager, and you need to inject the full hostname of the job managers into the conf/masters file in each container so that the UI can redirect you to the URL of the leading job manager. So we can't use a single service with two pods behind it; it has to be two services and two pods (see the sketch below). ZooKeeper sends you to one or the other; there is no actual load balancer. The clients as well as the task managers need to talk to the right leader via the right exposed service.
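To make the "two services, two pods" point concrete, here is a minimal sketch of one of the per-jobmanager Services. The names, labels, and label keys are illustrative assumptions, not taken from this repo; a second Service (flink-jobmanager-2) would mirror it with instance: jobmanager-2.

```yaml
# Hypothetical option-1 layout: one Kubernetes Service per jobmanager, no load balancing.
# Names and labels are assumptions for illustration only.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-1
spec:
  selector:
    app: flink
    component: jobmanager
    instance: jobmanager-1   # selects exactly one jobmanager pod
  ports:
    - name: rpc
      port: 6123             # Flink's default jobmanager RPC port
    - name: ui
      port: 8081             # web UI, used to reach the leader's dashboard
```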

Benefit of option 2: cheaper to run and simpler to configure (only one Kubernetes service).
Cons of option 2: without a well-defined liveness probe for the job manager (I don't know yet whether there is a health-check endpoint available for it), the pod may remain in an unhealthy state and never be restarted by Kubernetes. Another issue: even if the job manager is restarted, it may take time to pull the container image, or something can go wrong during the boot process. That could further delay job recovery, or recovery might never happen if the pod fails to boot for any reason. Having active/standby with leader election, as in option 1, somewhat mitigates that risk. A possible probe is sketched below.
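A hedged sketch of what such a liveness probe could look like on the single jobmanager container, assuming the jobmanager's web/REST port (8081 by default) is reachable and that an HTTP check against it is a good enough health signal. The image tag, path, and timings are assumptions, not the probe from #8.

```yaml
# Hypothetical jobmanager container fragment for option 2; the livenessProbe is the point here.
containers:
  - name: jobmanager
    image: flink:1.4          # assumed image tag
    args: ["jobmanager"]
    ports:
      - containerPort: 6123   # RPC
      - containerPort: 8081   # web/REST UI
    livenessProbe:
      httpGet:
        path: /overview       # REST endpoint used as a cheap health check (an assumption; see #8)
        port: 8081
      initialDelaySeconds: 30 # allow time to start and register with ZooKeeper
      periodSeconds: 10
      failureThreshold: 3     # restart the pod after ~30s of consecutive failures
```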

I think, after writing this, I'm inclined to use option 2 for production, although we could have a Helm chart that offers both options (a values sketch follows below).
We can set up option 2 by creating the first job manager deployment and service (jobmanager-deployment-1 and jobmanager-svc-1). If option 1 is desired, we just need to also apply jobmanager-deployment-2 and jobmanager-svc-2, which will automatically register itself with the ZooKeeper cluster and become the active backup.
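If a chart were to offer both options, a single values flag could switch between them. The following values.yaml fragment is purely hypothetical and not part of @patricklucas's implementation; the key names and the ZooKeeper service name are assumptions.

```yaml
# Hypothetical Helm values toggling between the two layouts discussed above.
highAvailability:
  enabled: true
  zookeeperQuorum: "zookeeper:2181"   # assumed ZooKeeper service name
jobmanager:
  standby:
    enabled: false   # false = option 2 (single jobmanager + liveness probe)
                     # true  = option 1 (also renders jobmanager-deployment-2 / jobmanager-svc-2)
```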

iemejia (Member, Author) commented Apr 19, 2018

@Vincemd would you be up for contributing actively to a Helm chart? We can then ask the community and try to make it an official Helm chart, as we did with the Docker image. We can set up a repo for this here and start working on it based on @patricklucas's implementation.

Vince-Cercury commented:
Yes, though I would need some help with #4.
I'm planning on using a Helm chart for Flink and ZooKeeper soon.
I'm no expert in Flink or Helm, but I would like to see a community-supported project.

We need to be mindful of FLIP-6 so we don't do too much work for nothing if the design changes considerably.

Vince-Cercury commented:
After further testing, I believe a single job manager is sufficient in HA mode. I feel more confident that it works with the liveness probe (#8).

patricklucas (Member) commented:
Hi folks, thanks for the continued discussion about this. The last few months have been pretty crazy and I let this work fall by the wayside.

Regarding HA-mode: the general consensus has been that the best approach to Flink HA-mode when using a resource manager is to use a single jobmanager. Granted, as you've noticed, this leaves a few things unanswered, such as the best way to do a liveness probe, but I think what you have in #8 is a decent start.

As far as this PR goes, @iemejia, if you fix up the conflicts and address the one comment I made (just replace the file you moved with a link to the new location), I think we can merge this and move on.

Vince-Cercury commented:
Agreed, @patricklucas. After implementing the liveness probe, I also decided to use a single jobmanager.

iemejia (Member, Author) commented Apr 26, 2018

Sure, I am pretty busy at the moment, but I will try to work on this next week and ping you guys.

iemejia closed this Aug 10, 2022