
Set up the clusters

Introduction

This is a walkthrough of setting up everything you need to run Manticore in AWS. Many of these principles will be the same if you want to work with Manticore on your personal computer, or without HAProxy. We will be running one Consul server, one Nomad server, a Consul client, a Nomad client, a scheduled Manticore web application, HAProxy, and consul-template. See the previous section for a reference image.

Configuring the EC2 instance

In case you are unsure of a starting point, here is what I've used for development:

  • I recommend using a t3.medium instance type for both machines. Also make sure they have IAM roles with the permissions needed to use the relevant parts of the AWS API, such as EC2
  • Both machines will have a custom security group with the following inbound rules:
    • SSH on port 22
    • Custom TCP on ports 8300-8302, 8500, 8600, 4646-4648, and 20000-32000 (Nomad allocates containers on these ports). Only traffic inside the network should be able to communicate over these ports

For the machines with Manticore and HAProxy running, there are additional rules needed:

  • Custom TCP on port 80, and traffic from anywhere can access this port. This is how users access Manticore's web page.
  • Custom TCP on a port range that the machine does not otherwise use, and traffic from anywhere can access these ports. It's the developer's responsibility to ensure that opening these ports is safe. This range will be used to open up ports for TCP connections from the SDL app to core. 10000-19999 is an example range. A sketch of creating these rules with the AWS CLI follows this list.
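
If you manage security groups from the command line, the inbound rules above can also be created with the AWS CLI. This is only a sketch: the security group ID and CIDR block below are placeholders, and it assumes the AWS CLI is installed and configured.

# Placeholders: use your own security group ID and your VPC's CIDR block
SG_ID=sg-0123456789abcdef0
VPC_CIDR=172.31.0.0/16

# Consul and Nomad ports, reachable only from inside the network
for PORTS in 8300-8302 8500 8600 4646-4648 20000-32000; do
  aws ec2 authorize-security-group-ingress --group-id $SG_ID \
    --protocol tcp --port $PORTS --cidr $VPC_CIDR
done

# Manticore/HAProxy machines only: the web page and the SDL TCP range, open to anywhere
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 10000-19999 --cidr 0.0.0.0/0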

Most of these ports are for Consul and Nomad to communicate with each other across machines in the same network. Additionally, the Manticore API server needs enough resources to manage many pairs of cores and HMIs, so it is highly recommended to run Manticore on at least a t3.medium machine. Manticore may refuse to start if the host machine does not have sufficient memory, CPU, or network bandwidth. See here for what resources Manticore needs to run.

Installation

We will now set up Consul and Nomad on one of the machines. This will be the server agent machine. Both tools are developed by HashiCorp, so they have similar architectures.
Each is a single binary, and you decide which "mode" to start it in. You can find basic information about starting them up here and here, but ideally we want 3 or 5 of these servers running and communicating with each other so that we don't lose availability if one dies.

We are going to start up only one Nomad and one Consul server. Don't do this in production.

Download the binaries here and here. Or, use a script:

CONSUL_VERSION=1.5.2
NOMAD_VERSION=0.9.3
# Download Consul and Nomad and install them (requires wget and unzip)
wget https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_linux_amd64.zip
unzip consul_${CONSUL_VERSION}_linux_amd64.zip
rm consul_${CONSUL_VERSION}_linux_amd64.zip
sudo mv consul /bin/consul

wget https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip
unzip nomad_${NOMAD_VERSION}_linux_amd64.zip
rm nomad_${NOMAD_VERSION}_linux_amd64.zip
sudo mv nomad /bin/nomad

It also helps to make sure the binaries are on your PATH (/bin typically is). Use at minimum the versions that have been tested to work with Manticore: Consul v1.5.2 and Nomad v0.9.3.
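
You can confirm that both binaries are installed and on your PATH by printing their versions:

consul version
nomad version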

Start the Consul server agent

Now it's time to start the Consul server. Both the Consul server and the Nomad server will be running on the same machine in this walkthrough. Here is the magic command:
consul agent -server -data-dir="/tmp/consul" -node=agent-one -bind=<ip of machine> -bootstrap-expect 1 &

  • consul agent starts Consul. -server dictates that the Consul agent starts in server mode
  • -data-dir is the directory where Consul stores its state. You need this, so put it wherever you think is appropriate
  • -node=<name> gives the server node a name. This can help if you have many servers and need to find a specific one later
  • -bind=<ip> is the address that Consul attaches itself to. If another Consul agent wants to communicate with this server, this is the address it needs to use. It's important to make it the address of your host machine, and make sure the machine has an IP address that you can reach on your local network.
  • -bootstrap-expect <number> is how many server agents Consul waits for before bootstrapping the cluster. If you are following this demo, use 1. In production the number should be 3 or 5. For more info go here
  • & runs the process in the background. You can additionally redirect the logs that get streamed back to a file, like so: consul agent -server -data-dir="/tmp/consul" -node=agent-one -bind=<ip of machine> -bootstrap-expect 1 >> /var/log/consul/output.log &

To check if this command worked, type consul members to see all the servers that are detected from that machine.
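
You can also sanity-check the agent through its HTTP API. By default the HTTP interface listens only on localhost (port 8500), so from the same machine you can ask for the current cluster leader:

curl http://127.0.0.1:8500/v1/status/leader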

Start the Nomad server agent

Next is starting up the Nomad agent in server mode. We want the additional benefit of service discovery, which means Nomad should know where the Consul server is. In this scenario Consul and Nomad will be running on the same machine, so we don't need to configure Nomad to point to the Consul server. Unlike the previous command, we will set up the agent's configuration through an HCL file, which is like HashiCorp's own version of JSON. You can have comments, so that's nice.

# Increase log verbosity
log_level = "DEBUG"

# Setup data directory
data_dir = "/tmp/nomad"

# To talk to this Nomad agent, use this IP. Put the IP address of your machine here
bind_addr = "<ip of machine>"

# Server configuration
server {
    enabled = true

    # Should be 3 or 5 for production. We only expect one server to bootstrap
    bootstrap_expect = 1
}

To start the Nomad agent with this file (I'll assume the file is called server.hcl), run the following: nomad agent -config server.hcl &
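
As with the Consul agent, you can redirect the streamed logs to a file (assuming the directory exists):

nomad agent -config server.hcl >> /var/log/nomad/output.log &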

Nomad and Consul are made to communicate by default

As long as you run both the Consul and Nomad agent on the same machine then you should get scheduling and service discovery working together for free!
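
By default, Nomad looks for a local Consul agent at 127.0.0.1:8500, which is why no extra configuration is needed here. If your Consul agent ever runs elsewhere, you can point Nomad at it explicitly with a consul stanza in server.hcl; the sketch below just spells out the default:

consul {
    # Address of the Consul agent's HTTP API. 127.0.0.1:8500 is the default.
    address = "127.0.0.1:8500"
}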

Now if you run nomad server members you should get this:
Error querying servers: Get http://127.0.0.1:4646/v1/agent/members: dial tcp 127.0.0.1:4646: getsockopt: connection refused

We changed the bind address of this Nomad server, but the nomad command assumes it is communicating with Nomad on localhost. All this means is we need to tell the command where the server is:
nomad server members -address=http://<bind address of Nomad server>:4646
Now we will see the list of Nomad servers that are connected. We should have just one.
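
If you don't want to pass -address every time, the nomad CLI also reads the address from the NOMAD_ADDR environment variable:

export NOMAD_ADDR=http://<bind address of Nomad server>:4646
nomad server members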

This machine now has the servers running. Next, it's time to start another machine that will run the client agents that communicate with the servers. I will explain why it's important to have agents in the next section.