We often get asked how to deploy the Unimus Core in a high availability scenario. While Unimus can natively handle multiple Cores attempting to connect and become the active poller for a single Zone by dropping an incoming Core connection if another Core is already active, this is not an ideal solution in large-scale deploys. In this article we will explore setting up a clustered Unimus Core deploy using Corosync and Pacemaker.
Here is a high-level diagram of what our example setup looks like:
Using clustering, only one of the Cores will ever be active - this is an active / passive HA scenario. If the active Core fails for any reason, Pacemaker will fail over the service to another available cluster member.
For the sake of simplicity we will be deploying a 2-node cluster in this example.
Components of the cluster
Components of our clustering solution:
- Linux - the base operating system our cluster nodes run.
- Corosync - Provides cluster node membership and status information. Notifies of nodes joining / leaving the cluster and provides quorum.
- Pacemaker - Cluster resource manager (CRM). Uses the information from Corosync to manage cluster resources and their availability.
- pcs - A helper utility that interfaces with Corosync (corosync.conf) and Pacemaker (cib.xml) to manage a cluster.
- Unimus Core - The service we want to make highly available.
We will use pcs to manage the cluster. pcs is a cluster manager helper that we can use as a single frontend for the setup and management of our cluster. Without pcs, you would need to set up Corosync manually through the corosync.conf config file, and manage the Pacemaker configuration through its cib.xml file.

Deploying Corosync / Pacemaker without pcs is absolutely possible, but for the sake of simplicity we will rely on pcs to set up Corosync and Pacemaker for us.
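For reference, the configuration that pcs generates behind the scenes looks roughly like this. This is a hedged sketch of a minimal corosync.conf for our 2-node example, not the exact file pcs will write - field names and defaults vary between Corosync versions:

```
totem {
    version: 2
    cluster_name: unimus_core_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: test-core1.net.internal
        nodeid: 1
    }
    node {
        ring0_addr: test-core2.net.internal
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

You should never need to hand-edit this file when using pcs, but knowing where it lives helps when troubleshooting.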
The example commands below were tested on Ubuntu 18, but the setup should be very similar on any other Linux distro. We assume you are starting with a clean Linux system. As such, we need to prepare both our cluster members by running these commands:
# run everything as root
sudo su

# update
apt-get update && apt-get upgrade -y

# install dependencies
apt-get install -y \
wget \
curl \
corosync \
pacemaker \
pcs

# install Unimus Core in unattended mode
wget https://unimus.net/install-unimus-core.sh && \
chmod +x install-unimus-core.sh && \
./install-unimus-core.sh -u

# setup Unimus Core config file
cat <<- "EOF" > /etc/unimus-core/unimus-core.properties
unimus.address = your_server_address_here
unimus.port = 5509
unimus.access.key = your_access_key
logging.file.count = 9
logging.file.size = 50
EOF
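A typo in the Core config file is an easy mistake to make at this point. The snippet below is our own illustrative helper (not part of Unimus) that greps the properties file for the required keys; it writes a sample file to a temp path so you can try it anywhere - on a real node, point CONF at /etc/unimus-core/unimus-core.properties instead:

```shell
# illustrative sanity check: verify the required keys are present
# in a Unimus Core properties file (our own helper, not part of Unimus)
CONF=$(mktemp)   # on a real node: CONF=/etc/unimus-core/unimus-core.properties
cat > "$CONF" <<'EOF'
unimus.address = your_server_address_here
unimus.port = 5509
unimus.access.key = your_access_key
EOF

missing=0
for key in unimus.address unimus.port unimus.access.key; do
    grep -q "^${key}[[:space:]]*=" "$CONF" || { echo "missing: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "config OK"
rm -f "$CONF"
```

Run it on both nodes before continuing - a Core with a bad config will start, but will never connect to the server.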
Next up, we need to set up a single user that will be the same across all cluster nodes. This user will be used by pcs to kickstart our cluster setup. pcs already creates a hacluster user during its installation, so we will just change that user's credentials:
CLUSTER_PWD="please_insert_strong_password_here"
echo "hacluster:$CLUSTER_PWD" | chpasswd
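If you don't want to invent a strong password by hand, you can generate one. This is just one possible approach - it uses only coreutils, so it works on any distro:

```shell
# generate a random 24-character alphanumeric password for the hacluster user
CLUSTER_PWD=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 24)
echo "generated a password of length ${#CLUSTER_PWD}"

# then apply it (run this on every node, with the same password):
# echo "hacluster:$CLUSTER_PWD" | chpasswd
```

Remember the password has to be identical on all cluster nodes, so generate it once and reuse it.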
After we have a common user across our cluster nodes, pick one node from which we will control the cluster. We can run these commands to set up the cluster:
CLUSTER_PWD="please_insert_strong_password_here"

# setup cluster
pcs cluster auth test-core1.net.internal test-core2.net.internal -u hacluster -p "$CLUSTER_PWD" --force
pcs cluster setup --name unimus_core_cluster test-core1.net.internal test-core2.net.internal --force

# start cluster
pcs cluster enable --all
pcs cluster start --all
Since we are using a 2-node cluster in this example, we need to set a few other specific properties. First, we disable quorum enforcement, as a 2-node cluster cannot maintain quorum after losing a node. We also disable fencing.
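To see why: quorum requires a strict majority of node votes, which works out to floor(n/2) + 1. A quick illustration of that arithmetic for a few cluster sizes:

```shell
# quorum = strict majority of node votes: floor(n/2) + 1
# with 2 nodes that is 2 votes, so losing either node loses quorum -
# hence we tell Pacemaker to ignore quorum loss in a 2-node cluster
for n in 2 3 5; do
    echo "nodes=$n -> votes needed for quorum: $(( n / 2 + 1 ))"
done
```

This is also why production clusters are usually built with an odd number of nodes - a 3-node cluster keeps quorum (2 of 3 votes) even after losing a member.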
pcs property set no-quorum-policy=ignore
pcs property set stonith-enabled=false
Our cluster setup should now be done, so let's check our cluster status:
pcs property list
pcs status
You should see both your cluster nodes online, like this:
root@test-core1:~# pcs status
Cluster name: unimus_core_cluster
Stack: corosync
Current DC: test-core1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Tue Mar 4 01:08:51 2022
Last change: Tue Mar 4 01:04:49 2022 by hacluster via crmd on test-core1

2 nodes configured
0 resources configured

Online: [ test-core1 test-core2 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
root@test-core1:~#
If you don't see your cluster members online, or pcs status complains about some issues, here are a few common pitfalls:
- Your cluster nodes should NOT be behind NAT (this is possible, but requires more config not covered in this guide).
- You must use hostnames / FQDNs for cluster nodes. Using IPs is a no-go. If needed, create hostnames for your cluster nodes in /etc/hosts.
- The hostname / FQDN you used resolves to 127.0.0.1, or a different loopback address. This is also a no-go, as Corosync / Pacemaker require that the hostnames / FQDNs used for clustering resolve to the actual cluster member IPs.

In general, most of these issues can be resolved by a proper DNS setup, or by creating proper records in /etc/hosts.
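For example, entries like these in /etc/hosts on both nodes would satisfy the resolution requirements. The IP addresses below are made-up placeholders - use the real LAN IPs of your cluster members:

```
# /etc/hosts on both cluster nodes (example addresses only)
10.0.10.11  test-core1.net.internal  test-core1
10.0.10.12  test-core2.net.internal  test-core2
```

Make sure neither hostname also appears on a 127.0.x.x line, as some distros add the local hostname to the loopback entry by default.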
Creating a cluster resource
Now that our cluster is up, we can tell Pacemaker to start managing the Unimus Core service as a clustered service.
First, however, we need to stop Unimus Core and disable it from starting automatically at system startup on each node:
# disable Core autostart, Pacemaker will control this
systemctl stop unimus-core
systemctl disable unimus-core
Then we can create our cluster resource through pcs on one of our cluster nodes:
# we might want to set a node as ineligible to run the service if it fails to start
pcs resource defaults migration-threshold=1

# setup our cluster resource
pcs resource create unimus_core systemd:unimus-core op start timeout="30s" op monitor interval="10s"
You will notice we used systemctl, and also declared the cluster resource using the systemd resource agent. We do this because Ubuntu 18 (which we are showcasing this setup on) uses systemd. If you are running a distro which doesn't use systemd as its init system, you will need to do things differently.
We recommend checking out Pacemaker documentation on available resource agents and how to use them.
Monitoring cluster resources
Now that our cluster resource is created, let's check if it works:
pcs status resources
You should see that the Core is running on one of the nodes. Here is how our output looks:
root@test-core1:~# pcs status resources
 unimus_core    (systemd:unimus-core):  Started test-core1
root@test-core1:~#
You can also check the status of the unimus-core service on both of your cluster nodes:
# on core1
root@test-core1:~# systemctl status unimus-core
● unimus-core.service - Cluster Controlled unimus-core
   Loaded: loaded (/etc/systemd/system/unimus-core.service; disabled; vendor preset: enabled)
  Drop-In: /run/systemd/system/unimus-core.service.d
           └─50-pacemaker.conf
   Active: active (running)
   ...
root@test-core1:~#

# on core2
root@test-core2:~# systemctl status unimus-core
● unimus-core.service - Unimus Remote Core
   Loaded: loaded (/etc/systemd/system/unimus-core.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@test-core2:~#
You should also see the Core connect to your Unimus server, and the Zone should be ONLINE.
Live monitoring of cluster status
To monitor the cluster status live, you can run crm_mon in live / interactive mode (just run the crm_mon command) and watch the Core service fail over to the 2nd node on failure.
Simulating a failure
You can easily simulate a failure in many ways. You can reboot one of your cluster members, and you should see the failover occur. You should also see the Zone briefly go OFFLINE in Unimus and then back ONLINE. You can also simulate a failure on one of the cluster nodes by running:
crm_resource --resource unimus_core --force-stop
You should see that the Core started on the other node:
root@test-core1:~# pcs status resources
 unimus_core    (systemd:unimus-core):  Started test-core2
root@test-core1:~#
For the original node (test-core1 in our case) to be considered a viable node to run our resource again, we need to run:
pcs resource cleanup unimus_core
If you want to migrate the service back to the first node, you can run:
# force a move to another cluster member
crm_resource --resource unimus_core --move

# clear any resource constraints we created
crm_resource --resource unimus_core --clear
A move may create a constraint to not place the service on the previous node in the future. This is why we clear all constraints after a move. A useful command to check existing constraints on our cluster resource is:
crm_resource --resource unimus_core --constraints
Hopefully this article can guide you in creating a HA setup for your Unimus Cores. If you have any questions, or you run into any issues, please feel free to post in the Support section of our forums, or contact us through our usual support channels.