LFCS – High Availability (HA) Cluster

Jarret B

Within any business, there may be a need to have a resource available nearly 100% of the time. A good example is a web service. If one server should fail, then another server will resume the resource and keep it online.

Hopefully, you can see the benefit of using a cluster. Of course, the more servers, or nodes, in a cluster the better the availability.

Firewall

Since the nodes in the cluster will need to check on each other, the firewall needs ports opened to allow for the information to pass between them.

In these examples, we are using the systems we previously set up, if needed, on VirtualBox. You will use Server1 and Server2 for these examples.

On both servers, you need to open a terminal and get root privileges:

Code:
sudo su

You then need to issue the following commands:

Code:
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

Perform the ‘firewall-cmd’ commands on both servers so traffic can pass between them.

The ports that are opened on CentOS are:
  • TCP 2224
  • TCP 3121
  • TCP 5403
  • TCP 9929
  • TCP 21064
  • UDP 5404
  • UDP 9929
For Ubuntu, the ports opened are:
  • TCP 2224
  • TCP 3121
  • TCP 5403
  • TCP 21064
  • UDP 5404
  • UDP 5405
These are the ports listed in the ‘high-availability.xml’ file in the folder ‘/usr/lib/firewalld/services’.
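
If you want to confirm which ports the service definition opens on your own system, you can query firewalld directly. This is only a verification step; the output will match the list above for your distribution:

Code:
firewall-cmd --info-service=high-availability
firewall-cmd --list-services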

The firewall is now set, and we can install the necessities for the cluster.

Cluster Installation

To get the cluster working, we need to install the clustering software to manage the nodes and the cluster as a whole.

For CentOS, perform the following on each node in the cluster:

Code:
yum install pacemaker pcs resource-agents

For Ubuntu, use the command on all cluster nodes:

Code:
sudo apt install pacemaker corosync pcs

The ‘pcs’ app is used to manage the cluster through the ‘pacemaker’ and ‘corosync’ services.
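
If you want to verify that the packages installed correctly on each node, a quick check of the ‘pcs’ version is enough. The exact version number will vary with your distribution:

Code:
pcs --version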

We are closer to getting the cluster active.

Cluster Configuration

Before we start the cluster, we need to configure some items.

For both CentOS and Ubuntu, the installation of the cluster software creates a new user named ‘hacluster’. We need to set the password for this user so we can use it between servers and remotely. On Ubuntu, use the command:

Code:
chpasswd
hacluster:password
CTRL-D

This will work; just replace ‘password’ with the password you want to use. If you happen to want ‘password’ itself as the password, this will not work on CentOS, since it will not pass the dictionary test. You can use the following command to bypass it:

Code:
echo 'hacluster:password' | chpasswd
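
As an alternative, if you prefer to set the password interactively instead of piping it into ‘chpasswd’, the standard ‘passwd’ command works on both distributions:

Code:
passwd hacluster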

For Ubuntu, the services automatically start after installation, but on CentOS we need to manage these services. You need to start and enable the service on each node:

Code:
systemctl enable pcsd
systemctl start pcsd

Next, we need to authenticate the nodes for the cluster. Make sure all nodes are on and they can be seen on the network. Perform the following command, replacing the hostname of each node:

Code:
pcs cluster auth server1.centos.linux.org server2.centos.linux.org

Of course, for my Ubuntu systems, the hostname suffix is ‘ubuntu.linux.org’. If there are more nodes, just add them all, using a space as a separator. If the nodes cannot resolve the DNS names, you can use the IP Addresses; just make sure they are static addresses.

After entering the command, the system prompts you for the username and password. Use the username of ‘hacluster’ and for the password, use the one you made for the user previously.

We now have authorized nodes for the cluster, so we need to create the cluster. Use the command:

Code:
pcs cluster setup --name cluster1 server1.centos.linux.org server2.centos.linux.org

If the command returns an error that the cluster is already set up, you need to add the parameter ‘--force’ to make the command run. If the cluster already exists and has resources, this removes everything from the configuration and you have a clean cluster.
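
For example, to force the setup when that error appears, the same command is run with the extra parameter added at the end:

Code:
pcs cluster setup --name cluster1 server1.centos.linux.org server2.centos.linux.org --force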

Everything should now be reset on the cluster, and the services restarted so that all setting changes take effect.

Use the command ‘pcs status’ to see that everything is running properly. If you get an error that the cluster is not running on this node, then you need to start the cluster services with the command:

Code:
pcs cluster start --all

This should start all the services, including ‘pacemaker’ and ‘corosync’. After getting the status, you’ll see something similar to:

Code:
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled

This shows that ‘pacemaker’ and ‘corosync’ are running, but not enabled to start when the system boots. We can fix this:

Code:
systemctl enable pacemaker corosync

Now, if you run the status, you’ll see that the two services are active and enabled.

You now have a cluster of two nodes, but they are not yet sharing a resource that can be moved from a failed node to another. Before we add a resource, let’s look at the warnings we see when listing the status.

STONITH

When you run the status of a cluster, you’ll see:

Code:
WARNINGS:
No stonith devices and stonith-enabled is not false

This means that STONITH is enabled on the cluster, but no fencing (STONITH) devices have been configured.

You may ask, ‘what is STONITH?’.

STONITH stands for ‘Shoot The Other Node In The Head’. What happens in a cluster if one node fails?

If Node1 is running a web service and goes offline, Node2 recognizes that the other node failed and starts the web service. Node2 sets itself as the primary node managing the web service, and STONITH should cause Node1 to shut down its cluster services or reboot. When Node1 comes back up, it becomes a standby node for the web service.

We can turn off STONITH with the command:

Code:
pcs property set stonith-enabled=false
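
You can confirm the property took effect by displaying it. With the older ‘pcs’ syntax used in this article, the command is:

Code:
pcs property show stonith-enabled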

Quorum

A quorum needs a minimum of three nodes. When there is an issue on the network between nodes, such as one node failing, a majority of the remaining nodes must agree on how to handle it.

Let’s assume we have four nodes in a cluster using quorum. If Node1 is running the web service and appears to fail, the remaining three will vote on what to do. They should all agree to shut down Node1 and move the resources to other nodes. If two agree to move the resources to Node2, then Node2 will start the resource.

There can be issues here, but it usually works fine.
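
Since our test cluster only has two nodes, it cannot form a proper quorum when one node fails. If you find resources refusing to move on a two-node setup, one option is to tell the cluster to ignore the loss of quorum. This is only sensible for a small test cluster like this one:

Code:
pcs property set no-quorum-policy=ignore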

Cluster an IP Address

We need to create a resource to share between our two nodes. Later, we will create a second resource, a web server, that will have a specific IP Address. So, we need to create an IP Address for the Designated Controller (DC), the node running the resource.

We start by finding an unused IP Address. In my case, I will use ‘192.168.1.77’. This means that my web server will have the same address no matter which node is the DC.

NOTE: My DHCP Server assigns addresses starting at ‘192.168.1.100’, so most addresses below 100 are not in use.

On Server1, we need to create a resource that we will name ‘ipcluster’ with the following command:

Code:
pcs resource create ipcluster ocf:heartbeat:IPaddr2 ip=192.168.1.77 cidr_netmask=24 op monitor interval=10s

If you perform an ‘ip a’ command, you’ll see that your DC has two IP Addresses for one Network Interface Card (NIC), see Figure 1.

FIGURE 1

My NIC now has the IP of ‘192.168.1.104’ and ‘192.168.1.77’. We can add this to our HOSTS file with a name of ‘cluster1’.
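
For example, a single line in ‘/etc/hosts’ on each system will map a name to the cluster address. Here, ‘cluster1’ is simply the name chosen above:

Code:
echo '192.168.1.77 cluster1' >> /etc/hosts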

If you perform the ‘pcs status’ command, you’ll see there is now a listing for ‘Resources’ that shows ‘ipcluster’.

Now, let’s look into the command that created ‘ipcluster’. The ‘ocf’ is the ‘Open Cluster Framework’. The resource is provided by ‘heartbeat’, and ‘IPaddr2’ is the resource agent that manages the additional IP Address. We specify the address with ‘ip=192.168.1.77’ as well as its netmask. The ‘op’ is for an ‘operation’ to ‘monitor’ the resource and check it every 10 seconds.

If we ping the ‘ipcluster’ address, it should work.
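
A quick test from any machine on the network, using the address chosen above:

Code:
ping -c 4 192.168.1.77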

By checking ‘pcs status’, we can see that both nodes are ‘Online’. If either is in ‘Standby’ mode, you need to get it back to an ‘Online’ state. To do this, use the command:

Code:
pcs cluster unstandby <Node-Name>

Change the ‘<Node-Name>’ to the Fully Qualified Domain Name (FQDN) of the Node to get ‘Online’.

To put the primary node, the one where the ‘ipcluster’ resource is active, into standby mode and allow the resource to move, use the command:

Code:
pcs cluster standby <Node-Name>

Here, the ‘<Node-Name>’ is the DC that is running the resource, which should be ‘Server1’.

By checking the status, you should see that the resource moves. You can use the ‘unstandby’ parameter to change it back to ‘Online’, and then set Node2 to standby to move the resource back.
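
As a quick failover test, assuming Server1 is currently the DC, the sequence looks like this; check the status between each step to watch the resource move:

Code:
pcs cluster standby server1.centos.linux.org
pcs status
pcs cluster unstandby server1.centos.linux.org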

So we have a resource that moves when there is a ‘failure’.

Let’s try another.

Cluster a Web Service

Some companies need a website that is constantly available. Putting a web service on a cluster is one way of making that happen.

As we did when first setting up the cluster, we need to open TCP/IP Ports in the firewall. Perform:

Code:
firewall-cmd --permanent --add-service=http
firewall-cmd --reload

The port opened is:
  • TCP 80
If you also add the service ‘https’, which we do not need for this example, it will open port:
  • TCP 443
Now we need to install the Apache Web Service on both CentOS nodes with:

Code:
yum install httpd w3m

For Ubuntu, use:

Code:
apt install apache2 w3m

Notice that we are also installing a command-line web browser (w3m), if you want it. You can still use a GUI browser.

We now need to create a base file to load when the web server is contacted. Create and edit the file ‘/var/www/html/index.html’ and add:

Code:
<h1>Welcome to our Cluster!</h1>
<br>
<hr>
<br>
<p>This is Server1.</p>

When you place this in Server2, change the hostname in the last line. This will let you know which server you are accessing.

You can now open a browser, or use ‘w3m’, to access the cluster at ‘192.168.1.77’ and you should get a response from the current DC node.
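
For example, from a terminal on either node, or any other system on the network:

Code:
w3m http://192.168.1.77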

We also need a file for configuration. On CentOS it will be at ‘/etc/httpd/conf.d/status.conf’ and on Ubuntu it is at ‘/etc/apache2/conf-available/server-status.conf’. The contents of the file should be:

Code:
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
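
On Ubuntu, configuration snippets placed in ‘conf-available’ are not active until they are enabled. Assuming you named the file ‘server-status.conf’ as above, enable it and reload Apache with:

Code:
a2enconf server-status
systemctl reload apache2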

Place the DC on standby. I’ll assume it is Server1. Use the command:

Code:
pcs cluster standby server1.centos.linux.org

Now, if you check the website again, it is served by Server2.

This works without adding the Web Service as a resource; just having the IP Address move to the other server is enough. But if you want the Web Service as a resource, then:

Code:
pcs resource create webcluster ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl=/etc/httpd/conf.d/status.conf op monitor interval=10s

For Ubuntu, change the path to the ‘server-status.conf’ file for the ‘statusurl’ parameter.
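
As a sketch of the Ubuntu version, assuming the default Apache configuration file at ‘/etc/apache2/apache2.conf’, the command would look like:

Code:
pcs resource create webcluster ocf:heartbeat:apache configfile=/etc/apache2/apache2.conf statusurl=/etc/apache2/conf-available/server-status.conf op monitor interval=10s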

You need to be sure that the two resources stay together when a system fails and then comes back online. Use the command:

Code:
pcs constraint colocation add webcluster with ipcluster INFINITY

Here, the Web Service is now a resource on the cluster, colocated with the IP Address. Add other resources as you need.
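
You can list the constraints on the cluster at any time to confirm the colocation rule was added:

Code:
pcs constraint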

Conclusion

Setting up a cluster is not a hard process. Try it on test systems, set up the resources you need, and perform tests.

Whether or not you need a cluster, they are fun to play with to see how things work when a node fails.
 
