How to Configure Nginx High Availability Cluster Using Pacemaker on Ubuntu 16.04

By Hitesh Jethva, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

High availability describes websites or applications that are durable and likely to operate continuously without failure for a long time. A highly available setup provides a number of failsafes and aims for 99% uptime. Highly available systems are built from several components and can be scaled horizontally when needed, which improves their ability to serve content.

Pacemaker is an advanced, scalable high-availability cluster resource manager. It manages all cluster resources and maximizes their availability by detecting node-level and resource-level failures and failing resources over between cluster nodes. Pacemaker relies on Corosync for heartbeat, messaging, and membership among cluster components.

In this tutorial, we will explain the installation and configuration of a two-node Nginx web server cluster using Pacemaker on Alibaba Cloud Elastic Compute Service (ECS) instances running Ubuntu 16.04.

Requirements

Two fresh Alibaba Cloud ECS instances with Ubuntu 16.04 server installed.
A static IP address configured on each instance, plus a floating IP address for the cluster. Note that your IP addresses will differ based on your ECS instances.
A root password set up on both instances.

Launch Alibaba Cloud ECS Instance

First, log in to your Alibaba Cloud ECS Console (https://ecs.console.aliyun.com/?spm=a3c0i.o25424en.a3.13.388d499ep38szx). Create a new ECS instance, choosing Ubuntu 16.04 as the operating system with at least 2GB RAM. Connect to your ECS instance and log in as the root user.

Once you are logged into your Ubuntu 16.04 instance, run the following command to refresh the package index so that later installs use the latest available packages.

apt-get update -y

Getting Started

Before starting, you will need to configure the hosts file on each server so that the servers can reach each other by hostname.

You can do this by editing /etc/hosts file on both servers.

nano /etc/hosts

Add the following lines (replace the variable Node1_IP_Address and Node2_IP_Address with the actual IP address of your ECS instances):

Node1_IP_Address node1
Node2_IP_Address node2

Save and close the file when you are finished.

Next, test hostname resolution by pinging the other server using hostname:

ping node1
ping node2
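
If you prefer to script the hosts file change rather than edit it by hand, the step above could be sketched roughly as follows. The function name add_host_entry and the example addresses are illustrative assumptions, not part of the original procedure; the function only appends an entry when the hostname is not already mapped, so it is safe to run more than once.

```shell
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless a non-comment line already maps NAME.
# (Hypothetical helper; not part of any package used in this tutorial.)
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Look for NAME as a whole word after some whitespace on a non-comment line.
  if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    echo "entry for ${name} already present in ${file}"
  else
    printf '%s %s\n' "$ip" "$name" >> "$file"
    echo "added ${ip} ${name} to ${file}"
  fi
}

# Example with placeholder addresses (run as root on both nodes):
#   add_host_entry /etc/hosts 192.168.0.102 node1
#   add_host_entry /etc/hosts 192.168.0.103 node2
```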

Install and Configure Nginx

Before setting up the High Availability web server, you will need to install and configure Nginx on each of the nodes. You can install Nginx by running the following command:

apt-get install nginx -y

Once Nginx is installed, start the Nginx service and enable it to start at boot by running the following commands on each of the nodes:

systemctl start nginx
systemctl enable nginx

Next, create a default index.html page for Nginx on each node so that you can tell which node is serving a request.

On Node1, open the index.html page:

nano /var/www/html/index.html

Remove all the lines and add the following lines:

<h1>
Nginx Cluster ::: Node1
</h1>

Save and close the file when you are finished.

On Node2, open the index.html page:

nano /var/www/html/index.html

Remove all the lines and add the following lines:

<h1>
Nginx Cluster ::: Node2
</h1>

Save and close the file when you are finished.
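
The two edits above differ only in the node name, so they could also be done with a small helper like the one below. This is a minimal sketch: write_index_page is a hypothetical function name, and it assumes the default document root /var/www/html used in this tutorial.

```shell
# write_index_page DOCROOT LABEL
# Overwrites DOCROOT/index.html with a three-line page naming the node.
# (Hypothetical helper for illustration.)
write_index_page() {
  local docroot="$1" label="$2"
  printf '<h1>\nNginx Cluster ::: %s\n</h1>\n' "$label" > "${docroot}/index.html"
}

# Example:
#   on Node1: write_index_page /var/www/html Node1
#   on Node2: write_index_page /var/www/html Node2
```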

Now, stop the Nginx service on each node so that Pacemaker can start and manage it as a cluster resource:

systemctl stop nginx

Install Pacemaker, Corosync, and Crmsh

Next, you will need to install Pacemaker, Corosync, and Crmsh on each node. All of these packages are available in the default Ubuntu 16.04 repository, so you can install them with the following command:

apt-get install pacemaker corosync crmsh -y

Once the installation is completed, stop Pacemaker and Corosync services with the following command:

systemctl stop corosync
systemctl stop pacemaker

Configure Corosync

Next, you will need to configure Corosync on Node1 and generate the Corosync key for the cluster authentication.

Before starting, you will need to install haveged to generate random numbers for the Corosync key. You can install it with the following command:

apt-get install haveged -y

Next, generate the Corosync key by running the following command:

corosync-keygen

You should see the following output:

Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 920).
Writing corosync key to /etc/corosync/authkey.

You can also see the generated key using the following command:

ls -l /etc/corosync/

Output:

-r-------- 1 root root  128 Feb 28 20:39 authkey
-rw-r--r-- 1 root root 3929 Oct 21  2015 corosync.conf

Next, change to the /etc/corosync directory and remove the default configuration file:

cd /etc/corosync/
rm -rf corosync.conf

Next, create a new corosync.conf file as shown below:

nano corosync.conf

Add the following lines (replace the variable Node1_IP_Address and Node2_IP_Address with the actual IP addresses of your ECS instances):

    totem {
      version: 2
      cluster_name: lbcluster
      transport: udpu
      interface {
        ringnumber: 0
        bindnetaddr: Node1_IP_Address
        broadcast: yes
        mcastport: 5405
      }
    }

    quorum {
      provider: corosync_votequorum
      two_node: 1
    }

    nodelist {
      node {
        ring0_addr: Node1_IP_Address
        name: primary
        nodeid: 1
      }
      node {
        ring0_addr: Node2_IP_Address
        name: secondary
        nodeid: 2
      }
    }

    logging {
      to_logfile: yes
      logfile: /var/log/corosync/corosync.log
      to_syslog: yes
      timestamp: on
    }

    service {
      name: pacemaker
      ver: 1
    }

Save and close the file when you are finished.
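
As an alternative to typing the file by hand, the configuration above could be generated from the two node addresses. The sketch below reproduces the same settings; render_corosync_conf is a hypothetical function name and the addresses in the example are placeholders.

```shell
# render_corosync_conf NODE1_IP NODE2_IP
# Prints a corosync.conf with the same settings as the file above.
# (Hypothetical helper; review the output before installing it.)
render_corosync_conf() {
  local node1_ip="$1" node2_ip="$2"
  cat <<EOF
totem {
  version: 2
  cluster_name: lbcluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: ${node1_ip}
    broadcast: yes
    mcastport: 5405
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}

nodelist {
  node {
    ring0_addr: ${node1_ip}
    name: primary
    nodeid: 1
  }
  node {
    ring0_addr: ${node2_ip}
    name: secondary
    nodeid: 2
  }
}

logging {
  to_logfile: yes
  logfile: /var/log/corosync/corosync.log
  to_syslog: yes
  timestamp: on
}

service {
  name: pacemaker
  ver: 1
}
EOF
}

# Example with placeholder addresses:
#   render_corosync_conf 192.168.0.102 192.168.0.103 > /etc/corosync/corosync.conf
```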

Next, copy the corosync authentication key and the configuration file from Node1 to Node2 with the following command:

scp /etc/corosync/* root@Node2_IP_Address:/etc/corosync/
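
Both nodes must hold an identical authkey, so it can be worth confirming that the copy succeeded. One possible check, sketched below, compares SHA-256 digests; the helper names file_sha256 and files_match are assumptions made for this example, and the remote command assumes the same root SSH access that scp used.

```shell
# file_sha256 FILE: print only the SHA-256 digest of FILE.
file_sha256() {
  sha256sum "$1" | awk '{print $1}'
}

# files_match LOCAL_FILE EXPECTED_DIGEST: succeed when the digests agree.
files_match() {
  [ "$(file_sha256 "$1")" = "$2" ]
}

# Example, run on Node1:
#   remote="$(ssh root@Node2_IP_Address 'sha256sum /etc/corosync/authkey' | awk '{print $1}')"
#   files_match /etc/corosync/authkey "$remote" && echo "authkey matches"
```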

Start Cluster Service

Now, start the Corosync and Pacemaker services on each of the nodes and enable them to start at boot with the following commands:

systemctl start corosync
systemctl enable corosync
systemctl start pacemaker
systemctl enable pacemaker

Once both services have been started, check the status of the service on both nodes with the following command:

crm status

If everything is fine, you should see the following output:

Last updated: Wed Feb 28 21:13:27 2018        Last change: Wed Feb 28 21:12:44 2018 by hacluster via crmd on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 0 resources configured

Online: [ primary secondary ]

Full list of resources:

You can also check the Corosync members with the following command:

corosync-cmapctl | grep members

You should see the IP address of both nodes in the following output:

runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.0.102) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.0.103) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
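
To check membership from a script instead of reading the output by eye, you could count the members whose status is joined. The following is a small sketch that parses output in the format shown above; count_joined_members is a hypothetical name.

```shell
# count_joined_members: read corosync-cmapctl output on standard input and
# print how many members report "status (str) = joined".
count_joined_members() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c 'members\.[0-9]*\.status (str) = joined' || true
}

# Example:
#   corosync-cmapctl | count_joined_members
```

For the two-node cluster in this tutorial, the expected count is 2.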

Configure Cluster

Now, we are ready to configure Pacemaker. We will run all Pacemaker commands on the primary node (Node1), because Pacemaker automatically synchronizes cluster-related changes across all member nodes.

First, disable STONITH. STONITH ("Shoot The Other Node In The Head") is Pacemaker's fencing mechanism, which forcibly removes a faulty node from the cluster. This tutorial sets up a simple two-node cluster without a fencing device, so we turn STONITH off. We also set the no-quorum policy to ignore so that the surviving node keeps running resources when its peer goes down.

You can apply both settings with the following commands:

crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore

Now, verify your STONITH status and the quorum policy with the following command:

crm configure show

You should see the following output:

node 1: primary
node 2: secondary
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.14-70404b0 \
    cluster-infrastructure=corosync \
    cluster-name=debian \
    stonith-enabled=false \
    no-quorum-policy=ignore

Pacemaker is now running and configured. Next, you will need to create two resources for the cluster: a virtual IP resource for the floating IP and a webserver resource for the Nginx service.

You can create a new Virtual IP resource for floating IP using the crm command as shown below (replace the variable Floating_IP_Address with the actual IP address):

crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 params ip="Floating_IP_Address" cidr_netmask="32" op monitor interval="10s" meta migration-threshold="10"

Next, create a webserver resource using the following command:

crm configure primitive webserver ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf op start timeout="40s" interval="0" op stop timeout="60s" interval="0" op monitor interval="10s" timeout="60s" meta migration-threshold="10"

Next, check the status of the new resource with the following command:

crm resource status

You should see the following output:

 virtual_ip    (ocf::heartbeat:IPaddr2):    Started
 webserver    (ocf::heartbeat:nginx):    Started

Next, group the two resources so that the floating IP and Nginx always run on the same node and fail over together. Add the virtual_ip and webserver resources to a new group named hakase_balancing by running the following command:

crm configure group hakase_balancing virtual_ip webserver
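
If you need to repeat this setup, the three crm commands used in this section could be produced from the floating IP in one step. This is only a sketch: print_resource_commands is a hypothetical helper that prints the same commands shown above for review, and the address in the example is a placeholder.

```shell
# print_resource_commands FLOATING_IP
# Prints the crm commands from this section with the floating IP filled in.
# Nothing is executed; pipe the output to sh only after reviewing it.
print_resource_commands() {
  local vip="$1"
  cat <<EOF
crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 params ip="${vip}" cidr_netmask="32" op monitor interval="10s" meta migration-threshold="10"
crm configure primitive webserver ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf op start timeout="40s" interval="0" op stop timeout="60s" interval="0" op monitor interval="10s" timeout="60s" meta migration-threshold="10"
crm configure group hakase_balancing virtual_ip webserver
EOF
}

# Example with a placeholder address:
#   print_resource_commands 192.168.0.100
```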

Next, check the status of the new resource group with the following command:

crm resource show

You should see the following output:

Resource Group: hakase_balancing

     virtual_ip    (ocf::heartbeat:IPaddr2):    Started
     webserver    (ocf::heartbeat:nginx):    Started

Test High Availability

The cluster configuration is now complete. It's time to check the status of the nodes and the cluster.

You can do this with the following command:

crm status

You should see the following output:

Last updated: Wed Feb 28 21:35:21 2018        Last change: Wed Feb 28 21:34:50 2018 by root via cibadmin on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ primary secondary ]

Full list of resources:

 Resource Group: hakase_balancing
     virtual_ip    (ocf::heartbeat:IPaddr2):    Started primary
     webserver    (ocf::heartbeat:nginx):    Started primary

You now have two nodes [primary secondary] with status online, and both resources are running on the primary node.

Now, from a remote machine, open your web browser and type the URL http://Floating_IP_Address (replace the variable Floating_IP_Address with the actual IP address). You should see the Node1 page.

Next, stop the cluster service on Node1 with the following command:

crm cluster stop

Now, check the cluster status on Node2 with the following command:

crm status

You should see that the primary node is offline, the secondary node is online, and both resources have moved to the secondary node, as shown below:

Last updated: Wed Feb 28 22:00:59 2018        Last change: Wed Feb 28 21:46:57 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ secondary ]
OFFLINE: [ primary ]

Full list of resources:

 Resource Group: hakase_balancing
     virtual_ip    (ocf::heartbeat:IPaddr2):    Started secondary
     webserver    (ocf::heartbeat:nginx):    Started secondary
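
To watch the failover from a remote machine without refreshing a browser, you could poll the floating IP and print which node answered. The sketch below assumes curl is installed on the remote machine and relies on the "Nginx Cluster ::: NodeN" marker from the index pages created earlier; node_label and watch_failover are hypothetical names.

```shell
# node_label: read an HTML page on standard input and print the node name
# that follows "Nginx Cluster ::: ".
node_label() {
  sed -n 's/.*Nginx Cluster ::: \([A-Za-z0-9_]*\).*/\1/p'
}

# watch_failover URL [COUNT]
# Fetches URL once per second, COUNT times (default 30), and prints a
# timestamp with the node that answered, or "no response".
watch_failover() {
  local url="$1" count="${2:-30}" i=0 label
  while [ "$i" -lt "$count" ]; do
    label="$(curl -s --max-time 2 "$url" | node_label)"
    echo "$(date +%T) ${label:-no response}"
    i=$((i + 1))
    sleep 1
  done
}

# Example:
#   watch_failover http://Floating_IP_Address/
```

If you start the loop before running crm cluster stop on Node1, the label should change from Node1 to Node2 once the resources have moved.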

Troubleshoot Cluster

If your High Availability setup is not working as expected, you can use a few troubleshooting commands to find the cause.

crm_mon is a very useful tool for viewing the real-time status of your nodes and resources:

crm_mon

You should see the following output:

Last updated: Wed Feb 28 23:46:46 2018          Last change: Wed Feb 28 22:00:43 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition WITHOUT quorum
2 nodes and 2 resources configured

Online: [ secondary ]
OFFLINE: [ primary ]

 Resource Group: hakase_balancing
     virtual_ip (ocf::heartbeat:IPaddr2):       Started secondary
     webserver  (ocf::heartbeat:nginx): Started secondary
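
For scripted checks, crm_mon can also print the status once and exit when run with the -1 option. A rough sketch of extracting the node that currently runs a given resource from output in the format shown above follows; resource_node is a hypothetical helper name.

```shell
# resource_node NAME: read crm_mon -1 (or crm status) output on standard
# input and print the node on which resource NAME is reported as Started.
resource_node() {
  awk -v rsc="$1" '$1 == rsc && $(NF-1) == "Started" { print $NF }'
}

# Example:
#   crm_mon -1 | resource_node webserver
```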

You can see your cluster configuration using the following command (in the output below, Floating_IP_Address stands for your actual floating IP address):

crm configure show

Output:

node 1: primary
node 2: secondary
primitive virtual_ip IPaddr2 \
    params ip=Floating_IP_Address cidr_netmask=32 \
    op monitor interval=10s \
    meta migration-threshold=10
primitive webserver nginx \
    params configfile="/etc/nginx/nginx.conf" \
    op start timeout=40s interval=0 \
    op stop timeout=60s interval=0 \
    op monitor interval=10s timeout=60s \
    meta migration-threshold=10
group hakase_balancing virtual_ip webserver
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.14-70404b0 \
    cluster-infrastructure=corosync \
    cluster-name=debian \
    stonith-enabled=false \
    no-quorum-policy=ignore

You can also troubleshoot the cluster by looking at the Corosync logs using the following command:

tail -f /var/log/corosync/corosync.log
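
When the log is long, it may help to show only the lines that mention a problem. The helper below is a simple sketch; log_problems is a hypothetical name, and the keyword list is an assumption that you may need to adjust for your log messages.

```shell
# log_problems [FILE]
# Prints lines from FILE (default: the Corosync log configured above) that
# contain "error", "warn", or "fail", ignoring case.
log_problems() {
  grep -Ei 'error|warn|fail' "${1:-/var/log/corosync/corosync.log}"
}

# Example:
#   log_problems | tail -n 20
```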

Congratulations! You now have a basic Nginx High Availability setup using Corosync and Pacemaker on Ubuntu 16.04 server. For more information, refer to the official Pacemaker documentation.

Related Alibaba Cloud Products

Server Load Balancer is a ready-to-use service that seamlessly integrates with Elastic Compute Service (ECS) to manage varying traffic levels without manual intervention. First, you need to add the ECS instances to the Server Load Balancer instance. Server Load Balancer then distributes incoming traffic across multiple ECS instances, detects unhealthy or unsafe instances, and routes traffic to healthy and safe instances only.

Alibaba Cloud Express Connect is a convenient and efficient network service. The product provides a fast, stable, secure and private or dedicated network communication between different cloud environments, including VPC intranet intercommunication and dedicated leased line connection across regions and users.
With Express Connect you can increase the flexibility of your network topology and enhance the quality and security of inter-network communication.
