Cluster

From KlavoWiki

This document is deprecated. Please refer to: Linux Cluster

Overview

The Linux clustering software is called Heartbeat (HA), a simple suite of tools which monitors the availability of a service on a virtual IP. If that service stops responding, the clustering software removes the virtual IP from the active host, brings it online on the passive host and starts the services. The services to be clustered are named after the init.d script used to start them, e.g. httpd for Apache or mysqld for MySQL Server.

Hardware Requirements

In order for HA to function your nodes must be connected to each other via serial ports or ethernet. It is recommended that you use a separate NIC for the heartbeats when using ethernet. If a single NIC is used there is a chance that a false failover may occur when high traffic is experienced on that NIC. If you are using ethernet for your heartbeat then remember to place it on a private subnet that only those NICs are connected to.

The dedicated interface is configured in /etc/sysconfig/network-scripts/ifcfg-em1 and brought up with:

ifup em1
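As a sketch, the dedicated heartbeat interface would be given a static address on a private subnet. The addresses below are placeholders; adjust them to suit your own private heartbeat network:

```
DEVICE=em1
BOOTPROTO=static
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
```

The second node would use the same configuration with a different IPADDR (e.g. 10.0.0.2).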

Installation

HA can be installed from the yum repositories. The package name is "heartbeat". This must be installed on all nodes in the cluster.

yum install heartbeat
chkconfig heartbeat on

CentOS 6.0

With CentOS 6, heartbeat is not among the standard yum packages, so to install heartbeat on CentOS 6.0 we need to add another repository. We'll use EPEL from the Fedora Project.

rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Edit the repository file and set enabled=0 so that EPEL is only used when explicitly enabled:

vi /etc/yum.repos.d/epel.repo
       enabled=0

yum --enablerepo=epel install heartbeat

Configuration

There are several configuration files which need to be created to configure the cluster resources.

ha.cf

This file, located at /etc/ha.d/ha.cf, contains the cluster group information. A detailed explanation of each option available in the file is available at http://www.linux-ha.org/ha.cf. This guide explains the information needed to configure the cluster group over an ethernet or serial connection.

ha.cf Ethernet

bcast           em1
udpport         694
  • The bcast directive is used to identify which interface to use for the heartbeat pings.
  • The udpport directive sets the port that heartbeats will run over.

ha.cf Serial

baud            38400
serial          /dev/ttyS1
  • The baud directive is used to set the speed of the serial communications. Valid options are 9600, 19200, 38400, 57600, 115200, 230400, 460800.
  • The serial directive is used to identify which serial port to use for heartbeat communication.

ha.cf Generic

keepalive       2
warntime        5
deadtime        15
initdead        30
auto_failback   off
node            filserver1.domain.name.com
node            filserver2.domain.name.com

logfile /var/log/ha-log
  • The keepalive directive sets the interval between heartbeat packets. This is formatted according to HeartbeatTimeSyntax.
  • The warntime directive sets the time before HA changes the cluster status to warning. This is formatted according to HeartbeatTimeSyntax.
  • The deadtime directive sets the time before HA decides the node is dead and a failover occurs. This is formatted according to HeartbeatTimeSyntax.
  • The initdead directive sets the time before HA decides on startup that a node is dead and a failover occurs. This is formatted according to HeartbeatTimeSyntax.
  • The auto_failback directive sets whether or not HA should fail back resources to its preferred node when that node begins to respond again.
  • The node directive identifies what servers are part of the cluster group.
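Putting the pieces together, a complete /etc/ha.d/ha.cf for a two-node ethernet cluster might look like this (hostnames and interface taken from the examples above):

```
bcast           em1
udpport         694
keepalive       2
warntime        5
deadtime        15
initdead        30
auto_failback   off
node            filserver1.domain.name.com
node            filserver2.domain.name.com
logfile         /var/log/ha-log
```

For a serial cluster, replace the bcast and udpport lines with the baud and serial directives shown earlier.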

Other Options

  • debugfile /var/log/ha-debug
  • logfile /var/log/ha-log
  • logfacility local0

haresources

This file is located at /etc/ha.d/haresources and is used to configure the cluster resources that are running within the group. The format of this file is

preferred_cluster_node virtual_ip services to cluster

So if our preferred cluster node is "filserver1.domain.name.com", using the virtual IP "192.168.25.250" and with the asterisk and httpd services in the cluster resource, the file would look like this;

filserver1.domain.name.com 192.168.25.250  asterisk httpd

authkeys

This file is located at /etc/ha.d/authkeys and just needs to contain these lines;

auth    1
1       crc

or

auth 1
1 sha1 ThisIsMyStrongPassword

The available options are:

auth    1
1 crc
2 md5   ThisIsMyStrongPassword
3 sha1  ThisIsMyStrongPassword

In the example above crc is selected. Changing auth from 1 to 2 would use md5 instead.

Make sure you change the permissions on the authkeys file so that it is readable and writable by root only (chmod 600);

chmod 600 /etc/ha.d/authkeys
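Rather than typing a password by hand, the sha1 secret can be generated from random data. A sketch, writing to a temporary path for illustration (the real file is /etc/ha.d/authkeys):

```shell
# Generate a random secret and write an authkeys file using the sha1 method.
# /tmp/authkeys.example is a stand-in for /etc/ha.d/authkeys.
secret=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
printf 'auth 1\n1 sha1 %s\n' "$secret" > /tmp/authkeys.example
chmod 600 /tmp/authkeys.example
cat /tmp/authkeys.example
```

The same generated file must then be copied to every node, as the secret has to match across the cluster.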

copy config files

Make sure you copy the following files to all members that participate in the cluster.

/etc/ha.d/ha.cf
/etc/ha.d/haresources 
/etc/ha.d/authkeys 

Content Replication

Typically you would run your services from a clustered file system/SAN/NFS/drbd; however, there are times when you must replicate the content between the nodes. The Linux tool rsync exists for this purpose and is used on the Perth Asterisk installation to copy the Asterisk database and configuration files between the servers.

Script

Below is the script in use, which is stored in /opt/adminscripts/asteriskRsyncConfig.

#!/bin/bash

# Define the public interface of your HA IP address & the virtual IP of your HA resource
strPublicIf=em1:0
strVIP=192.168.29.250

# *** DO NOT EDIT BELOW THIS LINE ***
# If the VIP is not present on this host, this is the passive node and
# content should be pulled from the active node (reachable via the VIP).
strCheckNet=$(/sbin/ifconfig "$strPublicIf" | /bin/grep -i inet)
intCheckNetLength=${#strCheckNet}
strCurDate=$(date +%Y%m%d%H%M)

if [ $intCheckNetLength -eq 0 ]
then
   echo "#################################################"
   echo "Rsync Check at :: $strCurDate"
   echo "--> VIP does not exist on this server"
   echo "--> Continuing with RSYNC process"
   echo ""
   echo "intCheckNetLength: $intCheckNetLength"
   echo "strCheckNet: $strCheckNet"

   rsync    -av --delete root@$strVIP:/etc/asterisk/ /etc/asterisk/
   rsync    -av --delete root@$strVIP:/etc/dahdi/ /etc/dahdi/
   rsync    -av --delete root@$strVIP:/var/lib/asterisk/agi-bin/ /var/lib/asterisk/agi-bin/
   rsync    -av --delete root@$strVIP:/var/lib/asterisk/astdb /var/lib/asterisk/astdb
   rsync    -av --delete root@$strVIP:/var/lib/asterisk/keys/ /var/lib/asterisk/keys/
   rsync -r -av --delete root@$strVIP:/var/spool/asterisk/ /var/spool/asterisk/
   rsync -r -av --delete root@$strVIP:/var/log/asterisk/ /var/log/asterisk/
   rsync -r -av --delete root@$strVIP:/home/PlcmSpIp/  /home/PlcmSpIp/
   rsync -r -av --delete root@$strVIP:/var/www/html/ /var/www/html/
   rsync -r -av --delete root@$strVIP:/opt/ /opt/
   rsync    -av --delete root@$strVIP:/etc/ali* /etc/
   rsync    -av --delete root@$strVIP:/etc/rc.local /etc/
   rsync    -av --delete root@$strVIP:/etc/logrotate.d/asterisk /etc/logrotate.d/asterisk
   rsync    -av --delete root@$strVIP:/usr/sbin/fax2mail /usr/sbin/fax2mail
   echo "#################################################"
   echo ""
fi

The fields to edit are strPublicIf and strVIP, which must reflect the ethernet interface your virtual IP will exist on and the virtual IP itself. This script is then scheduled via cron to run every 5 minutes.
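The cron entry for this might look like the following (the log path is an example):

```
*/5 * * * * /opt/adminscripts/asteriskRsyncConfig >> /var/log/asteriskRsyncConfig.log 2>&1
```

Because the script exits silently on the active node (where the VIP is present), it is safe to schedule it identically on every node.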

SSH keys

The replication script uses rsync over SSH to copy the content; however, with any SSH connection the user must normally be authenticated by way of a password. As we are launching this script via a scheduled task we are unable to input the password, so to overcome this problem we use SSH keys: a public/private key pair used to authenticate SSH connections without a password.

Create SSH keys

To create the SSH keys you run this command on each node;

[root@localhost ~]# ssh-keygen -t rsa

The output will look like this;

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
2b:7d:e0:7c:ca:7b:46:95:fd:94:9e:55:be:c5:8d:07 root@localhost

You do not want to enter a passphrase, as you would be required to enter it for each SSH connection (defeating our purpose of no manual intervention).
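The key can also be generated non-interactively: -N "" sets an empty passphrase and -f sets the output path. A sketch using a temporary path for illustration (on a real node you would use the default /root/.ssh/id_rsa):

```shell
# Generate an RSA key pair with no passphrase, without any prompts.
# /tmp/demo_id_rsa is an example path, not the real key location.
ssh-keygen -q -t rsa -N "" -f /tmp/demo_id_rsa
ls -l /tmp/demo_id_rsa /tmp/demo_id_rsa.pub
```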

Copy keys

On each node you must copy /root/.ssh/id_rsa.pub to the other nodes, but into the file /root/.ssh/authorized_keys. For example;

[root@localhost ~]# scp /root/.ssh/id_rsa.pub root@otherhost:/root/.ssh/authorized_keys

This will overwrite authorized_keys if it exists, so you may be better off copying it to a temporary file such as temp_key and then appending it to authorized_keys (or using ssh-copy-id, which appends the key safely).
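The safer append can be sketched locally like this; the paths below are stand-ins for illustration (on a real node the file is /root/.ssh/authorized_keys and temp_key would have arrived via scp):

```shell
# Simulate appending a copied key instead of overwriting authorized_keys.
mkdir -p /tmp/demo_ssh
echo "ssh-rsa EXISTINGKEY root@filserver1" > /tmp/demo_ssh/authorized_keys
echo "ssh-rsa NEWKEY root@filserver2" > /tmp/demo_ssh/temp_key
cat /tmp/demo_ssh/temp_key >> /tmp/demo_ssh/authorized_keys   # append, do not overwrite
rm /tmp/demo_ssh/temp_key
chmod 600 /tmp/demo_ssh/authorized_keys
cat /tmp/demo_ssh/authorized_keys
```

After the append, authorized_keys contains both the existing key and the new one, so logins that relied on the old key keep working.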

Disable Auto Startup of Services

It is a requirement of heartbeat that the services it controls are not automatically started by the system. To achieve this we need to disable the automatic start-up of httpd and asterisk as below.

chkconfig asterisk off
chkconfig httpd off