Linux Cluster Part 2 – Adding and Deleting Cluster Resources

This is the second part of my “Linux Cluster” posts:

Linux Cluster Part 1 – Install Corosync and Pacemaker on CentOS 6
Linux Cluster Part 2 – Adding and Deleting Cluster Resources

1. CRM Shell

CRM Shell is a command line interface for configuring and managing Pacemaker. It should be installed on all your nodes; you can install it from the HA-Clustering repository. Add the following lines to the “/etc/yum.repos.d/ha-clustering.repo” file:

[haclustering]
name=HA Clustering
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
enabled=1
gpgcheck=0
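
With the repository in place, install the shell with yum on every node. A minimal sketch, assuming the package in this repository is named “crmsh”:

[root@foo1 ~]# yum install crmsh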

 

Once installed, we can run the “crm” command from the Linux command line to manage our Pacemaker instance. Below is an example of running the “crm help” command. If you want help on a specific “crm” sub-command, run for example “crm cib help”:

[root@foo1 ~]# crm help

This is crm shell, a Pacemaker command line interface.

Available commands:

    cib              manage shadow CIBs
    resource         resources management
    configure        CRM cluster configuration
    node             nodes management
    options          user preferences
    history          CRM cluster history
    site             Geo-cluster support
    ra               resource agents information center
    status           show cluster status
    help,?           show help (help topics for list of topics)
    end,cd,up        go back one level
    quit,bye,exit    exit the program

 

  • View Linux Cluster Status

[root@foo1 ~]# crm status
Last updated: Mon Oct  7 13:41:11 2013
Last change: Mon Oct  7 13:41:08 2013 via crm_attribute on foo1.geekpeek.net
Stack: classic openais (with plugin)
Current DC: foo1.geekpeek.net - partition with quorum
Version: 1.1.9-2.6-2db99f1
2 Nodes configured, 2 expected votes
0 Resources configured.

Online: [ foo1.geekpeek.net foo2.geekpeek.net ]

 

  • View Linux Cluster Configuration

[root@foo1 ~]# crm configure show
node foo1.geekpeek.net
node foo2.geekpeek.net
property $id="cib-bootstrap-options" 
    dc-version="1.1.9-2.6-2db99f1" 
    cluster-infrastructure="classic openais (with plugin)" 
    expected-quorum-votes="2"

2. Adding Cluster Resources

Every cluster resource is defined by a Resource Agent. Resource Agents must provide the Linux Cluster with complete resource status and availability information at any time! The most important and most commonly used Resource Agent classes are:

  • LSB (Linux Standard Base) – These are the common resource agents found in the /etc/init.d directory (init scripts).
  • OCF (Open Cluster Framework) – These are essentially extended LSB resource agents that usually support additional parameters.

From this we can conclude that it is always better to use OCF Resource Agents (when available) over LSB ones, since OCF agents support additional configuration parameters and are optimized for cluster use.
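
If you are unsure which Resource Agent classes are available on a node, crm shell can list them first (a quick sketch; the exact classes shown depend on the installed packages):

[root@foo1 ~]# crm ra classes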

We can check the available Resource Agents by running “crm ra list” followed by the desired Resource Agent class:

[root@foo1 ~]# crm ra list lsb
auditd            blk-availability  corosync          corosync-notifyd  crond             halt              ip6tables         iptables          iscsi             iscsid
killall           logd              lvm2-lvmetad      lvm2-monitor      mdmonitor         multipathd        netconsole        netfs             network           nfs
nfslock           pacemaker         postfix           quota_nld         rdisc             restorecond       rpcbind           rpcgssd           rpcidmapd         rpcsvcgssd
rsyslog           sandbox           saslauthd         single            sshd              udev-post         winbind
[root@foo1 ~]# crm ra list ocf
ASEHAagent.sh       AoEtarget           AudibleAlarm        CTDB                ClusterMon          Delay               Dummy               EvmsSCC             Evmsd
Filesystem          HealthCPU           HealthSMART         ICP                 IPaddr              IPaddr2             IPsrcaddr           IPv6addr            LVM
LinuxSCSI           MailTo              ManageRAID          ManageVE            NodeUtilization     Pure-FTPd           Raid1               Route               SAPDatabase
SAPInstance         SendArp             ServeRAID           SphinxSearchDaemon  Squid               Stateful            SysInfo             SystemHealth        VIPArip
VirtualDomain       WAS                 WAS6                WinPopup            Xen                 Xinetd              anything            apache              apache.sh
asterisk            clusterfs.sh        conntrackd          controld            db2                 dhcpd               drbd                drbd.sh             eDir88
ethmonitor          exportfs            fio                 fs.sh               iSCSILogicalUnit    iSCSITarget         ids                 ip.sh               iscsi
jboss               ldirectord          lvm.sh              lvm_by_lv.sh        lvm_by_vg.sh        lxc                 mysql               mysql-proxy         mysql.sh
named               named.sh            netfs.sh            nfsclient.sh        nfsexport.sh        nfsserver           nfsserver.sh        nginx               o2cb
ocf-shellfuncs      openldap.sh         oracle              oracledb.sh         orainstance.sh      oralistener.sh      oralsnr             pgsql               ping
pingd               portblock           postfix             postgres-8.sh       pound               proftpd             remote              rsyncd              rsyslog
samba.sh            script.sh           scsi2reservation    service.sh          sfex                slapd               smb.sh              svclib_nfslock      symlink
syslog-ng           tomcat              tomcat-5.sh         tomcat-6.sh         varnish             vm.sh               vmware              zabbixserver

 

We configure cluster resources with the “crm configure primitive” command followed by a Resource Name, a Resource Agent and additional parameters (generic form):

crm configure primitive resourcename resourceagent parameters

 

We can see the help text and additional parameters for a Resource Agent by running the “crm ra meta” command followed by the Resource Agent name (example):

[root@foo1 ~]# crm ra meta IPaddr2

 

Before we start adding Resources to our Cluster, we need to disable STONITH (Shoot The Other Node In The Head), since we are not using it in this configuration:

[root@foo1 ~]# crm configure property stonith-enabled=false

 

We can check the Linux Cluster configuration by running the “crm configure show” command:

[root@foo1 ~]# crm configure show
node foo1.geekpeek.net
node foo2.geekpeek.net
property $id="cib-bootstrap-options" 
    dc-version="1.1.9-2.6-2db99f1" 
    cluster-infrastructure="classic openais (with plugin)" 
    expected-quorum-votes="2" 
    stonith-enabled="false"

… to confirm STONITH was disabled!
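
Optionally, we can also ask Pacemaker to validate the live configuration; with STONITH disabled it should no longer complain about missing fencing resources (a quick sanity check, run on any node):

[root@foo1 ~]# crm_verify -L -V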

  • Adding IP Address Resource

Let’s add an IP address resource to our Linux Cluster. The information we need to configure the IP address is:

Cluster Resource Name: ClusterIP
Resource Agent: ocf:heartbeat:IPaddr2 (get this info with “crm ra meta IPaddr2”)
IP address: 192.168.1.150
Netmask: 24
Monitor interval: 30 seconds (get this info with “crm ra meta IPaddr2”)

Run the following command on a Linux Cluster node to configure the ClusterIP resource:

[root@foo1 ~]# crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=192.168.1.150 cidr_netmask="24" op monitor interval="30s"

Check Cluster Configuration with:

[root@foo1 ~]# crm configure show
node foo1.geekpeek.net
node foo2.geekpeek.net
primitive ClusterIP ocf:heartbeat:IPaddr2 
    params ip="192.168.61.150" cidr_netmask="24" 
    op monitor interval="30s"
property $id="cib-bootstrap-options" 
    dc-version="1.1.9-2.6-2db99f1" 
    cluster-infrastructure="classic openais (with plugin)" 
    expected-quorum-votes="2" 
    stonith-enabled="false" 
    last-lrm-refresh="1381240623"

Check Cluster Status with:

[root@foo1 ~]# crm status
Last updated: Tue Oct  8 15:59:19 2013
Last change: Tue Oct  8 15:58:11 2013 via cibadmin on foo1.geekpeek.net
Stack: classic openais (with plugin)
Current DC: foo1.geekpeek.net - partition with quorum
Version: 1.1.9-2.6-2db99f1
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ foo1.geekpeek.net foo2.geekpeek.net ]

 ClusterIP    (ocf::heartbeat:IPaddr2):    Started foo1.geekpeek.net

 

As we can see, a new resource called ClusterIP is configured in the Cluster and started on the foo1.geekpeek.net node.
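
To double-check that the address is really active on foo1.geekpeek.net, we can look at the node’s interfaces. A minimal sketch, assuming the example IP from above (the interface it lands on depends on your network setup):

[root@foo1 ~]# ip addr show | grep 192.168.1.150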

  • Adding Apache (httpd) Resource

The next resource is an Apache Web Server. Prior to configuring the Apache Cluster Resource, the httpd package must be installed and configured on both nodes (a minimal installation sketch follows the list below)! The information we need to configure the Apache Web Server is:

Cluster Resource Name: Apache
Resource Agent: ocf:heartbeat:apache (get this info with “crm ra meta apache”)
Configuration file location: /etc/httpd/conf/httpd.conf
Monitor interval: 30 seconds (get this info with “crm ra meta apache”)
Start timeout: 40 seconds (get this info with “crm ra meta apache”)
Stop timeout: 60 seconds (get this info with “crm ra meta apache”)
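
A minimal installation sketch for both nodes, assuming the stock CentOS 6 httpd package – the service is deliberately left disabled at boot, since Pacemaker (not init) will be starting and stopping it:

[root@foo1 ~]# yum install httpd
[root@foo1 ~]# chkconfig httpd off

(repeat both commands on foo2.geekpeek.net)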

Run the following command on a Linux Cluster node to configure the Apache resource:

[root@foo1 ~]# crm configure primitive Apache ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval="30s" op start timeout="40s" op stop timeout="60s"

Check Cluster Configuration with:

[root@foo1 ~]# crm configure show
node foo1.geekpeek.net
node foo2.geekpeek.net
primitive Apache ocf:heartbeat:apache 
    params configfile="/etc/httpd/conf/httpd.conf" 
    op monitor interval="30s" 
    op start timeout="40s" interval="0" 
    op stop timeout="60s" interval="0" 
    meta target-role="Started"
primitive ClusterIP ocf:heartbeat:IPaddr2 
    params ip="192.168.61.150" cidr_netmask="24" 
    op monitor interval="30s"
property $id="cib-bootstrap-options" 
    dc-version="1.1.9-2.6-2db99f1" 
    cluster-infrastructure="classic openais (with plugin)" 
    expected-quorum-votes="2" 
    stonith-enabled="false" 
    last-lrm-refresh="1381240623"

Check Cluster Status with:

[root@foo1 ~]# crm status
Last updated: Thu Oct 10 11:13:59 2013
Last change: Thu Oct 10 11:07:38 2013 via cibadmin on foo1.geekpeek.net
Stack: classic openais (with plugin)
Current DC: foo1.geekpeek.net - partition with quorum
Version: 1.1.9-2.6-2db99f1
2 Nodes configured, 2 expected votes
2 Resources configured.

Online: [ foo1.geekpeek.net foo2.geekpeek.net ]

 ClusterIP    (ocf::heartbeat:IPaddr2):    Started foo1.geekpeek.net 
 Apache    (ocf::heartbeat:apache):    Started foo2.geekpeek.net

 

As we can see, both Cluster Resources (Apache and ClusterIP) are configured and started – ClusterIP is started on the foo1.geekpeek.net node and Apache on the foo2.geekpeek.net node.

Apache and ClusterIP are currently running on different Cluster nodes. We will fix this later by setting Resource Constraints such as colocation (keeping resources together) and order (the order in which resources start and stop).
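
As a quick preview of what Part 3 covers, colocation and order constraints in crm shell look roughly like this (the constraint IDs used here are only illustrative names):

crm configure colocation Apache-with-ClusterIP inf: Apache ClusterIP
crm configure order ClusterIP-before-Apache inf: ClusterIP Apache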

Resource Constraints will be explained in detail in the next “Linux Cluster Part 3” post!

3. Deleting Cluster Resources

We can delete a configured Cluster Resource with the “crm configure delete” command followed by the name of the Resource we want to delete (generic form):

crm configure delete resourcename

 

We must always stop a Cluster Resource prior to deleting it!

We can stop a Resource by running the “crm resource stop” command followed by the name of the Resource we want to stop.
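
In the same generic form as the delete command above:

crm resource stop resourcename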

Cluster Resource and Cluster Node management will be explained in detail in the next “Linux Cluster Part 3” post!

We can check the Linux Cluster configuration by running the “crm configure show” command to see whether the Cluster Resource was successfully removed from the Cluster Configuration.

  • Deleting Apache (httpd) Resource

Let’s stop and delete our Apache Cluster Resource configured in the steps above:

[root@foo1 ~]# crm resource stop Apache
[root@foo1 ~]# crm configure delete Apache

Check Cluster Configuration with:

[root@foo1 ~]# crm configure show
node foo1.geekpeek.net
node foo2.geekpeek.net
primitive ClusterIP ocf:heartbeat:IPaddr2 
    params ip="192.168.61.150" cidr_netmask="24" 
    op monitor interval="30s"
property $id="cib-bootstrap-options" 
    dc-version="1.1.9-2.6-2db99f1" 
    cluster-infrastructure="classic openais (with plugin)" 
    expected-quorum-votes="2" 
    stonith-enabled="false" 
    last-lrm-refresh="1381240623"

… to confirm the Apache resource was deleted from the Cluster Configuration.

  • Deleting IP Address Resource

Next let’s stop and delete ClusterIP Resource:

[root@foo1 ~]# crm resource stop ClusterIP
[root@foo1 ~]# crm configure delete ClusterIP

Check Cluster Configuration with:

[root@foo1 ~]# crm configure show
node foo1.geekpeek.net
node foo2.geekpeek.net
property $id="cib-bootstrap-options" 
    dc-version="1.1.9-2.6-2db99f1" 
    cluster-infrastructure="classic openais (with plugin)" 
    expected-quorum-votes="2" 
    stonith-enabled="false" 
    last-lrm-refresh="1381240623"

 

… to confirm the ClusterIP Resource was deleted from our Cluster Configuration.

Be sure to read the next post, Linux Cluster Part 3 – Manage Cluster Nodes and Resources (COMING SOON!).


  • Pol

    Cool! Thanks very much!
    I’m waiting for part 3.
    Thanks

    Pol

    • Mitch

      Thanks for your support Pol! Part 3 coming up soon!

      Regards,
      Mitch

  • Eran

    Thanks! Definitely the best Linux HA cookbook I’ve read!

    • Mitch

      That is really nice to hear, thanks Eran!

      Regards,
      Mitch


  • David

    Great document! I’m not sure I understand why it’s necessary to disable STONITH, and also why you don’t enable it afterwards?

    Sorry for this stupid question …

    • Mitch

      Hello David! You are welcome to ask anything and I will try to give you answers 🙂
      STONITH is Shoot The Other Node In The Head – this means you have to set up some kind of solution to kill “the other node”. We can implement STONITH with solutions like UPS, PDU, Lights-out devices, etc. If using virtualization we can even write scripts to kill the virtual machine… Since we did not implement any such solution we cannot use it, therefore we disabled STONITH. You can read more here http://clusterlabs.org/doc/crm_fencing.html and here http://www.linux-ha.org/wiki/STONITH.

      I hope this answered your question.

      Regards,
      Mitch

      • ben

        Hi – great tutorial so far. Just a comment on STONITH:

        If you are doing a two node cluster you should definitely have STONITH, a.k.a. a quorum disk. This is one strength of pacemaker/openais (default on SLES). CentOS/Redhat 6 default cluster also gives you the option of a quorum disk, but configuring it is tedious work, to the point where, when I was trying to figure out how to put it together, most of the Google responses on CentOS/Redhat were to just forget the cluster disk. In the event of failover/split-brain let them do a fence-race (?!!). That is really kludgy. I ended up grinding it out with CentOS/Redhat and figuring out how to do the quorum disk. In SLES (Pacemaker/openais) it is MUCH easier, and in a two-node cluster, to avoid split-brain, it is a must to have that third vote.

        • ben

          I had to combine about 8 to 10 resources in my research for SLES11 cluster building. My final steps are here (including STONITH config):

          http://geekswing.com/geek/building-a-two-node-sles11-sp2-linux-cluster-on-vmware/

          • Mitch

            Wow Ben! I really appreciate your input on this topic and agree with you completely! For physical servers it would probably be more optimal to use UPS, PDU or Lights-out, but you can’t use that in virtual environments. Since I have not yet tested out SBD it is definitely time to do so in the near future.
            Thanks for the info and thumbs up for GeekSwing.Com and your research! If I have any problems setting it up I might send you an email 🙂 Regards, Mitch

          • ben

            Hi Mitch – Thanks 🙂 For some reason I cannot reply to your comment below so I’m replying back to mine. I might have misspoken because I only used the cluster software which came integrated with the OS. For SLES that meant pacemaker/openais. For CentOS it’s cman/luci/ricci. For SLES/pacemaker/openais it is definitely easy to add the quorum disk. For CentOS/cman/luci/ricci it takes a lot of work. If you’d like I’d be happy to send you the .pdf I wrote up on SLES (and then when I have it finished, CentOS) so you can have a looksie :). Cheers!

  • ALI

    Please help me, I am getting the following error configuring Apache:

    [root@node03 ~]# crm status
    Last updated: Mon Nov 25 17:32:56 2013
    Last change: Mon Nov 25 16:07:44 2013 via cibadmin on node03.cluster.com
    Stack: classic openais (with plugin)
    Current DC: node03.cluster.com – partition with quorum
    Version: 1.1.10-1.el6_4.4-368c726
    2 Nodes configured, 2 expected votes
    2 Resources configured

    Online: [ node03.cluster.com node04.cluster.com ]

    ClusterIP (ocf::heartbeat:IPaddr2): Started node03.cluster.com

    Failed actions:
    Apache_start_0 on node03.cluster.com ‘unknown error’ (1): call=22, status=complete, last-rc-change=’Mon Nov 25 17:31:06 2013′, queued=2590ms, exec=0ms
    Apache_start_0 on node04.cluster.com ‘unknown error’ (1): call=13, status=complete, last-rc-change=’Mon Nov 25 17:31:00 2013′, queued=4094ms, exec=0ms

    • Mitch

      Hello Ali! You seem to have a problem with Apache starting, as you can see in the Failed actions info. I would suggest you try to start Apache manually on each node and see if it starts. If it doesn’t, check the Apache log for errors. What does your Apache cluster configuration look like? And your Apache configuration? Did you bind it to your Cluster IP? Do you see errors in the Apache log?

      Regards,
      Mitch

      • Ali

        Dear Mitch,
        Thank you very much for the reply. As advised above I have manually started httpd on both nodes, but I am still getting the same error. I am using CentOS version 6.4, the Apache configuration is the default (I didn’t change anything) and I copy-pasted the command for the Apache resource. After running the crm status command I am getting the following error again:

        crm status
        Last updated: Tue Nov 26 15:21:47 2013
        Last change: Tue Nov 26 15:10:28 2013 via cibadmin on node04.cluster.com
        Stack: classic openais (with plugin)
        Current DC: node04.cluster.com – partition with quorum
        Version: 1.1.10-1.el6_4.4-368c726
        2 Nodes configured, 2 expected votes
        2 Resources configured

        Online: [ node03.cluster.com node04.cluster.com ]

        ClusterIP (ocf::heartbeat:IPaddr2): Started node03.cluster.com

        Failed actions:
        Apache_start_0 on node03.cluster.com ‘unknown error’ (1): call=22, status=complete, last-rc-change=’Tue Nov 26 15:10:35 2013′, queued=2571ms, exec=0ms
        Apache_start_0 on node04.cluster.com ‘unknown error’ (1): call=16, status=complete, last-rc-change=’Tue Nov 26 15:10:32 2013′, queued=2505ms, exec=0ms

        Please also note that I am able to display the Apache test page through both node IPs and the floating IP.

        • Mitch

          I would also ask you to send me the output of “crm configure show” to info@geekpeek.net and I will help you solve your problem.

          Regards,
          Mitch

  • phyo

    I faced a problem. After adding Apache, when I check the status with crm status, it shows me this error:
    [root@centos01 ~]# crm status
    Last updated: Tue Nov 26 17:43:49 2013
    Last change: Tue Nov 26 17:43:15 2013 via cibadmin on centos01.nagios.local
    Stack: classic openais (with plugin)
    Current DC: centos01.nagios.local – partition with quorum
    Version: 1.1.10-1.el6_4.4-368c726
    2 Nodes configured, 2 expected votes
    2 Resources configured

    Online: [ centos01.nagios.local centos02.nagios.local ]

    ClusterIP (ocf::heartbeat:IPaddr2): Started centos01.nagios.local

    Failed actions:
    Apache_start_0 on centos01.nagios.local ‘unknown error’ (1): call=54, status=complete, last-rc-change=’Tue Nov 26 17:43:20 2013′, queued=2447ms, exec=0ms
    Apache_start_0 on centos02.nagios.local ‘unknown error’ (1): call=48, status=complete, last-rc-change=’Tue Nov 26 17:43:17 2013′, queued=2427ms, exec=1ms
    [root@centos01 ~]#
    HTTPD service on both servers is running.
    Thanks.

    • Mitch

      Please send me the output of “crm configure show” to info@geekpeek.net and I will help you solve your problem.

      Regards,
      Mitch

  • Muhammad Asim

    Hi Mitch,

    Thanks for your very good document above. I have one question:

    primitive Apache ocf:heartbeat:apache

    In the above line, what is the meaning of “heartbeat”?

    And is it necessary that the resource name always looks like the example below (p_fs_mysql), or is it just a name?
    primitive p_fs_mysql ocf:heartbeat:Filesystem params
    device=”/dev/drbd0″ directory=”/var/lib/mysql_drbd” fstype=”ext4″

  • Goi

    Hi Mitch,

    I’m following your 3 part tutorial on setting up a HA system with 2 nodes, both running CentOS 6.5, and connected via ethernet to a router.

    I successfully installed corosync, pacemaker, crmsh, cman and httpd. Both machines are able to ping each other via hostname and IP address.

    However, when I get to steps 6/7 of Part 1, I encountered a problem.

    “service corosync start” was successful.

    “service pacemaker start” shows the following error message:
    Starting cman… Corosync Cluster Engine is already running
    [FAILED]

    If I stop corosync and start pacemaker, it completes successfully, and this is what I did.

    I then moved on to Part 2 of your guide, all the way up to adding of Apache as a resource. No errors there.

    Here’s my “crm configure show” output:
    node node01
    node node02
    primitive Apache apache
    params configfile=”/etc/httpd/conf/httpd.conf”
    op monitor interval=30s
    op start timeout=40s interval=0
    op stop timeout=60s interval=0
    primitive ClusterIP IPaddr2
    params ip=192.168.1.110 cidr_netmask=24
    op monitor interval=30s
    property cib-bootstrap-options:
    dc-version=1.1.10-14.el6_5.3-368c726
    cluster-infrastructure=cman
    stonith-enabled=false
    no-quorum-policy=ignore
    rsc_defaults rsc_defaults-options:
    migration-threshold=1

    And here’s the error message with “crm status”
    Last updated: Wed Oct 15 17:38:16 2014
    Last change: Wed Oct 15 17:00:44 2014 via cibadmin on node02
    Stack: cman
    Current DC: node02 – partition WITHOUT quorum
    Version: 1.1.10-14.el6_5.3-368c726
    2 Nodes configured
    2 Resources configured

    Online: [ node02 ]
    OFFLINE: [ node01 ]

    ClusterIP (ocf::heartbeat:IPaddr2): Started node02

    Failed actions:

    Apache_start_0 on node02 ‘unknown error’ (1): call=44, status=complete, last-rc-change=’Wed Oct 15 17:00:45 2014′, queued=2185ms, exec=0ms

    Seems like Apache isn’t able to start properly. Do you know what might be wrong? I did not configure Apache at all. I simply installed it and left it as that.

    Any help would be appreciated, thanks!
