Recently I migrated my Nagios monitoring server to new hardware and a new CentOS version (CentOS 5.9 -> CentOS 6.4). It was a completely new server with a freshly installed Linux OS, and everything was configured perfectly. After I migrated the Nagios data to the new server and switched the IP addresses, I started seeing warnings in my Nagios web frontend: “PING WARNING - DUPLICATES FOUND! Packet loss = 0%, RTA = 0.53 ms”. The warning popped up on different hosts at different times, completely at random. Really weird, I thought!
I started investigating the problem and searching Google, but I couldn’t find anything related to my server or Nagios configuration. An hour or two later, an idea popped into my head! On the CentOS 6.4 server I had configured Ethernet NIC bonding, and I had left the bonding mode at its default, which is load balancing (round-robin, mode 0).
I took one Ethernet NIC down and the problem disappeared!
The problem was the “Bonding Mode: load balancing (round-robin)”.
Why does this happen? Well, I am not exactly a networking expert, so correct me if I am wrong, but for the load balancing (round-robin) bonding mode to work properly, it must be supported by both endpoints. In my case, both Ethernet cables from my server are connected to a Cisco switch. That switch needs a special feature enabled (the two ports grouped into an EtherChannel) to communicate successfully with the load-balanced Ethernet cards on my CentOS server.
So before migrating a Nagios server, check how your NIC bonding is configured!
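If you want to confirm the symptom outside of Nagios, plain ping is enough: the iputils ping that ships with CentOS marks every duplicate reply with "(DUP!)". Here is a minimal sketch that reads ping output and reports in the style of a Nagios plugin (exit code 0 for OK, 1 for WARNING). The function name check_ping_dups is my own invention; it is not part of Nagios or of the check_ping plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count duplicate ICMP replies in ping output read from stdin.
# iputils ping appends "(DUP!)" to each duplicate reply line.
check_ping_dups() {
  local dups
  # grep -c prints 0 and exits 1 when nothing matches; '|| true' ignores that status
  dups=$(grep -c 'DUP!' || true)
  if [ "$dups" -gt 0 ]; then
    echo "WARNING - $dups duplicate(s) found"
    return 1   # Nagios convention: 1 = WARNING
  fi
  echo "OK - no duplicate replies"
  return 0     # Nagios convention: 0 = OK
}
```

Usage: `ping -c 50 192.0.2.10 | check_ping_dups`, where 192.0.2.10 is a placeholder for one of your monitored hosts. Since the duplicates in my case showed up randomly, a longer run with more packets is more likely to catch them.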
ROUND-ROBIN
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: load balancing (round-robin)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 3c:4a:92:f9:33:18
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 3c:4a:92:f9:33:19
Slave queue ID: 0
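To make this check routine before (or after) a migration, here is a small sketch that reads the "Bonding Mode:" line from /proc/net/bonding and warns on round-robin. The function name check_bond_mode and the default path /proc/net/bonding/bond0 are assumptions; pass another file if your bond is not bond0. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 3 UNKNOWN), but treat it as a starting point rather than a finished plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn when a bond runs in round-robin (mode 0, balance-rr).
# Usage: check_bond_mode [file]   (default: /proc/net/bonding/bond0)
check_bond_mode() {
  local file=${1:-/proc/net/bonding/bond0}
  local mode
  if [ ! -r "$file" ]; then
    echo "UNKNOWN - cannot read $file"
    return 3
  fi
  # Keep only the text after "Bonding Mode: " on the matching line
  mode=$(sed -n 's/^Bonding Mode: //p' "$file")
  if [ -z "$mode" ]; then
    echo "UNKNOWN - no Bonding Mode line in $file"
    return 3
  fi
  case "$mode" in
    *round-robin*)
      echo "WARNING - $file: $mode"
      return 1 ;;
    *)
      echo "OK - $file: $mode"
      return 0 ;;
  esac
}
```

Usage: `check_bond_mode` for bond0, or `check_bond_mode /proc/net/bonding/bond1` for another bond.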
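If it is configured as load balancing (round-robin), be sure to reconfigure one of the two sides: either the switch endpoint or the Linux bonding. On the Linux side, I switched to fault-tolerance (active-backup) mode, which does not require any special switch configuration.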
ACTIVE-BACKUP
# cat /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bond0 miimon=100 mode=1

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 3c:4a:92:f9:33:18
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 3c:4a:92:f9:33:19
Slave queue ID: 0
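The mode=1 option in bonding.conf is the numeric form of active-backup. As a quick reference, here is a small helper (bond_mode_info is a made-up name) that maps each bonding mode number to its name and to the switch-side requirement as described in the Linux kernel bonding documentation. Double-check against the documentation for your kernel version before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a bonding mode number (as in "options bond0 mode=N")
# to its name and the switch-side requirement from the kernel bonding docs.
bond_mode_info() {
  case "$1" in
    0) echo "balance-rr: switch ports must be grouped (e.g. EtherChannel)" ;;
    1) echo "active-backup: no special switch configuration" ;;
    2) echo "balance-xor: switch ports must be grouped (e.g. EtherChannel)" ;;
    3) echo "broadcast: switch ports must be grouped (e.g. EtherChannel)" ;;
    4) echo "802.3ad: switch must support and be configured for LACP" ;;
    5) echo "balance-tlb: no special switch configuration" ;;
    6) echo "balance-alb: no special switch configuration" ;;
    *) echo "unknown bonding mode: $1" >&2; return 1 ;;
  esac
}
```

For example, `bond_mode_info 0` explains why the default round-robin mode misbehaved against an unconfigured Cisco switch, while `bond_mode_info 1` confirms that active-backup works without touching the switch.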
Hope this saves someone an hour of research! Have fun! 🙂