ZenHub is Down Zenoss Problems

I recently ran into problems with Zenoss. It suddenly stopped receiving information from clients and stopped graphing client data, and I did not know why. I tried removing and re-adding a client, but the job to add the new client was not even processed. When I tried to re-model the client, I saw an error saying “ZenHub is down”.

I started digging through all the logs Zenoss produces (there are just too many!) and found that the zenstatus log was also reporting the “ZenHub is down” error. Here is what I found and how I resolved the “ZenHub is down” Zenoss problems.

Let’s fix the “ZenHub is down” problem!

1. Check ZenHub Status

You will probably find that ZenHub is actually up and running, which is surprising given that the only error you can find says “ZenHub is down”.

Be sure to run the following commands as the “zenoss” user!

[zenoss@geekpeek ~]$ zenhub status
 program running; pid=19774

2. Enable ZenHub Debug Mode

You can enable ZenHub debug mode easily and on the fly with the following command.

[zenoss@geekpeek ~]$ zenhub debug
 Sending SIGUSR1 to 19774

3. Check Zenoss Logs

With ZenHub in debug mode, you should see additional information in the zenhub.log file. If you find the message “zen.ZenHub: all workers are busy”, you are experiencing exactly the same problem I did.

/opt/zenoss/log/zenhub.log

2015-02-19 11:27:31,902 DEBUG zen.ZenHub: all workers are busy
2015-02-19 11:27:36,903 DEBUG zen.ZenHub: worklist has 27 items

The zenstatus.log file still says “ZenHub may be disconnected” and “ZenHub is down”.

/opt/zenoss/log/zenstatus.log

2015-02-19 11:20:01,113 WARNING zen.zenstatus: No service named 'EventService': ZenHub may be disconnected
2015-02-19 11:20:01,113 ERROR zen.maintenance: Maintenance failed. Message from hub: ZenHub is down
2015-02-19 11:20:02,653 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:07,657 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:12,354 INFO zen.pbclientfactory: Initial connect timed out after 60 seconds
2015-02-19 11:20:12,658 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:17,663 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:22,664 ERROR zen.zenstatus: No event service: None
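If you would rather not scroll through the logs by hand, a small helper can count how often a symptom string shows up. This is only a sketch: the function name `count_symptom` is made up for illustration, and the log paths in the comment are the ones used above.

```shell
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# Example (as the zenoss user):
#   count_symptom "all workers are busy" /opt/zenoss/log/zenhub.log
#   count_symptom "ZenHub is down" /opt/zenoss/log/zenstatus.log
count_symptom() {
    pattern=$1
    logfile=$2
    # -F matches the pattern literally, -c prints the number of matching lines.
    # grep exits with status 1 when there are no matches, so "|| true" keeps
    # the helper from aborting scripts that run with "set -e".
    grep -F -c -- "$pattern" "$logfile" || true
}
```

A count that keeps growing between two runs suggests the problem is still ongoing rather than a leftover from an earlier incident.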

4. Check RabbitMQ Connections

A crucial piece of advice here is to check the RabbitMQ connections on the Zenoss server. RabbitMQ processes events for the Zenoss server. Note that rabbitmqctl is run as root here, not as the “zenoss” user.

[root@geekpeek log]# rabbitmqctl list_connections
 Listing connections ...
 zenoss 127.0.0.1 41928 blocked
 zenoss 127.0.0.1 39100 blocked
 zenoss 127.0.0.1 46164 blocked
 zenoss 127.0.0.1 39083 blocking
 zenoss 127.0.0.1 39069 blocked
 zenoss 127.0.0.1 39101 blocked
 zenoss 127.0.0.1 39063 blocking
 zenoss 127.0.0.1 39077 blocked
 zenoss 127.0.0.1 39052 blocked
 zenoss 127.0.0.1 39062 blocking
...done.

If you see all RabbitMQ connections in the “blocked” or “blocking” state, this is the root of all your problems!
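With many connections it is easier to let a script do the counting. The sketch below assumes the four-column output shown above (user, peer host, peer port, state). The function name `summarize_connections` is my own invention, not part of RabbitMQ.

```shell
# Hypothetical helper: reads `rabbitmqctl list_connections` output on stdin and
# prints "<throttled> <total>", where throttled counts connections whose state
# is "blocked" or "blocking".
summarize_connections() {
    awk '
        # Connection rows have four fields with a numeric port in the third;
        # the "Listing connections ..." and "...done." lines do not match.
        NF == 4 && $3 ~ /^[0-9]+$/ {
            total++
            if ($4 == "blocked" || $4 == "blocking") throttled++
        }
        END { printf "%d %d\n", throttled, total }
    '
}
```

As root, you could run it like this: `rabbitmqctl list_connections user peer_host peer_port state | summarize_connections`. If both numbers are equal and greater than zero, every connection is being throttled.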

5. Reconfigure RabbitMQ

The problem is that RabbitMQ has a free disk space limit for the partition on which it stores its data. When the available disk space falls below this limit, RabbitMQ triggers flow control and blocks connections that publish messages. If you know how much disk space and RAM you have available, you can change this limit by editing the “/etc/rabbitmq/rabbitmq.config” file.

[root@geekpeek ~]# cat /etc/rabbitmq/rabbitmq.config
 [
 {rabbit, [{disk_free_limit, {mem_relative, 0.1}}]}
 ].

As shown above, you can set the disk free limit relative to the amount of RAM in the server. With mem_relative set to 0.1, the limit is one tenth of the RAM.
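To get a feeling for what a given mem_relative value means on your server, you can work out the resulting threshold yourself. This is a rough sketch and not how RabbitMQ computes it internally; the helper name `disk_limit_mib` is hypothetical, and `/var/lib/rabbitmq` in the comment is an assumption about where your RabbitMQ stores its data.

```shell
# Hypothetical helper: prints the approximate free-disk threshold in MiB for a
# given amount of RAM (in kB, as reported in /proc/meminfo) and a mem_relative
# fraction. The result is truncated to a whole number.
disk_limit_mib() {
    mem_total_kb=$1
    fraction=$2
    awk -v kb="$mem_total_kb" -v f="$fraction" 'BEGIN { printf "%d\n", kb * f / 1024 }'
}

# Example on Linux (the data directory is an assumption, adjust as needed):
#   disk_limit_mib "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 0.1
#   df -Pm /var/lib/rabbitmq
```

If the available space reported by `df` is below the number the helper prints, RabbitMQ will keep blocking publishers with that setting, and you need to either free up disk space or lower the limit further.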

6. Restart RabbitMQ and Zenoss

After reconfiguring RabbitMQ, restart RabbitMQ and Zenoss for the configuration changes to apply.

[root@geekpeek ~]# /etc/init.d/rabbitmq-server restart
[root@geekpeek ~]# /etc/init.d/zenoss restart

You should now have a working Zenoss without “ZenHub is down” error messages.