Recently I ran into problems with Zenoss. It suddenly stopped collecting information from the clients and stopped graphing client data, and I had no idea why. I tried removing and re-adding a client, but the job that adds a new client was not even processed. When I tried to re-model the client, I saw an error saying “ZenHub is down”.
I started digging through all the logs Zenoss produces (there are just too many!) and found that the zenstatus log was also reporting the “ZenHub is down” error. Here is what I found and how I resolved the “ZenHub is down” problem.
Let’s fix the “ZenHub is down” problem!
1. Check ZenHub Status
You should see that ZenHub is actually up and running, which is surprising given that the only error you can find says “ZenHub is down”.
Be sure to run the following commands as the “zenoss” user!
[zenoss@geekpeek ~]$ zenhub status
program running; pid=19774
2. Enable ZenHub Debug Mode
You can enable ZenHub debug mode easily and on the fly with the following command.
[zenoss@geekpeek ~]$ zenhub debug
Sending SIGUSR1 to 19774
3. Check Zenoss Logs
With ZenHub in debug mode you should see additional information in the zenhub.log file. If you find the message “zen.ZenHub: all workers are busy”, you are experiencing exactly the same problem I did.
/opt/zenoss/log/zenhub.log
2015-02-19 11:27:31,902 DEBUG zen.ZenHub: all workers are busy
2015-02-19 11:27:36,903 DEBUG zen.ZenHub: worklist has 27 items
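To get a quick feel for how often this happens, you can count the matching lines instead of scrolling through the log. The following is only a minimal sketch: the helper name `count_busy_workers` is made up for this example, and the log path is the one used in this article, so adjust it if your installation differs.

```shell
# Count the lines in a log file that report all ZenHub workers as busy.
# count_busy_workers is a hypothetical helper name, not a Zenoss command.
count_busy_workers() {
    # -c prints the number of matching lines, -F treats the pattern as plain text
    grep -cF 'zen.ZenHub: all workers are busy' "$1"
}

LOG=/opt/zenoss/log/zenhub.log   # path taken from this article
if [ -r "$LOG" ]; then
    echo "busy-worker messages: $(count_busy_workers "$LOG")"
fi
```

A count that keeps growing while ZenHub is in debug mode suggests the workers never get free.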
The zenstatus.log file still reports “ZenHub may be disconnected” and “ZenHub is down”.
/opt/zenoss/log/zenstatus.log
2015-02-19 11:20:01,113 WARNING zen.zenstatus: No service named 'EventService': ZenHub may be disconnected
2015-02-19 11:20:01,113 ERROR zen.maintenance: Maintenance failed. Message from hub: ZenHub is down
2015-02-19 11:20:02,653 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:07,657 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:12,354 INFO zen.pbclientfactory: Initial connect timed out after 60 seconds
2015-02-19 11:20:12,658 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:17,663 ERROR zen.zenstatus: No event service: None
2015-02-19 11:20:22,664 ERROR zen.zenstatus: No event service: None
4. Check RabbitMQ Connections
A crucial piece of advice here is to check the RabbitMQ connections on the Zenoss server. RabbitMQ processes events for the Zenoss server.
[root@geekpeek log]# rabbitmqctl list_connections
Listing connections ...
zenoss  127.0.0.1  41928  blocked
zenoss  127.0.0.1  39100  blocked
zenoss  127.0.0.1  46164  blocked
zenoss  127.0.0.1  39083  blocking
zenoss  127.0.0.1  39069  blocked
zenoss  127.0.0.1  39101  blocked
zenoss  127.0.0.1  39063  blocking
zenoss  127.0.0.1  39077  blocked
zenoss  127.0.0.1  39052  blocked
zenoss  127.0.0.1  39062  blocking
...done.
If you see all RabbitMQ connections in “blocked” or “blocking” status, this is the root of all your problems!
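On a busy server the connection list can be long, so counting the stuck connections may be easier than reading them. Here is a small sketch of that idea. The helper name `count_blocked` is invented for this example, and it assumes the connection state is the last column of the output, as in the listing above.

```shell
# Count connections whose state is "blocked" or "blocking".
# Reads rabbitmqctl list_connections output on stdin.
# count_blocked is a hypothetical helper name, not part of RabbitMQ.
count_blocked() {
    # The state is assumed to be the last whitespace-separated field.
    awk '$NF == "blocked" || $NF == "blocking" { n++ } END { print n + 0 }'
}

# Run as root on the Zenoss server; skipped if rabbitmqctl is not installed.
if command -v rabbitmqctl >/dev/null 2>&1; then
    echo "blocked connections: $(rabbitmqctl list_connections user peer_host peer_port state | count_blocked)"
fi
```

Running the same check again after the fix should report zero blocked connections.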
5. Reconfigure RabbitMQ
The problem is that RabbitMQ has a free disk space limit for the partition on which it stores its data. When the available disk space falls below this limit, flow control is triggered and connections are blocked. If you know how much disk space and RAM you have available, you can change this limit by editing the “/etc/rabbitmq/rabbitmq.config” file.
[root@geekpeek ~]# cat /etc/rabbitmq/rabbitmq.config
[
  {rabbit, [{disk_free_limit, {mem_relative, 0.1}}]}
].
You can set the disk free limit relative to the memory available as shown above.
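Before picking a value it may help to compare the free disk space with the limit a given fraction would produce. The sketch below is only a rough estimate under a few assumptions: the helper names `free_kb` and `relative_limit_kb` are made up for this example, `/var/lib/rabbitmq` is assumed to be the RabbitMQ data directory (check your own installation), and `MemTotal` from `/proc/meminfo` is used as an approximation of the RAM RabbitMQ sees.

```shell
# Free space in KB on the filesystem that holds the given path.
# free_kb and relative_limit_kb are hypothetical helper names.
free_kb() {
    # -P keeps each filesystem on one line, -k reports 1024-byte blocks
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Limit in KB implied by {mem_relative, FRACTION} for MEM_KB of RAM.
relative_limit_kb() {
    awk -v mem="$1" -v frac="$2" 'BEGIN { printf "%d\n", mem * frac }'
}

DATA_DIR=/var/lib/rabbitmq   # assumed data directory, adjust if needed
if [ -d "$DATA_DIR" ] && [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "free disk space:      $(free_kb "$DATA_DIR") KB"
    echo "limit at 0.1 of RAM:  $(relative_limit_kb "$mem_kb" 0.1) KB"
fi
```

If the free space is below or close to the computed limit, the connections will most likely be blocked again soon, so freeing up disk space is the safer long-term fix.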
6. Restart RabbitMQ and Zenoss
After reconfiguring RabbitMQ, restart both RabbitMQ and Zenoss for the configuration changes to take effect.
[root@geekpeek ~]# /etc/init.d/rabbitmq-server restart
[root@geekpeek ~]# /etc/init.d/zenoss restart
You should now have a working Zenoss without “ZenHub is down” error messages.