
RabbitMQ monitoring: WombatOAM and the RabbitMQ Management plugin

2016-10-04 by Ayanda Dube

1. Introduction

 

If you're a RabbitMQ user, you are probably accustomed to monitoring your Rabbit installation with the native RabbitMQ Management plugin, or with third party monitoring tools such as Sensu, which use the RabbitMQ Management API internally to acquire metrics and present them on custom UIs. Regardless of the preferred tool, one aspect cuts across most of these off-the-shelf tools: full dependency on the RabbitMQ Management plugin. In other words, to monitor and manage your RabbitMQ installation via a web interface, the RabbitMQ Management plugin has to be enabled at all times on the RabbitMQ nodes. The downside of this approach lies mainly in the overhead the RabbitMQ Management plugin introduces on each node it is enabled on. The following image depicts the accompanying applications required on a node when the RabbitMQ Management plugin is enabled:

A total of 13 additional applications are required by the RabbitMQ Management Plugin, none of which are related to, or required for, any of the AMQP operations. Internally, the RabbitMQ Management Plugin creates multiple Erlang ETS tables, which are RAM based, in order to store, aggregate and compute statistics and various RabbitMQ specific node metrics. Unless the hardware has been dimensioned to take this into account, these tables can place a huge demand on the node's memory, and could contribute to a number of unknown side effects when traffic load patterns vary from peak to peak. Ironically, a common troubleshooting recommendation in the RabbitMQ community is to disable and re-enable the RabbitMQ Management Plugin!
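To get a feel for how much memory ETS tables hold on a node, the sketch below sums the size of every table using only standard `ets` and `erlang` calls. The module and function names (`ets_footprint`, `total_bytes/0`, `largest/1`) are made up for illustration; this is not how the Management plugin or WombatOAM measure memory.

```erlang
-module(ets_footprint).
-export([total_bytes/0, largest/1]).

%% Memory used by one ETS table, in bytes. ets:info/2 reports words,
%% so multiply by the VM word size. A table that disappears while we
%% scan returns undefined and is counted as 0.
table_bytes(Tab) ->
    case ets:info(Tab, memory) of
        undefined -> 0;
        Words -> Words * erlang:system_info(wordsize)
    end.

%% Total bytes held by all ETS tables on the local node.
total_bytes() ->
    lists:sum([table_bytes(T) || T <- ets:all()]).

%% The N largest tables as {Bytes, Name} pairs, biggest first.
largest(N) ->
    Sized = [{table_bytes(T), ets:info(T, name)} || T <- ets:all()],
    lists:sublist(lists:reverse(lists:sort(Sized)), N).
```

Running `ets_footprint:largest(10)` on a node before and after enabling the Management plugin should give a rough idea of which tables account for the difference.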

In an ideal world, RabbitMQ nodes should be dedicated to delivery of the AMQP protocol (queueing and message interchange logic between connected clients). All potentially burdensome operations, like UI monitoring and management, should ideally be taken care of on a completely independent node, deployable on a separate physical machine from the RabbitMQ nodes, for all OAM functions. This is how telecoms systems have addressed monitoring and operations for decades. This inflexibility of the RabbitMQ Management plugin, and of other monitoring tools dependent on its API, brings to light the main strengths and advantages of using our tool WombatOAM [4]. Illustrated below is the application overhead WombatOAM introduces on a RabbitMQ node:

So what is different about the WombatOAM approach to monitoring RabbitMQ? Firstly, only 1 additional application, the wombat_plugin, is introduced, unlike the 13 additional applications introduced by the RabbitMQ Management plugin. The reason is that WombatOAM only requires this single, very lightweight application on the node it's monitoring, to relay all metrics and events to a separate and independent node, which is responsible for carrying out all heavy computations and UI related operations. This implies a much smaller, if not negligible, application overhead on the RabbitMQ node in comparison to that introduced by the RabbitMQ Management plugin, which carries out all of its computations and operations on the RabbitMQ node itself.

WombatOAM thus fully leverages distributed Erlang by carrying out its major operations and maintenance functions in a loosely coupled manner, on an independent and separate node. This leaves the RabbitMQ nodes free to focus predominantly on carrying out AMQP functions, with very little chance of experiencing problems relating to UI operations and maintenance functions.
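The general idea of doing the work on a separate node can be sketched with plain distributed Erlang. The module below, run on a monitoring node, polls a target node with `rpc:call/5`. This is only an illustration under the assumption that both nodes are connected and share a cookie; `remote_probe` and `sample/1` are hypothetical names, and the wombat_plugin's actual mechanism is not shown here.

```erlang
-module(remote_probe).
-export([sample/1]).

%% Poll Node from a separate monitoring node. Each value is fetched with
%% rpc:call/5 and a 5 second timeout; a failed call yields {badrpc, Reason}
%% in place of the value.
sample(Node) ->
    Call = fun(M, F, A) -> rpc:call(Node, M, F, A, 5000) end,
    #{applications => count(Call(application, which_applications, [])),
      processes    => Call(erlang, system_info, [process_count]),
      memory_bytes => Call(erlang, memory, [total])}.

%% Number of running applications, or the error passed through unchanged.
count(Apps) when is_list(Apps) -> length(Apps);
count(Error) -> Error.
```

For example, `remote_probe:sample('rabbit@myhost')` (the node name is an assumption) returns a map whose fields correspond to the application count, process count and memory figures used in the benchmark further down.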

NOTE: The RabbitMQ Management Plugin does attempt to reduce the amount of overhead when used in a cluster setup. Not all nodes in a cluster need the full RabbitMQ Management plugin enabled, but just one. The rest of the nodes need only the RabbitMQ Management Agent plugin enabled, which is a single lightweight application with no additional dependencies. This implies that any problems arising from the heavy resource footprint of the RabbitMQ Management Plugin are likely to be only experienced on 1 out of N cluster nodes.

The following results were captured from a benchmark test executed against a RabbitMQ node: first as a standalone node, next while being monitored by WombatOAM, and finally with the RabbitMQ Management plugin enabled (without WombatOAM). This test was carried out on a 2GHz Intel i7 8 core MacBook Pro, with 16GB of memory.

Metric                    Standalone node   WombatOAM enabled   RabbitMQ Management plugin enabled
Applications count        9                 10                  22
Erlang process count      257               287                 344
Memory (MB)               145.2             152                 186.8
Message rate (msgs/sec)   18692             18518               18321

 

It's clear from the captured results that the RabbitMQ node experiences less strain, and thus performs better (a higher message rate, for example), when being monitored by WombatOAM than when being monitored using the RabbitMQ Management plugin. Even without any traffic applied, the total process count on the RabbitMQ node goes up from 144 to 232 when using the RabbitMQ Management Plugin, compared to only 174 when using WombatOAM, which is much less overhead. Latency was also measured, averaging approximately 237920 microseconds using WombatOAM, compared to an average of 254705 microseconds with the RabbitMQ Management Plugin enabled. These improvements in message rate and latency become more relevant in aggregate, on cluster wide setups, where they play a crucial role in the overall system's performance.
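To put the benchmark table into relative terms, the short sketch below computes the percentage change of each measurement against the standalone node. The figures are copied from the table above; the module name `overhead` and its functions are just for illustration.

```erlang
-module(overhead).
-export([percent_change/2, report/0]).

%% Relative change of Value against Baseline, in percent.
percent_change(Baseline, Value) ->
    (Value - Baseline) / Baseline * 100.

%% Figures from the benchmark table:
%% {Metric, Standalone, WombatOAM, ManagementPlugin}.
rows() ->
    [{process_count, 257, 287, 344},
     {memory_mb, 145.2, 152, 186.8},
     {message_rate, 18692, 18518, 18321}].

%% Returns [{Metric, WombatPercent, ManagementPercent}].
report() ->
    [{Metric, percent_change(Base, Wombat), percent_change(Base, Mgmt)}
     || {Metric, Base, Wombat, Mgmt} <- rows()].
```

On these figures the process count grows by roughly 12% with WombatOAM against roughly 34% with the Management plugin, memory by roughly 5% against 29%, and the message rate drops by about 1% against 2%.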

In addition, this same test may also be carried out with WombatOAM monitoring a node which already has the RabbitMQ Management Plugin enabled, in order to gather and analyse hundreds more related metrics, alarms and notifications from the virtual machine that RabbitMQ runs on. These metrics, such as process memory, atom memory, Mnesia and TCP specific metrics, and much more, are not available from the RabbitMQ Management Plugin.

NOTE: These tests could yield varied results depending on the hardware platform on which the test RabbitMQ nodes are installed.

2. Installation

 

You can ask for a free 45 day product trial of the WombatOAM package by contacting us at general@erlang-solutions.com or by filling in the 'Request a Demo' form on the product page. The package is accompanied by the WombatOAM documentation and easy to follow installation procedures, to get it up and running in a very short space of time.

          

3. Features

 

Next up, I'll discuss the following WombatOAM features, in the context of RabbitMQ:

  • Metrics
  • Notifications, Events & Alarms
  • Configuration
  • Control
  • Explore

WombatOAM has a vast number of other features and capabilities, explained and illustrated in greater detail in the WombatOAM documentation.

 

3.1. Metrics

One of the primary purposes of the WombatOAM RabbitMQ plugin is metrics acquisition and processing from the RabbitMQ nodes it’s connected to, and presenting these metrics on the WombatOAM UI. WombatOAM provides a number of RabbitMQ specific metrics such as publish rate, deliver rate, messages ready, messages unacknowledged, connection related metrics, and so forth. A total of 49 RabbitMQ specific metrics are currently provided by WombatOAM, and additional metrics can easily be developed if required by WombatOAM users.

  • Live metrics

  • Collected metrics

  • Gauge metrics

In addition, WombatOAM also provides an extremely useful feature of defining upper and lower bound metric thresholds, from which alarms will be raised when these metric limits are exceeded. This means support engineers are never caught by surprise when certain event limits are reached.
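The threshold idea is simple enough to sketch in a few lines. The function below compares a sampled value against a lower and an upper bound and reports which limit, if any, was crossed. The module name `threshold` and the shape of the alarm tuple are my own assumptions for illustration, and not WombatOAM's API.

```erlang
-module(threshold).
-export([check/2]).

%% Compare a sampled metric value against {Lower, Upper} bounds.
%% Returns ok inside the bounds, otherwise an alarm tuple naming
%% the bound that was crossed.
check(Value, {Lower, _Upper}) when Value < Lower ->
    {alarm, below_lower_bound, Value};
check(Value, {_Lower, Upper}) when Value > Upper ->
    {alarm, above_upper_bound, Value};
check(_Value, {_Lower, _Upper}) ->
    ok.
```

For instance, with an upper bound of 10000 ready messages, `threshold:check(12000, {0, 10000})` returns `{alarm, above_upper_bound, 12000}`.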

3.2. Notifications, Events & Alarms

RabbitMQ users normally have to dig up the log files manually, copy them across from remote servers, and analyse them line by line in order to get an indication of the events and alarms being raised on their nodes. This can be a tedious and time consuming exercise, especially when a problem has to be resolved in a very short space of time in your production environment. With WombatOAM installed, however, events and alarm notifications are presented in a very user friendly manner, on a dashboard under the Notifications tab, with minimal effort from the user compared to manually locating and reading log files.

WombatOAM alarm and event notifications also provide more information than the native RabbitMQ logs expose, for example code version discrepancies. Without this visibility, many problems would only be discovered during post-problem diagnosis, when the RabbitMQ nodes have already suffered the side effects and, in some cases, may even have gone down. Catching such problems early is the essence of preemptive support, an area in which many monitoring systems fall short and in which WombatOAM excels. The last chapter (16) of Designing for Scalability with Erlang/OTP [3] explores preemptive support in detail, and explains why it is crucial in the operations and maintenance of any system, Erlang based or not, such as RabbitMQ. Quoting part of the text:

Preemptive support automation gathers data in the form of metrics, events, alarms, and logs for a particular application; analyzes the data; and uses the results to predict service disruptions before they occur. An example is noticing an increase in memory usage, which predicts that the system might run out of memory in the near future unless appropriate corrective actions are taken.
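The memory example in the quote can be made concrete with a very rough sketch: draw a straight line through the oldest and newest memory samples and estimate when the line crosses a limit. This is a deliberately naive illustration with hypothetical names (`mem_trend`, `seconds_until/2`); it assumes at least two samples with strictly increasing timestamps, and it is not the analysis WombatOAM performs.

```erlang
-module(mem_trend).
-export([seconds_until/2]).

%% Samples: [{TimestampSeconds, Bytes}], oldest first, at least two
%% entries with increasing timestamps.
%% Returns {ok, Seconds} until Limit is reached at the observed growth
%% rate, {ok, 0.0} if the limit is already reached, or stable if memory
%% usage is not growing.
seconds_until([{T0, B0} | Rest], Limit) when Rest =/= [] ->
    {T1, B1} = lists:last(Rest),
    Rate = (B1 - B0) / (T1 - T0),   % bytes per second
    if
        B1 >= Limit -> {ok, 0.0};
        Rate =< 0 -> stable;
        true -> {ok, (Limit - B1) / Rate}
    end.
```

With samples of 100 bytes at time 0 and 160 bytes at time 60, usage grows by 1 byte per second, so `mem_trend:seconds_until([{0, 100}, {60, 160}], 400)` returns `{ok, 240.0}`.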

 

3.3. Configuration

Another powerful feature which WombatOAM provides is configuration management from the user interface. RabbitMQ is heavily configurable from the rabbitmq.config file, and from the application resource files of all its associated plugins. WombatOAM gives full visibility of all application environment variables, which in RabbitMQ are all the configuration parameters defined in the rabbitmq.config file, as illustrated below:

  • Select the rabbit application and start request;

  • Configuration parameters are displayed

WombatOAM also goes the extra mile, to allow users to make in-memory configuration changes during system runtime.

This means that with WombatOAM installed, if you're using, for example, the RabbitMQ queue master location policies [5], you could update the location policy via the WombatOAM UI as illustrated below. The new queue master location configuration then takes effect in memory.

NOTE: It must be clarified that not all configuration changes will be applied. Some settings are cached, for example, channels retain channel_operation_timeout configuration in their cache. So the new value will not be applied to already created channels, only to new channels created thereafter.
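Under the hood, application environment variables are read and changed with the standard `application` module, which is what the sketch below wraps. The module name `env_sketch` is hypothetical and this is not WombatOAM's implementation; it only shows what an in-memory change amounts to.

```erlang
-module(env_sketch).
-export([show/1, change/3]).

%% All environment variables currently held in memory for App.
show(App) ->
    application:get_all_env(App).

%% Change one parameter in memory and return {OldValue, NewValue}.
%% Each value is undefined or {ok, Value}, as returned by
%% application:get_env/2. Nothing is written to rabbitmq.config,
%% so the change does not survive a node restart.
change(App, Key, Value) ->
    Old = application:get_env(App, Key),
    ok = application:set_env(App, Key, Value),
    {Old, application:get_env(App, Key)}.
```

On a RabbitMQ node, `env_sketch:change(rabbit, queue_master_locator, <<"min-masters">>)` would switch the queue master location strategy in memory, subject to the caching caveat in the note above.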

In addition, WombatOAM’s configuration management feature allows for global, cluster wide configuration settings. This is a very useful feature which helps avoid inconsistencies between configuration changes carried out on a single node, as compared to configuration settings on the other cluster nodes.

  • Global configuration changes
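A cluster wide change can be pictured as the same `application:set_env/3` call applied on every node. The sketch below does this with `rpc:multicall/5`; the module name `cluster_env` is an assumption for illustration, and WombatOAM's own mechanism may well differ.

```erlang
-module(cluster_env).
-export([set_everywhere/4]).

%% Apply application:set_env/3 on every node in Nodes, waiting at most
%% 5 seconds. Returns {Results, BadNodes}, where BadNodes lists the
%% nodes that could not be reached and so still hold the old value.
set_everywhere(Nodes, App, Key, Value) ->
    rpc:multicall(Nodes, application, set_env, [App, Key, Value], 5000).
```

Calling `cluster_env:set_everywhere([node() | nodes()], rabbit, queue_master_locator, <<"min-masters">>)` from any connected node applies the setting everywhere in one step, and a non-empty BadNodes list immediately shows where the configuration is now inconsistent.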

3.4. Control

WombatOAM grants a high degree of control over the RabbitMQ nodes it's monitoring by providing an interface to execute native Erlang functions. This is found under the Services tab of the WombatOAM UI. This means that AMQP equivalent operations such as queue and exchange declarations, native RabbitMQ operations like mirror policy definitions, and control operations (like listing queues, exchanges, connections, and so forth) may all be executed from the WombatOAM UI.

NOTE: This feature will, in the near future, be replaced by a more user friendly and safer means of executing control commands. This should also help avoid the continuous and repeated re-entry of such commands, if executed in a frequent manner.

  • Below is an example of a control function being executed to acquire and display all exchanges

  • Result being displayed

WombatOAM also logs these executed control commands as part of its Notifications, making it possible to carry out minor and/or major audit trail operations, to investigate which commands were executed, and when. This is very useful when, for example, troubleshooting a problem’s cause resulting from human intervention on the RabbitMQ nodes.

Other useful RabbitMQ control functions you could execute from the WombatOAM interface are summarized as follows (you can copy, modify and use these as you like):

Command              Function
Declare a queue      rabbit_amqqueue:declare(rabbit_misc:r(<<"/">>, queue, _Queue = <<"test.queue">>), false, false, [], none).
Declare 20 queues    begin Queues = 20, L = [{_, _} = rabbit_amqqueue:declare(rabbit_misc:r(<<"/">>, queue, list_to_binary("test.queue." ++ integer_to_list(N))), false, false, [], none) || N <- lists:seq(1, Queues)], length(L) end.
List queues          rabbit_amqqueue:list().
Declare exchange     rabbit_exchange:declare({resource, _VHost = <<"/">>, exchange, _XName = <<"test.exchange">>}, _Type = topic, _Durable = true, _AutoDelete = false, _Internal = false, _Args = []).
List exchanges       rabbit_exchange:list().
Set mirror policy    rabbit_policy:set(_VHost = <<"/">>, _PolicyName = <<"ha-2-policy">>, _Pattern = <<".*">>, _Definition = [{<<"ha-mode">>, <<"exactly">>}, {<<"ha-params">>, 2}], _Priority = 1, _ApplyTo = <<"queues">>).
Clear policy         rabbit_policy:delete(_VHost = <<"/">>, _PolicyName = <<"ha-2-policy">>).
Create vhost         rabbit_vhost:add(_VHost = <<"test.vhost">>).
Authenticate user    rabbit_access_control:check_user_pass_login(<<"username">>, <<"password">>).
Purge queue          rabbit_control_main:purge_queue({resource, _VHost = <<"/">>, queue, _QName = <<"test.queue">>}).
RabbitMQ status      rabbit:status().

 

Additionally, WombatOAM provides control operations which allow users to:

  • Forcefully initiate garbage collection
  • Kill specific processes, using the process identifier (PID) or the registered name as reference
  • Soft purge modules on managed nodes, ensuring no old module versions are held in memory, unless being used by a process.
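Each of these three operations maps onto a standard Erlang call, which the sketch below wraps. The module name `control_sketch` is made up for illustration and this is not WombatOAM's code; it only shows what such operations do on the node.

```erlang
-module(control_sketch).
-export([gc/1, kill/1, soft_purge/1]).

%% Resolve a pid or a registered name to a pid (undefined if the name
%% is not registered).
resolve(Pid) when is_pid(Pid) -> Pid;
resolve(Name) when is_atom(Name) -> whereis(Name).

%% Force a garbage collection of one process.
gc(PidOrName) ->
    case resolve(PidOrName) of
        undefined -> {error, not_found};
        Pid -> erlang:garbage_collect(Pid)
    end.

%% Kill a process unconditionally; the kill signal cannot be trapped.
kill(PidOrName) ->
    case resolve(PidOrName) of
        undefined -> {error, not_found};
        Pid -> exit(Pid, kill)
    end.

%% Remove old code for Module, unless a process is still running it,
%% in which case false is returned and nothing is purged.
soft_purge(Module) ->
    code:soft_purge(Module).
```

Killing processes on a production RabbitMQ node is obviously something to do with care, since supervisors will restart the process and any state it held is lost.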

The image below illustrates these control categories;

3.5. Explore

WombatOAM has a powerful feature, extremely useful for RabbitMQ support and development engineers, which allows them to explore node processes and ETS tables (including the rabbit_queue table, which contains all live queues on the nodes). The image below depicts the categories which may be explored:

  • Different Explore categories

For example, inspecting the process state of an essential process like the vm_memory_monitor is only a matter of navigating to the Explore tab -> Process state category, specifying the process registered name (in this case vm_memory_monitor) or PID (Process Identifier), and executing the request.

  • Specifying vm_memory_monitor registered name, and executing request

  • Process state of vm_memory_monitor displayed
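For comparison, this is roughly what the same exploration looks like by hand from an Erlang shell attached to the node, using `sys:get_state/2` and `ets:match_object/3`. The module name `explore_sketch` is hypothetical, and the sketch assumes the target process is OTP compliant (a gen_server, supervisor, and so on) and that the ETS table is named and readable.

```erlang
-module(explore_sketch).
-export([process_state/1, table_rows/2]).

%% State of an OTP compliant process, looked up by pid or registered
%% name. Uses sys:get_state/2 with a 5 second timeout.
process_state(Name) when is_atom(Name) ->
    case whereis(Name) of
        undefined -> {error, not_found};
        Pid -> process_state(Pid)
    end;
process_state(Pid) when is_pid(Pid) ->
    {ok, sys:get_state(Pid, 5000)}.

%% Up to Limit rows from a readable ETS table, or [] if it is empty.
table_rows(Tab, Limit) ->
    case ets:match_object(Tab, '_', Limit) of
        {Rows, _Continuation} -> Rows;
        '$end_of_table' -> []
    end.
```

On a RabbitMQ node, `explore_sketch:process_state(vm_memory_monitor)` and `explore_sketch:table_rows(rabbit_queue, 10)` would correspond to the two Explore requests described above.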

4. When to use WombatOAM

 

A question often raised regarding WombatOAM and the RabbitMQ Management plugin is;

“....which is the right tool to use, and on which occasions?”

The two tools may be viewed as complementary. WombatOAM has superior capabilities, providing not just RabbitMQ metrics but also a vast number of events, alarms, Erlang VM node specifics, and so forth. The RabbitMQ Management plugin, however, also provides useful features, such as manual publishing/receiving of messages from the UI, and plugin specific operations like the creation of links and upstreams for the Federation plugin.

Bearing this in mind, the recommended best practice is to use both tools in the following manner:

  • WombatOAM, at ALL times, to monitor and keep track of the wellness of the RabbitMQ installation. WombatOAM introduces very minimal overhead on the RabbitMQ node(s) it is monitoring, so the chances of end-to-end service interruptions from resource hungry UI operations are as good as negligible.

  • RabbitMQ Management Plugin, to be enabled only when certain functions and operations are required. For example, specific plugin operations (e.g. Federation plugin configuration), or easy definition of mirroring policies. All of these are once off operations, for which the RabbitMQ Management Plugin need not be enabled at all times.

 

With WombatOAM continually enabled, and the RabbitMQ Management plugin enabled only on the specific occasions it is required, common RabbitMQ problems such as nodes raising VM High Watermark memory alarms due to excessive memory usage would hardly ever be experienced when they are attributable to UI operations with a high memory footprint. The exceptions, of course, are cases where the root cause is unrelated to memory hungry UI computations and operations, or lies in limitations of the hardware on which RabbitMQ is installed. This is just one major example, amongst many other problems, which would be alleviated by making use of WombatOAM in this manner.

5. Beyond RabbitMQ

 

Beyond the specific metrics provided by the RabbitMQ plugin, WombatOAM also provides Erlang VM related metrics by default. This means that WombatOAM gathers more metrics and node specific information from the RabbitMQ nodes than the RabbitMQ Management plugin exposes. Additional metrics gathered from the RabbitMQ node are illustrated in the image below:

Each of these metric categories is both rich in the information it provides and crucial in monitoring any RabbitMQ installation and, in essence, any Erlang based system. For example, Memory and Mnesia System metrics are critical for any RabbitMQ installation, as all metadata (queues, exchanges, bindings, and so forth) is stored in Mnesia, and RabbitMQ nodes are classified as either DISC or RAM nodes based on the type of Mnesia tables they're configured to use. The same applies to all other metrics WombatOAM provides; they all play a crucial role for any RabbitMQ (and Erlang system) installation.

In addition, I/O metrics are provided, which can reveal vital information regarding the number of permissible client connections when requirements such as scalability come into consideration. And of course, Error Log Notifications can indicate the rate at which errors/exceptions and warnings are being reported and logged to the SASL logs and general logs, respectively, by the RabbitMQ nodes. The magnitude of these metrics in particular can be an easy and direct indicator that the node is experiencing problems which have not yet been detected by the user.
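Some of the raw VM figures behind such I/O metrics can be read directly with standard BIFs, as in the sketch below. The module name `io_snapshot` is made up, and the link to client connections rests on an assumption: with the default inet driver each TCP socket is an Erlang port, so the port limit (together with the operating system's file descriptor limit) bounds the number of concurrent connections.

```erlang
-module(io_snapshot).
-export([take/0]).

%% Bytes received and sent through ports since the VM started, plus
%% the number of ports currently open and the maximum the VM allows.
take() ->
    {{input, In}, {output, Out}} = erlang:statistics(io),
    #{bytes_in   => In,
      bytes_out  => Out,
      port_count => erlang:system_info(port_count),
      port_limit => erlang:system_info(port_limit)}.
```

Sampling `io_snapshot:take()` periodically and comparing port_count against port_limit gives an early indication of how much headroom is left for new client connections.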

6. Conclusion

 

Rounding up the discussion, it's clear that WombatOAM is an essential and efficient tool for monitoring any RabbitMQ installation. As already pointed out in Section 4, the recommendation is to keep an instance of WombatOAM continually monitoring your RabbitMQ installation and, because of its resource hungry nature, to enable the RabbitMQ Management Plugin only for the periods in which some of its specific control features are required, disabling it thereafter. This means that for the majority of your RabbitMQ installation's uptime, your nodes are dedicated to AMQP operations only, and avoid the severe UI related operational overheads induced by the RabbitMQ Management Plugin. Per 24 hour monitoring cycle you could, for example, employ the RabbitMQ Management plugin for no more than 2 hours, when required, with WombatOAM monitoring continually throughout the entire cycle.


References

 

[1] https://www.rabbitmq.com/configure.html

[2] https://www.rabbitmq.com/management.html

[3] Designing for Scalability with Erlang/OTP

[4] https://www.erlang-solutions.com/products/wombat-oam.html

[5] https://www.erlang-solutions.com/blog/take-control-of-your-rabbitmq-queues.html
