You’re a support engineer and your organisation uses a 3-node RabbitMQ cluster to manage its inter-application interactions across your network. On a regular basis, different departments in your organisation approach you with requests to integrate their new microservices into the network, for communication with other microservices via RabbitMQ.
With the organisation being so large, and its offices spread across the globe, the onus is on each department’s application developers to handle the integration with the RabbitMQ cluster, but only after you've approved it and given them the green light. Along with approving integration requests, you also provide general conventions which you’ve adopted from prior experience. One of the conventions you enforce is that connecting microservices must create their own dedicated queues when they integrate with the cluster, as this is the best approach to isolating services and managing them easily. The only exception is microservices that only consume messages from already existing queues.
The average message rate across your cluster is fairly stable at about 1k/s, from both internal traffic and external traffic generated by some mobile apps publicised by the organisation. Everything is smooth sailing until you realise that the total number of queues in your cluster is approaching the thousands, and one of the three servers seems to be overburdened, using more system resources than the rest. Memory utilisation on that server starts reaching alarming thresholds. At this point, you realise that things can only get worse. You still have pending requests to integrate more microservices onto the cluster, but you can't approve them without figuring out how to solve the growing imbalance in system resources across your deployment.
Fig 1. RabbitMQ cluster imbalance illustration
After digging through the RabbitMQ documentation, you come to realise that since you're using HA queues, which you’ve adopted to ensure availability of service, all your message operations only reference your queue masters. Microservices have been creating queues on nodes at will, meaning that the provisioning of queues has been random and unstructured across the cluster. The concentration of HA queue masters on one node significantly surpasses that on the other nodes and, as a result, with all message consumption referencing queue masters only, the server with the most queue masters bears a heavier operational burden than the rest.
Your load balancer hasn't been of much help, since whenever you experience a network partition, or purposefully make one of your nodes unavailable for maintenance work, queue provisioning has proceeded uncontrolled on the remaining running nodes. The queue count imbalance then persists once the cluster is restored. A possible immediate solution would be to purge some of the queues, to relieve the memory footprint on the burdened server(s), but you can't afford to do this as most queued up messages are crucial to the business operations transacting through the cluster. New and existing microservices also can't keep adding more queues to the cluster until this problem has been addressed. So what do you do?
Well, as of version 3.6.0, RabbitMQ has introduced a mechanism to grant its users more control in determining a queue master's location in a cluster, on creation. This is based on some predefined rules and strategies, configured prior to the queue declaration operations. If you can relate to the situation above, or would like to plan ahead and make the necessary amendments to your RabbitMQ installation before encountering similar problems, then read on, and give this feature a go.
So how does it work?
Prior to the introduction of the queue master location mechanism, a declared queue's master was, by default, located on the local node on which the declare operation was executed. This is quite limiting, and has been the main reason behind the imbalance of system resources on a RabbitMQ cluster when the number of queues becomes significantly large.
With this mechanism in place, the node on which the queue master will be located is first computed from a configurable strategy, prior to the queue being created.
Configurable strategy is key here, as it gives RabbitMQ users full control over the distribution of queue masters across their cluster. There are three means by which a queue master location strategy may be configured:
Queue declare arguments: This is at AMQP level, where the queue master location strategy is defined as part of the queue's declaration arguments.
Policy: Here the strategy is defined as a RabbitMQ policy.
Configuration file: Location strategy is defined in the rabbitmq.config file.
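To make the first of these options a bit more concrete, here's a minimal Python sketch of declaring a queue with a location strategy in its arguments. RabbitMQ reads the strategy from the x-queue-master-locator queue argument. The helper names (locator_arguments, declare_with_locator) are my own, and the channel object is assumed to expose a pika-style queue_declare method, so treat this as an illustration rather than a finished client.

```python
# Strategy names as listed in this article; the argument key is RabbitMQ's
# optional "x-queue-master-locator" queue argument.
VALID_STRATEGIES = {"min-masters", "client-local", "random"}


def locator_arguments(strategy):
    """Build the declare-arguments table for a queue master location strategy."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError("unknown queue master location strategy: %r" % strategy)
    return {"x-queue-master-locator": strategy}


def declare_with_locator(channel, queue_name, strategy="min-masters"):
    """Declare a durable queue and ask the broker to place its master by `strategy`.

    ASSUMPTION: `channel` exposes a pika-style
    queue_declare(queue=..., durable=..., arguments=...) method.
    """
    return channel.queue_declare(
        queue=queue_name,
        durable=True,
        arguments=locator_arguments(strategy),
    )
```

With a real client you'd pass in an open channel, e.g. declare_with_locator(channel, "orders.billing"), where the queue name is just a made-up example.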
Once set, the internal execution order of declaring a queue would be as follows;
Fig 2. Queue master location execution flow
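The lookup can be thought of as a first-match-wins chain. The sketch below assumes the order follows the list above: declare arguments are checked first, then any matching policy, then the configuration file. The function and its dictionary inputs are a simplified model of my own, not RabbitMQ's internal code. The keys are the ones RabbitMQ uses for each source: x-queue-master-locator, queue-master-locator and queue_master_locator.

```python
def resolve_strategy(declare_args=None, policy=None, config=None):
    """Return (strategy, source) for a queue being declared.

    ASSUMPTION: sources are consulted in this order and the first one that
    defines a strategy wins. Returns (None, None) when nothing is configured,
    in which case the broker's default behaviour applies.
    """
    sources = (
        ("declare arguments", (declare_args or {}).get("x-queue-master-locator")),
        ("policy", (policy or {}).get("queue-master-locator")),
        ("configuration file", (config or {}).get("queue_master_locator")),
    )
    for source, strategy in sources:
        if strategy is not None:
            return strategy, source
    return None, None


if __name__ == "__main__":
    # A strategy in the declare arguments overrides the policy and config file.
    print(resolve_strategy(
        declare_args={"x-queue-master-locator": "random"},
        policy={"queue-master-locator": "min-masters"},
        config={"queue_master_locator": "client-local"},
    ))
```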
These are the three ways in which a queue master location strategy may be configured, and how the execution flow is ordered upon queue declaration. Next, you may be asking yourself the following question;
What are these strategies anyway?
Queue master location strategies are basically the rules which govern the selection of the node on which the queue master will reside, on declaration. If you’re from an Erlang background, you’d understand when I say these strategies are nothing but callback modules of a certain behaviour pattern in RabbitMQ known as the rabbit_queue_master_locator. If you aren’t from an Erlang background, no worries, all you need to know is what strategies are available to you, and how to make use of them. Currently, there are three queue master location strategies available;
Min-Masters: Selects the master node as the one currently hosting the fewest queue masters. Configured as min-masters.
Client-local: Like the previous default behaviour, this strategy selects the queue master node as the local node on which the queue is being declared. Configured as client-local.
Random: Selects the queue master node based on random selection. Configured as random.
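If it helps to see the three rules side by side, here's a toy Python model of them. It is not the Erlang callback modules themselves, just a sketch of the selection logic described above. The node names and master counts used in the example are hypothetical, and the tie-breaking rule (alphabetical order) is my own choice to keep the output deterministic.

```python
import random


def min_masters(master_counts, local_node, rng=random):
    """Pick the node hosting the fewest queue masters (ties: alphabetical)."""
    return min(sorted(master_counts), key=lambda node: master_counts[node])


def client_local(master_counts, local_node, rng=random):
    """Pick the node on which the declare operation is executed."""
    return local_node


def random_node(master_counts, local_node, rng=random):
    """Pick any cluster node at random."""
    return rng.choice(sorted(master_counts))


# All three share one signature so they can be swapped freely.
STRATEGIES = {
    "min-masters": min_masters,
    "client-local": client_local,
    "random": random_node,
}


def place_queues(master_counts, strategy, local_node, how_many):
    """Declare `how_many` queues via `local_node`; return the updated counts."""
    counts = dict(master_counts)
    for _ in range(how_many):
        chosen = STRATEGIES[strategy](counts, local_node)
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical starting point: one node holds most of the queue masters.
    cluster = {"rabbit@hostname": 9, "rabbit_1@hostname": 5, "rabbit_2@hostname": 2}
    print(place_queues(cluster, "min-masters", "rabbit@hostname", 6))
    print(place_queues(cluster, "client-local", "rabbit@hostname", 3))
```

Note how min-masters keeps feeding the emptiest node until the counts even out, while client-local simply piles more masters onto whichever node the client happens to be connected to.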
So in a nutshell, this is the general theory behind controlling and dictating the location of a queue master’s node. Syntax rules differ for each case, depending on whether the strategy is defined as part of the queue’s declare arguments, as a policy, or as part of the rabbitmq.config file.
NOTE: When both a queue master location strategy and an HA nodes policy have been configured, a conflict could arise over the resulting queue master node, for instance if the node computed by the location strategy is one of the slave nodes defined by the HA nodes policy. In such a scenario, the HA nodes policy always takes precedence over the queue master location strategy.
With this knowledge at hand, the engineer in the situation mentioned above would simply enforce the use of the min-masters queue location strategy as part of the queue declaration arguments for all microservices connecting to the RabbitMQ cluster. Or even easier, he’d simply set the min-masters policy on the cluster nodes, using the match-all wildcard for the queue name match pattern. This would ensure that all newly created queues would be automatically distributed across the cluster until there’s a balance in the number of queue masters per node, and ultimately, a balance in the utilization of system resources across all three servers.
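As a rough sketch of the policy route, the snippet below builds (but doesn't send) a request to the management plugin's HTTP API that would define such a policy. The policy key is queue-master-locator and the pattern .* matches every queue name. The base URL, the policy name queue-master-location and the helper name are placeholders of my own, and authentication is left out on purpose, so adapt it to your setup before sending anything.

```python
import json
import urllib.parse
import urllib.request


def build_policy_request(base_url, vhost, policy_name,
                         pattern=".*", strategy="min-masters"):
    """Build a PUT request for /api/policies/<vhost>/<name> without sending it.

    ASSUMPTIONS: the management plugin is enabled at `base_url`, and the
    caller adds credentials before sending the request.
    """
    body = {
        "pattern": pattern,  # ".*" matches all queue names
        "definition": {"queue-master-locator": strategy},
        "apply-to": "queues",
    }
    url = "%s/api/policies/%s/%s" % (
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),  # the default vhost "/" becomes %2F
        urllib.parse.quote(policy_name, safe=""),
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_policy_request("http://localhost:15672", "/", "queue-master-location")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Bear in mind that the policy only influences where new queue masters are placed at declaration time, so the balance is reached gradually as new queues are created.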
At the moment, only three location strategies have been implemented, namely min-masters, client-local and random. More strategies are yet to be brewed up, and if you'd like to contribute a rule by which queues can be distributed to improve the performance of a RabbitMQ cluster, please feel free to drop a comment. Suggestions will go through some rounds of review, and could possibly be implemented and included in future releases of RabbitMQ.
Quick n' easy experiment
I'll illustrate how the queue master location strategy is put into effect with a simple experiment you can carry out on your local machine. We're going to make things easy by making the most of the management UI, to avoid the whole AMQP setup procedure of opening connections and channels, creating exchanges, and so forth.
Download and install a RabbitMQ package specific for your platform. If you're on a UNIX based OS, you can just quickly download and extract the generic unix package, and navigate to the sbin directory.
tar xvf rabbitmq-server-generic-unix-3.6.1.tar.xz
Create a 3-node cluster by executing the following commands;
With our policy configured, we can now go on and create our MinMasterQueue.1 queue.
So to prove that the min-masters policy does work, we'll create our MinMasterQueue.1 queue on the node with the most queues, i.e. rabbit@hostname (9 queues). With the queue master location policy in effect, the set rabbit@hostname node should be overridden by the node computed by the policy, which has the least number of queues, i.e. rabbit_2@hostname. Let's proceed!
So as mentioned in step 8, let's create MinMasterQueue.1 on the node with the most queues, rabbit@hostname, as follows;
Fig 6. MinMasterQueue creation
Now the moment of truth; let's verify if the queue was created on the correct node;
Fig 7. Min-master queue
Or, from the command line by executing the following;
./rabbitmqctl list_queues -q name pid
which should yield the following as part of the full list of displayed queues;
The results are indeed correct. The home node of MinMasterQueue.1 is rightly the one which had the least number of queue masters, i.e. rabbit_2@hostname.
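If you'd rather not eyeball the list, a few lines of Python can tally queue masters per node from the list_queues output. This sketch assumes each line holds a queue name and a pid printed in the form <node.N.N.N>, separated by whitespace, and that queue names contain no spaces. The sample text in the example is made up for illustration and is not real output from the experiment.

```python
from collections import Counter


def node_of(pid):
    """Extract the node name from a pid printed like <rabbit@hostname.1.234.0>."""
    inner = pid.strip().lstrip("<").rstrip(">")
    # The last three dot-separated fields are pid counters; the rest is the
    # node name, which may itself contain dots (e.g. a fully qualified host).
    return inner.rsplit(".", 3)[0]


def masters_per_node(list_queues_output):
    """Count queue masters per node from `list_queues -q name pid` output."""
    counts = Counter()
    for line in list_queues_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        _name, pid = fields
        counts[node_of(pid)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample, shaped like the command's output.
    sample = (
        "MinMasterQueue.1\t<rabbit_2@hostname.1.1034.0>\n"
        "SomeQueue.1\t<rabbit@hostname.1.981.0>\n"
        "SomeQueue.2\t<rabbit@hostname.1.992.0>\n"
    )
    for node, count in sorted(masters_per_node(sample).items()):
        print(node, count)
```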
You can repeatedly execute step 9, creating more MinMasterQueue.N queues to see this queue master location strategy in effect. The home node of the queues created will interchange from one node to another, depending on the queue masters' count per node, at each given moment of execution.
Fig 8. Min-master queues
This is a quick illustration of this mechanism at work. In addition to setting a policy from the command line, as in step 8, there are also other means of defining the queue master location strategy, which I illustrate in the next section.
Following are some examples of how to configure queue master location strategies.
Firstly, to set the location strategy from the rabbitmq.config file, simply add a queue_master_locator entry to the rabbit application's section, for example {queue_master_locator, <<"min-masters">>}.
You can find complete versions of these snippets here. These are simplified primers which you can build upon. If you have a requirement to implement something more complex, and need some assistance, please don't hesitate to get in touch!