RabbitMQ is not the first thing that comes to mind when we talk about integrating legacy services. We usually rely on high-level frameworks to hide the complexity of the underlying plumbing. In this post we show how RabbitMQ, the most widely deployed open source message broker, can help with reliable and resilient messaging.
What is RabbitMQ?
RabbitMQ is a free, open-source and extensible message queuing solution. It is a message broker that understands AMQP (Advanced Message Queuing Protocol), but it can also be used with other popular messaging protocols such as MQTT. It is highly available, fault tolerant and scalable. It is implemented in Erlang/OTP, a technology tailored for building stable, reliable, fault-tolerant and highly scalable systems, with native support for handling very large numbers of concurrent operations. The benefits of Erlang/OTP can easily be seen in RabbitMQ and in other systems such as WhatsApp and MongooseIM, to mention a few.
At a very high level, it is a middleware layer that lets the different services in your application communicate with each other without worrying about message loss, while meeting different quality of service (QoS) requirements. It also provides fine-grained and efficient message routing, which allows applications to be decoupled extensively.
RabbitMQ Connecting Legacy Services
When designing system interactions we usually start with a top-down approach. We identify the components and use the technology available to connect them. If we need to integrate existing systems with newer ones, it is often infeasible to change the older system to use modern protocols. In cases like this, connector applications can be developed to act as a bridge between the old system and the new.
Once we know the components and have all the translation layers, we can deal with the interconnection. There are two main ways in which parts of the system can communicate:
- Directly (synchronously). When both components are online, they can use an active network connection and a preferred protocol like HTTP, protobuf or even FTP. This keeps the solution simple, but it has trade-offs: when things go wrong, a failure in one service can make several others unavailable, or clients have to implement their own buffering.
- Indirectly (asynchronously). To mitigate the trade-offs of direct communication, we can introduce a messaging layer that retains messages even if the receiving application is not online. Although this adds complexity to the architecture, it removes a lot of edge cases from the applications. Scaling the messaging layer is often easier than scaling the legacy applications.
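The asynchronous option can be illustrated with a toy, in-memory queue. This is only a sketch of the store-and-forward idea, not of how RabbitMQ is implemented; the `InMemoryQueue` class and the `order-N` message bodies are made up for the illustration.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a broker queue: it holds messages until a consumer takes them."""

    def __init__(self):
        self._messages = deque()

    def publish(self, body):
        # The producer returns immediately; it never talks to the consumer.
        self._messages.append(body)

    def consume(self):
        # Returns the oldest retained message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


queue = InMemoryQueue()

# The (hypothetical) legacy consumer is offline while these are published.
for order_id in (1, 2, 3):
    queue.publish(f"order-{order_id}")

# Later the consumer comes online and drains the backlog in publish order.
received = []
while (message := queue.consume()) is not None:
    received.append(message)

print(received)
```

The producer never needs to know whether the consumer is running, which is the property that removes the buffering and retry logic from the applications themselves.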
RabbitMQ is a well-suited message broker because it scales well not only with the message load, but also with the business requirements. Finding message brokers that can handle hundreds, thousands or even tens of thousands of messages per second is easy. Finding a message bus that caters for several messaging patterns, handles different protocols (AMQP, MQTT, STOMP, JMS and many more) and can be extended via plugins is much more difficult.
Features come with trade-offs, but when RabbitMQ is put into a heterogeneous environment, it provides an easy way to move messages between virtually any systems. All major frameworks provide an AMQP adapter, and even if a component uses a proprietary protocol, it is possible to develop a plugin that extends RabbitMQ’s functionality.
Resilient and Reliable Messaging
RabbitMQ provides a high degree of availability and consistent messaging when three features are used together: clustering, publisher confirms with consumer acknowledgements, and quorum queues. First we look at why it is recommended to run RabbitMQ as a cluster, then at how to guarantee that messages are received and kept safe by RabbitMQ, and finally at how quorum queues improve on classic mirroring.
Although a single-node RabbitMQ deployment provides the highest consistency guarantees, it is not resilient to hardware errors or to planned downtime due to upgrades. If anything happens to the single RabbitMQ instance, the service is no longer available system-wide. If any of the message store files gets corrupted on the file system, messages can be lost.
To avoid these problems RabbitMQ can run as a cluster, with each node providing the full RabbitMQ functionality. It does not matter to a client which RabbitMQ node it connects to: RabbitMQ hides the complexities of distributing and replicating messages, queues and exchanges.
The big question is how many nodes we should use in a cluster. Using two nodes might be tempting, but that only works with systems running in an active-passive fashion. RabbitMQ, as we discussed above, runs in an active-active fashion where each node in a cluster provides identical functionality. This means that in case of any problems, RabbitMQ needs a consensus of the majority of the nodes to function. By keeping the number of nodes odd, a split into two parts always leaves one part with a majority, so the cluster consistently survives the loss of a minority of its nodes (with 5 nodes, RabbitMQ can tolerate the loss of 2).
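The majority arithmetic behind that recommendation can be checked with a few lines of Python. The function names are ours, chosen for the illustration; the only rule assumed is that a majority is more than half of the nodes.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can be lost while a majority still remains."""
    return cluster_size - majority(cluster_size)


for size in (2, 3, 4, 5):
    print(size, majority(size), tolerated_failures(size))
```

A two-node cluster tolerates no failures at all, and a four-node cluster tolerates no more failures than a three-node one, which is why even-sized clusters add cost without adding resilience.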
To ensure that the nodes in the minority don’t commit changes, RabbitMQ can pause them automatically when it detects a network issue. This is done by setting the cluster partition handling mechanism to pause_minority. It is a built-in feature: it not only detects that a network partition has occurred, but also detects when the network is working again and automatically restarts the paused nodes. RabbitMQ does not require manual intervention to heal.
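As a minimal sketch, assuming the newer key-value rabbitmq.conf format (the location of the file depends on how RabbitMQ was installed), the setting looks like this on every node of the cluster:

```ini
# rabbitmq.conf
# Pause this node whenever it finds itself on the minority side of a partition.
cluster_partition_handling = pause_minority
```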
Publisher Confirms and Acknowledgements
RabbitMQ can automagically cluster and heal itself, but due to the distributed nature of things, it needs some cooperation from the clients to guarantee that no message is lost. When publishing a message to RabbitMQ, a client can only “forget” about the message once it has been acknowledged by RabbitMQ. This acknowledgement on the publishing side is called the Publisher Confirm feature. It is more lightweight than the transaction mechanism in AMQP and performs much better under load. This RabbitMQ-specific feature is supported by all major client libraries.
On the consuming side, clients should acknowledge a message only when they have finished processing it. This guarantees that once a message is fully processed it is removed from RabbitMQ, but if any problem happens during processing, the same message can be redelivered to the same or a different client. This gives a good opportunity to introduce horizontal scaling for parts of the system.
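On the publishing side, the pattern could be sketched as below. The function is written against a pika-style channel passed in as a parameter, and it assumes that the channel is already in confirm mode (in pika, after calling `confirm_delivery()` on a blocking channel) and that `basic_publish` raises when the broker does not confirm a message. `StubChannel`, the `legacy.orders` routing key and the message bodies are hypothetical and only there so the sketch runs without a broker.

```python
def publish_with_confirms(channel, bodies, exchange, routing_key):
    """Publish each body and return the ones the broker did not confirm.

    Assumes `channel` is already in confirm mode and that its
    `basic_publish` raises when a message is nacked or unroutable.
    """
    unconfirmed = []
    for body in bodies:
        try:
            channel.basic_publish(
                exchange=exchange,
                routing_key=routing_key,
                body=body,
                mandatory=True,
            )
        except Exception:
            # With pika one would catch its NackError / UnroutableError here.
            # The message is kept so the caller can retry it later.
            unconfirmed.append(body)
    return unconfirmed


class StubChannel:
    """Hypothetical stand-in for a confirm-mode channel; rejects one body."""

    def basic_publish(self, exchange, routing_key, body, mandatory):
        if body == b"rejected":
            raise RuntimeError("broker nacked the message")


unconfirmed = publish_with_confirms(
    StubChannel(),
    [b"first", b"rejected", b"third"],
    exchange="",
    routing_key="legacy.orders",
)
print(unconfirmed)
```

The important design point is that the publisher only drops a message from its own storage after the call returns without an error; anything in `unconfirmed` still belongs to the publisher.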
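A consumer that follows this rule might look like the sketch below. It assumes a pika-style callback signature (`channel, method, properties, body`) and the `basic_ack` / `basic_nack` channel methods; `RecordingChannel` and `process` are hypothetical helpers used only to exercise the callback without a broker.

```python
from types import SimpleNamespace


def make_on_message(process):
    """Build a pika-style callback that acks only after `process` succeeds."""

    def on_message(channel, method, properties, body):
        try:
            process(body)
        except Exception:
            # Processing failed: hand the message back for redelivery.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        else:
            # Processing finished: the broker may now remove the message.
            channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message


class RecordingChannel:
    """Hypothetical channel stub that records acks and nacks."""

    def __init__(self):
        self.calls = []

    def basic_ack(self, delivery_tag):
        self.calls.append(("ack", delivery_tag))

    def basic_nack(self, delivery_tag, requeue):
        self.calls.append(("nack", delivery_tag, requeue))


def process(body):
    if body == b"bad":
        raise ValueError("cannot process")


channel = RecordingChannel()
on_message = make_on_message(process)
on_message(channel, SimpleNamespace(delivery_tag=1), None, b"good")
on_message(channel, SimpleNamespace(delivery_tag=2), None, b"bad")
print(channel.calls)
```

Because a message can be delivered more than once, the processing step should be safe to repeat. Requeuing unconditionally also means a message that always fails will keep coming back, so a real consumer would usually limit retries.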
The last piece of the puzzle is to make sure that the messages held in RabbitMQ can survive network errors, hardware failures or planned maintenance. Clustering makes it possible to distribute the workload among many nodes, but it is also the basis for resiliency. Quorum queues are a newer addition to the RabbitMQ feature family and provide an improved alternative to queue mirroring. They are based on the state-of-the-art Raft algorithm, which manages the integrity of the queue leader and the contents of the queues. When quorum queues are used together with publisher confirms, messages acknowledged by RabbitMQ are kept safe until they are delivered to consumers.
Quorum queues improve on one of the major issues of classic queue mirroring: queue synchronisation after a network partition. With the Raft protocol, it is possible to copy only the messages that are missing from a replica. This reduces the network pressure caused by synchronisation, which was a cause of some hard-to-debug RabbitMQ issues in the past.
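From the client's point of view, a queue becomes a quorum queue through an argument supplied when it is declared. The sketch below is written against a pika-style channel and assumes the `x-queue-type` queue argument; `RecordingChannel` and the queue name `legacy.orders` are hypothetical and only there so the example runs without a broker.

```python
def declare_quorum_queue(channel, name):
    """Declare a durable quorum queue on a pika-style channel."""
    return channel.queue_declare(
        queue=name,
        durable=True,  # quorum queues are always durable
        arguments={"x-queue-type": "quorum"},
    )


class RecordingChannel:
    """Hypothetical channel stub that records queue declarations."""

    def __init__(self):
        self.declared = []

    def queue_declare(self, queue, durable, arguments):
        self.declared.append((queue, durable, arguments))


channel = RecordingChannel()
declare_quorum_queue(channel, "legacy.orders")
print(channel.declared)
```

The queue type is fixed at declaration time, so an existing classic queue cannot be switched over simply by declaring it again with different arguments.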
In this blog post, we discussed the major challenges of integrating legacy and non-legacy systems. Introducing a versatile messaging layer like RabbitMQ can solve many of these problems, and applying the three major features highlighted in this post will help make the project a success.
Here at Erlang Solutions, we have world-leading experts in RabbitMQ and can help with finding solutions to the unique problems of distributed systems. This work ranges from auditing a planned or deployed RabbitMQ solution to designing bespoke solutions with RabbitMQ. Get in touch to find out more about how we can help with your integration project, or get your ticket to RabbitMQ Summit 2021 to learn more about why RabbitMQ is trusted by many of the world’s biggest companies.