Scaling a Mongoose: How scalable is the MongooseIM XMPP server?
by Bartlomiej Gorny
How does MongooseIM it scale?
When talking about servers, this question is asked over and over again, and, MongooseIM is no exception. How does it scale? It scales well, this we know. We’ve done a lot of load tests in various environments, we’ve set up multiple production clusters, some of them handling a substantial load. But, more specifically, how does it scale?
It is a difficult question and requires a careful insight (never say that in a TV interview, it is the worst answer you can possibly give). But, it really is. It depends on so many factors – underlying hardware, usage pattern, enabled extensions, database backend, integrations… We’d love to have a definite answer to the question, but is that even possible?
This is what my car spec says is its top speed is. Does that mean I can drive it that fast? First of all, no because I’m not going to try. Even if I did, I certainly won’t make it, for a variety of reasons – the car is not brand new, driving conditions are never perfect, I have regular tyres which are designed for security and not for speeding, etc. What the manufacturer is really saying is (a) you won’t go faster than that no matter what you do, and (b) something like 180km/h is absolutely fine (if legal).
So let’s follow this approach and find out what the “top speed” of MongooseIM is. Along the way, we’ll also examine how it behaves when we expand the hardware, both horizontally and vertically, what is limiting its growth, and some other interesting aspects.
We ran our tests on AWS - it is the most commonly used general-purpose cloud environment. Results from there can serve as a benchmark. A set of ansible scripts provisioned an EC2 instance, installed and configured MongooseIM. Then we launched the “client” EC2 instances one by one, each establishing a number of XMPP connections pretending to be real clients. We proceeded with it until either new connections started failing or the end to end time to delivery shot up, then we terminated the whole test.
The MongooseIM server was “plain vanilla” - the goal was to test just the main functionalities, like handling connections and routing messages. If a cluster of servers were used, clients would connect randomly to any of the nodes available - there was no load balancer to get in the way. Client behaviour was also rudimentary - a single “client” would establish a stream, authenticate and then keep the connections open, sending one short message every minute and receiving whatever came in. Messages contained timestamps, so that we could track end-to-end time-to-delivery.
After experimenting with a number of instance types, we found out that the c5 line is a perfect balance. With our assumed usage pattern, this hardware profile provides just the right combination of memory and CPU power. Memory-optimised instances of a similar size offers a similar performance, while being much more expensive - there is not much to gain by adding more memory. Also, results from running memory-optimised instances were unstable, since the CPU usage of MongooseIM had many spikes which may break the tests at any time. Here are some examples:
Our imaginary average user sends one message per minute - with 137k connections on c5.large instance, this means that MongooseIM is routing about 2.3k messages per second. What if our users become more active, or less active - how does it change the maximum load we can handle?
Less messages means the CPU has less work to do, but memory pressure stays the same. The expectation, therefore, is, that when the traffic is low, the maximum number of connections should not depend on the traffic. This is because, in that scenario, the memory is the limiting factor. On the other hand, if the load increases, at some point, it should overload the CPU and the maximum number of connections should start to fall.
Our tests confirmed this, and also proved that the default usage pattern of one message per minute is just about the breaking point. Below it the limit stays the same. Go slightly above it and the limit begins to fall.
Testing scalability on two dimensions
Now that we know that c5 line provides a perfect balance, how will using a more powerful variant help? Again, is it linear? Does doubling the digit before letter “x” actually double the load we can handle? Is there a limit to it?
This was tricky, because there are many OS-level limits and Erlang VM settings which can stop your node from accepting more connections. For details on how we configured our servers, consult MongooseIM documentation available online.
We ran our test on larger instance types, up to c5.9xlarge. At that level, MongooseIM was able to handle almost 2.5 million connections, passing 45 thousand messages per second. And no issues were found, so there doesn’t seem to be a real hard limit. However, since this was already far more than we were aiming for, and running these tests incurs a considerable expense, in terms of both time and money, we chose to stop here. This is not the end, though, chances are that some day we will try to push it even further.
How does MongooseIM scale horizontally? Is it linear, and if so, how many users each new node can handle? Can we scale ad infinitum, or is there a limit above which there are diminishing returns or even a failure? We wanted to find out, and it was by far the most time consuming part of the whole research.
We were assembling MongooseIM clusters on c5.large instances and stress-testing them until they failed. We went on and on, and scaling was almost linear, with the slope of the line fluctuating about 51 thousands (meaning each new node increased the cluster’s capability by 51 thousands connections, with intercept being about 80 thousands).
And so it went, until we reached fifteen nodes, at over 1.2 million users and 22 thousand messages per second, no limit was in sight. At this point we decided to jump ahead and try twenty nodes. This proved to be too much. Connections between the nodes were timing out every now and then, causing netsplits and inconsistencies in the Mnesia database, and the cluster was basically not functional.
The reason for this is the way Mnesia and, more generally, Erlang distribution works. Each clustered Erlang node maintains active connections to all the other nodes, so the number of connections grows at an increasing rate. With twenty nodes, there were already 190 connections to maintain and exchange data over.
User sessions in MongooseIM are kept in the Mnesia database, which is replicated to all the nodes to ensure strict consistency at all times. Every new connection then triggers a replication mechanism spanning all the nodes and possibly all the internode connections. With this number of nodes, traffic can become quite substantial, no wonder it becomes unstable.
Why not both?
Since we didn’t discover any limit in the vertical scalability, and horizontal clustering is linear (up to a reasonable number of nodes), then by combining the two, we could expect to be able to handle really large numbers - from what we know so far, ten million is well within reach. Taking costs into account, we decided to stop the load tests at this point. Exploring this path is one of the options for the future.
Last but not least, how much does running a MongooseIM cost, and how do the various clustering strategies work cost-wise? Because horizontal scaling is linear with a fixed part, running a smaller cluster is somewhat more cost-efficient than expanding the number of nodes.
Vertical scaling, in turn, is almost exactly proportional - the AWS price list is designed in such a way that an instance which is twice as expensive can handle almost exactly two times more client connections.
The rule of thumb for economic clusters would then be: make a cluster of four or five nodes, and scale vertically if needed. And here we have to reiterate that actual results, from a customised server with its particular usage pattern may vary widely. Keep in mind also that in real life, it is not only MongooseIM you have to scale. Database backends or other external services, which are normally shared by all the instances, have their own scaling patterns, and pricing.
Third dimension – federation
If the load is hard to handle for a single cluster, you can also go beyond that – there is no rule saying that all users have to use the same data centre. With federation, you can set up multiple data centres, each under their own domain, thus multiplying the possible load your whole system can handle. It’d also give you the extra benefit of bringing your servers closer to your geographically distributed uses. Federation also has its idiosyncrasies, but it is beyond the scope of this article (chances are we will get back to it in one of the future installments).