MongooseIM 6.1: Handle more traffic, consume less resources
- Pawel Chrzaszcz
- 10th May 2023
- 15 min of reading time
MongooseIM is a highly customisable instant messaging backend that can handle millions of messages per minute, exchanged between millions of users from thousands of dynamically configurable XMPP domains. With the new 6.1.0 release, it becomes even more cost-efficient, flexible and robust thanks to the new arm64 Docker containers and the C2S process rework.
Modern applications are often deployed in Docker containers. This solution simplifies deployment to cloud-based environments, such as Amazon Web Services (AWS) and Google Cloud. We believe this is a great choice for MongooseIM, and we also support Kubernetes by providing Helm Charts. Docker images are independent of the host operating system, but they need to be built for specific processor architectures. Amd64 (x86-64) CPUs have dominated the market for a long time, but recently arm64 (AArch64) has been taking over. Notable examples include the Apple Silicon and AWS Graviton processors. We made the decision to start publishing ARM-compatible Docker images with our latest 6.1.0 release.
To ensure top performance, we have been load-testing MongooseIM for many years using our own tools, such as amoc and amoc-arsenal-xmpp.
When we tested the latest Docker image on both amd64 and arm64 AWS EC2 instances, the results turned out to be much better than before – especially for arm64. The tested MongooseIM cluster consisted of two nodes, which is less than the recommended production size of three nodes, but the goal was to determine the maximum capability of a simple installation. Various compute-optimized instances were tested – including the 5th, 6th and 7th generations, all in the xlarge size. PostgreSQL (db.m6g.xlarge) was used for persistent storage, and three Amoc nodes (m6g.xlarge) were used for load generation. The three best-performing instance types were c6id (Intel Xeon Scalable, amd64), c6gd (AWS Graviton2, arm64) and c7g (AWS Graviton3, arm64).
The two most important test scenarios were:
- one-to-one messaging, where pairs of users exchange chat messages,
- multi-user chat (MUC Light), where each message sent to a room is delivered to five recipients.
Several extensions were enabled to resemble a real-life use case. The two most important ones perform a database write operation for each message, and disabling them would improve performance further.
The results are summarized in the table below:
Node instance type (size: xlarge) | c6id | c6gd | c7g |
---|---|---|---|
One-to-one messages per minute per node | 240k | 240k | 300k |
Multi-user chat messages per minute per node | 120k sent, 600k received | 120k sent, 600k received | 150k sent, 750k received |
On-demand AWS instance pricing per node per hour (USD) | 0.2016 | 0.1536 | 0.1445 |
Instance cost per billion delivered one-to-one chat messages (USD) | 14.00 | 10.67 | 8.03 |
Instance cost per billion delivered multi-user chat messages (USD) | 5.60 | 4.27 | 3.21 |
For each instance type, the table shows the highest message rates achievable without performance degradation. The load was scaled up for the c7g instances thanks to their better performance, making it possible to handle 600k one-to-one messages per minute in the whole cluster, i.e. 300k messages per minute per node. Should you need more, you can scale horizontally or vertically, and further tests showed an almost linear increase in performance – there are limits, of course (especially for the cluster size), but they are high. Maximum message rates for MUC Light were different because each message was routed to five recipients, making it possible to send up to 300k messages per minute while delivering 1.5 million.
The results allowed us to calculate the instance cost per one billion delivered messages, presented in the table above. It might be difficult to reach these numbers in production environments because of the margin needed to handle bursts of traffic, but under heavy load you can get close to them. Note that the database cost was actually higher than the cost of the MongooseIM instances themselves.
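As a sanity check, the per-billion figures follow directly from the delivery rate and the hourly instance price. The small sketch below reproduces the arithmetic – the module and function names are ours, not part of MongooseIM:

```erlang
%% Instance cost per billion delivered messages:
%% hours needed to deliver 10^9 messages at the given per-node rate,
%% multiplied by the hourly on-demand price.
-module(instance_cost).
-export([per_billion/2]).

-spec per_billion(float(), pos_integer()) -> float().
per_billion(PricePerHourUsd, DeliveredPerMinute) ->
    HoursPerBillion = 1.0e9 / (DeliveredPerMinute * 60),
    PricePerHourUsd * HoursPerBillion.
```

For example, instance_cost:per_billion(0.1445, 750000) returns about 3.21 USD, matching the c7g multi-user chat row, and instance_cost:per_billion(0.1445, 300000) gives about 8.03 USD for one-to-one chat.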
We have completely reimplemented the handling of C2S (client-to-server) connections. Although the changes are mostly internal, you can benefit from them, even if you are not interested in the implementation details.
The first change is about accepting incoming connections – instead of custom listener processes, the Ranch 2.1 library is now used. This introduces some new options, e.g. max_connections and reuse_port.
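MongooseIM configures Ranch internally, so these options simply appear in the listener section of the configuration file. To illustrate where a cap such as max_connections applies, here is a minimal, generic Ranch 2.x listener sketch – the module name, port and echo logic are made up for the example and are not MongooseIM code, and reuse_port (a socket-level option) is not shown:

```erlang
-module(demo_listener).
-behaviour(ranch_protocol).

-export([start/0]).
-export([start_link/3, init/3]).

%% Start a Ranch 2.x TCP listener capped at 10 000 concurrent connections.
start() ->
    {ok, _} = application:ensure_all_started(ranch),
    ranch:start_listener(?MODULE, ranch_tcp,
                         #{max_connections => 10000,
                           socket_opts => [{port, 5555}]},
                         ?MODULE, []).

%% ranch_protocol callback: one process per accepted connection.
start_link(Ref, Transport, Opts) ->
    Pid = spawn_link(?MODULE, init, [Ref, Transport, Opts]),
    {ok, Pid}.

init(Ref, Transport, _Opts) ->
    {ok, Socket} = ranch:handshake(Ref),
    loop(Socket, Transport).

%% Echo the received data until the client disconnects or stays silent.
loop(Socket, Transport) ->
    case Transport:recv(Socket, 0, 60000) of
        {ok, Data} ->
            Transport:send(Socket, Data),
            loop(Socket, Transport);
        _ ->
            Transport:close(Socket)
    end.
```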
Prior to version 6.1.0, each open C2S connection was handled by two Erlang processes – the receiver process was responsible for XML parsing, while the C2S process would handle the decoded XML elements. They are now integrated into one, which means that the footprint of each session is smaller, and there is less internal messaging.
The core XMPP operations are defined in RFC 6120, and we have reimplemented them from scratch in the new mongoose_c2s module. The most important benefit of this change from the user perspective is the vastly improved separation of concerns, making feature development much easier. A simplified version of the C2S state machine diagram is presented below. Error handling is omitted for simplicity. The “wait for session” state is optional, and you can disable it with the backwards_compatible_session configuration option.
A similar diagram for version 6.0 would be much more complicated, because the former implementation had parts of multiple extensions scattered around its code:
Functionality | Described in | Moved out to |
---|---|---|
Stream resumption | XEP-0198 Stream Management | mod_stream_management |
AMP event triggers | XEP-0079 Advanced Message Processing | mod_amp |
Stanza buffering for CSI | XEP-0352 Client State Indication | mod_csi |
Roster subscription handling | RFC 6121 Instant Messaging and Presence | mod_roster |
Presence tracking | RFC 6121 Instant Messaging and Presence | mod_presence |
Broadcasting PEP messages | XEP-0163 Personal Eventing Protocol | mod_pubsub |
Handling and using privacy lists | XEP-0016 Privacy Lists | mod_privacy |
Handling and using blocking commands | XEP-0191 Blocking Command | mod_blocking |
It is important to note that mod_presence is the only new module in the list. The others existed before, but parts of their code were in the C2S module. By disabling unnecessary extensions, you can gain performance. For example, by omitting [mod_presence] from your configuration file, you can skip all the server-side presence handling. Our load tests have shown that this can significantly reduce the total time needed to establish a connection. Moreover, disabling extensions is now 100% reliable and guarantees that no unwanted code is executed.
If you are interested in developing your own custom extensions, it is now easier than ever, because mongoose_c2s uses the new C2S-related hooks and handlers and several new features of the gen_statem behaviour. C2S hooks can be divided into the following categories, depending on the events that trigger them:
Trigger | Hooks |
---|---|
User session opening | user_open_session |
User sends an XML element | user_send_packet, user_send_xmlel, user_send_message, user_send_presence, user_send_iq |
User receives an XML element | user_receive_packet, user_receive_xmlel, user_receive_message, user_receive_presence, user_receive_iq, xmpp_presend_element |
User session closing | user_stop_request, user_socket_closed, user_socket_error, reroute_unacked_messages |
mongoose_c2s:call/3 | foreign_event |
Most of the hooks are triggered by XMPP traffic. The only exception is foreign_event, which can be triggered by modules on demand, making it possible to execute code in the context of a specific user's C2S process.
Modules add handlers to selected hooks. Such a handler performs module-specific actions and returns an accumulator, which can contain special options, allowing the module to:
- Update the module-specific state with state_mod, or replace the whole C2S state data with c2s_data.
- Change the C2S state with c2s_state.
- Perform gen_statem transition actions with actions.
- Stop the C2S process gracefully (stop) or forcefully (hard_stop).
- Send XML elements to the user with hooks (route, flush) or without triggering hooks (socket_send).

Let's take a look at the handlers of the new mod_presence module. For the user_send_presence and user_receive_presence hooks, it updates the module-specific state (state_mod) storing the presence state. The handler for foreign_event is more complicated, because it handles the following events:
Event | Handler logic | Trigger |
---|---|---|
{mod_presence, get_presence | get_subscribed} | Get user presence information / subscribed users | mongoose_c2s:call(Pid, mod_presence, get_presence | get_subscribed) |
{mod_presence, {set_presence, Presence}} | Set user presence information | mongoose_c2s:cast(Pid, mod_presence, {set_presence, Presence}) |
{mod_roster, RosterItem} | Update roster subscription state | mongoose_c2s:cast(Pid, mod_roster, RosterItem) |
The example shows how the coupling between extension modules remains loose and modules don’t call each other’s code directly.
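To make this more concrete, below is a minimal sketch of a hypothetical extension module (mod_demo is not a real MongooseIM module). It assumes the gen_hook registration style used in MongooseIM 6.x, where handlers are listed as {HookName, HostType, Fun, Extra, Priority} tuples and each handler takes the accumulator, a parameter map and the Extra map, returning {ok, Acc}; treat the exact signatures as assumptions and check the current MongooseIM documentation:

```erlang
-module(mod_demo).
%% Hypothetical extension sketch: attaches a handler to a traffic-driven
%% hook and shows how any process can trigger foreign_event in the C2S
%% process of a given user.

-export([start/2, stop/1, handle_user_send_message/3, poke/1]).

start(HostType, _Opts) ->
    gen_hook:add_handlers(hooks(HostType)).

stop(HostType) ->
    gen_hook:delete_handlers(hooks(HostType)).

%% Assumed registration format: {HookName, HostType, Fun, Extra, Priority}.
hooks(HostType) ->
    [{user_send_message, HostType, fun ?MODULE:handle_user_send_message/3, #{}, 50}].

%% Hook handler: perform a module-specific action and return the accumulator.
%% A real module would attach options such as state_mod, c2s_state, actions,
%% route or stop to the accumulator before returning it.
handle_user_send_message(Acc, _Params, _Extra) ->
    logger:info("mod_demo: user_send_message hook fired"),
    {ok, Acc}.

%% Trigger foreign_event on demand: the C2S process identified by Pid runs
%% this module's foreign_event handler (not shown here) in its own context.
poke(Pid) ->
    mongoose_c2s:cast(Pid, ?MODULE, poke).
```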
The following new gen_statem features are used in mongoose_c2s:
- Arbitrary term state – with the handle_event_function callback mode, it is possible to use tuples as state names. An example is {wait_for_sasl_response, cyrsasl:sasl_state(), retries()}, which encodes the state of the SASL authentication process and the number of authentication retries left in the state tuple. Apart from the states shown in the diagram above, modules can introduce their own external states, which have the format {external, StateName}. An example is mod_stream_management, which causes a transition to the {external, resume} state when a session is closed.
- Multiple callback modules – to handle an external state, the callback module has to be changed, e.g. mod_stream_management uses the {push_callback_module, ?MODULE} transition action to provide its own handle_event function for the {external, resume} state.
- State timeouts – for all states before wait_for_session, the session terminates after the configurable c2s_state_timeout. The timeout tuple itself is {state_timeout, Timeout, state_timeout_termination}.
- Named timeouts – modules use these to trigger specific actions, e.g. mod_ping uses several timeouts to schedule ping requests and to wait for responses. The timeout tuple has the format {{timeout, ping | ping_timeout | send_ping}, Interval, fun ping_c2s_handler/2}. This feature is also used for traffic shaping, pausing the state machine if the traffic volume exceeds the limit.
- Self-generated events – this feature is used very often; for example, when incoming XML data is parsed, an event {next_event, internal, XmlElement} is generated for each parsed XML element. The route and flush options of the C2S accumulator generate internal events as well.
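None of these mechanisms are specific to MongooseIM – they are standard gen_statem features. For reference, here is a minimal, self-contained gen_statem module (the module, state and event names are made up) demonstrating tuple state names under handle_event_function, a state timeout, a named timeout and a self-generated internal event:

```erlang
-module(demo_statem).
-behaviour(gen_statem).

-export([start_link/0, send/2]).
-export([callback_mode/0, init/1, handle_event/4]).

start_link() ->
    gen_statem:start_link(?MODULE, [], []).

send(Pid, Data) ->
    gen_statem:cast(Pid, {data, Data}).

%% handle_event_function allows arbitrary terms (such as tuples) as states.
callback_mode() ->
    handle_event_function.

init([]) ->
    %% Tuple state carrying the number of retries left; terminate if the
    %% peer stays silent for 30 seconds (state timeout).
    {ok, {waiting_for_data, 3}, #{},
     [{state_timeout, 30000, state_timeout_termination}]}.

%% Incoming data: emit a self-generated internal event for the parsed term
%% and start a named timeout while waiting for a follow-up.
handle_event(cast, {data, Data}, {waiting_for_data, _RetriesLeft}, StateData) ->
    {next_state, processing, StateData,
     [{next_event, internal, {parsed, Data}},
      {{timeout, follow_up}, 5000, no_follow_up}]};
%% Handle the internal event generated above.
handle_event(internal, {parsed, Data}, processing, StateData) ->
    io:format("processing ~p~n", [Data]),
    {keep_state, StateData};
%% The named timeout fired: go back to a fresh waiting state.
handle_event({timeout, follow_up}, no_follow_up, processing, StateData) ->
    {next_state, {waiting_for_data, 3}, StateData,
     [{state_timeout, 30000, state_timeout_termination}]};
%% The state timeout terminates the idle session.
handle_event(state_timeout, state_timeout_termination, _State, _StateData) ->
    {stop, normal};
%% Ignore anything else. A real server could also switch callback modules
%% here with the {push_callback_module, Module} transition action.
handle_event(_Type, _Event, _State, StateData) ->
    {keep_state, StateData}.
```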
MongooseIM 6.1.0 is full of improvements on many levels – both on the outside, like the arm64 Docker images, and deep inside, like the separation of concerns in mongoose_c2s. What they all have in common is that we have load-tested them extensively, making sure that our new messaging server delivers what it promises and that performance is better than ever. There are no unpleasant surprises hidden underneath. After all, it is open source, and you are welcome to download, deploy, use and extend it free of charge. However, should you have a special use case, high performance requirements or a need to reduce costs, don't hesitate to contact us – we can help you deploy, load test and maintain your messaging solution.