Erlang

Rabbit’s Anatomy - Understanding Topic Exchanges

by Bartosz Szafran

Topic Exchange Intro/Overview

In RabbitMQ Exchange is the abstraction, where messages are published to. There are a few types of Exchanges, where Topic Exchange is one of them. Topic Exchange provides most flexible routing mechanisms. It depends on the Binding Key, which is provided when queue is bound to a Topic Exchange. It is kind of a pattern, which is used to make decision on routing messages by checking if a message’s Routing Key matches the pattern.

A Routing Key is made up of words separated by dots, e.g.: floor_1.bedroom.temperature. The Binding Key for Topic Exchanges is similar to Routing Key, however there are two special characters, namely asterisk * and hash #. These are wildcards, allowing to create a binding in a smarter way. Asterisk * matches any single word and # matches zero or more words. Here are examples of patterns to match on messages from:

  • - all devices on first floor in bedroom: floor_1.bedroom.*,
  • - all devices on first floor floor_1.#.

It is clearly visible Topic Exchange allows to greatly simplify the routing, allowing to match only.

Trie

To understand how a Topic Exchange works, the trie data structure has to be introduced. It is a tree, which holds ordered data. Typically it is used for storing string values. Each node in the tree represents a prefix of a string and holds links to child nodes, which share the same prefix. The child nodes are addressed with the next character following the prefix for the node.

Such a data structure makes searching for specific strings independent of the structure’s size. The characters of string are used to traverse the tree while searching it. The trie is used for storing Binding Keys in Topic Exchange. A Binding Key is split by dots and all string parts are used as pointers to the next node. So, each time when new binding is added to the Topic Exchange, the trie associated with it is updated. And each time when new message has to be routed, the trie is queried to look for the message’s destinations.

Implementation of the trie

Topic Exchange trie is implemented on the top of Mnesia. Nodes and edges of trie are stored in respectively rabbit_topic_trie_node and rabbit_topic_trie_edge tables.

#topic_trie_node{
    trie_node = #trie_node{
        exchange_name,
        node_id
    },
    edge_count,
    binding_count
}
#topic_trie_edge{
    trie_edge = #trie_edge{
        exchange_name,
        node_id, % parent
        word
    },
    node_id % child
}

In this case, trie_node or trie_edge records are primary keys used to identify records. Both nodes and edges are assigned to one particular Topic Exchange by specifying exchange_name field in primary key. Nodes are also used to identify bindings, which should be used, they are not stored directly in the node’s table, but they can be easily obtained by node_id from rabbit_topic_trie_binding table. Edges store information about connections between parent and child nodes. Edges also contain the part of the Binding Key (word), which is used to traverse the tree. Therefore, traversing through the tree requires a sequence of Mnesia queries. Reading edges and nodes is done using dirty operations.

Topic Exchange internals

A Exchange type is created by implementing the rabbit_exchange behaviour. In the context of tries in Topic Exchange, two operations are interesting. Namely, it is add_binding/3 and route/2, where first implements adding new binding to the structure and the latter is used to determine target for routing.

Binding Operation

The arguments needed to create the binding are:

  • - source Exchange
  • - Binding Key
  • - destination

Every trie starts with the root node, representing the empty Binding Key. It actually makes sense, as empty string is a prefix for any string. First operation is pretty straightforward - Binding Key has to be split by dots . and it is stored in a list. For example the key “a.b.c” is transformed to [“a”, “b”, “c”]. Let’s call the list Words for later. It will be used for traversing the data structure. Then, recursively the tree is traversed down, starting with root as a current node.

  1. Repeat until Words list is empty. 1.1. Take the head part of Words list and query Mnesia for child matching it. 1.2. If node is found, use it as a new current node and go to 1.1 with the rest of Words list. Otherwise go to 2.
  2. Create child nodes using rest of the Words list.
  3. When Words list is exhausted, create a rabbit_topic_trie_binding for the current node. It signals that there are bindings associated with it.

Here is an example binding operation. Let’s assume there is an Topic Exchange with two existing bindings: floor_1.bedroom.temperature and floor_1.#. Thus, here is a trie structure:

Let’s add new binding with the Binding Key floor_1.bedroom.air_quality. First we split it by dots: [floor_1, bedroom, air_quality]. There are already keys floor_1 and bedroom, but the latter one is missing. Therefore new node has to be created. Then, the rest of the key [air_quality] is used to create nodes. Finally new binding is associated with newly created node and structure ends with following shape:

Summarizing, to insert new node, 3 read queries were executed to retrieve edge between last pairs of nodes: {root, floor1}, {floor_1, bedroom} and {bedroom, air_quality}. However, the latter edge was not present, so 2 write operations were executed: first which updates edge_count for bedroom node and second which inserts new edge. At this point the trie structure is ready to create final node. Therefore another two write operations happen:

  • - creating actual node, which corresponds to the given Binding Key,
  • - creating entry in rabbit_topic_trie_binding, which bounds node with destination for messages.

The Mnesia tables used here are ordered_set type, which means that it is implemented with binary search tree. Thus, both read and write operation has complexity O(log(n)), where n is the size of the table. It can be observed, that first phase of traversing through the trie requires:

  1. read operation when, node exists
  2. write operation when node is not existing.

The final phase of inserting actual binding requires two extra operations. So, in worst case, there are no nodes and all of them need to be created. Then, the complexity is O(n*2*log(m) + 2*log(k)), where n is the length of the Binding Key, m is the number of nodes in the table and k is number of actual bindings. The m and k are global, so the efficiency of queries depends on global number of bindings/nodes, not only for given Exchange. For the simplicity it is assumed that the number of edges and nodes are equal, because in this structure (number of edges) = (number of nodes - 1).

Routing Operation

Routing happens, when a new message arrives to Topic Exchange. The trie structure needs to be queried using the Routing Key associated with the message. However the traversing through the trie is not straightforward, as wildcards * and # need to be taken in to account.

Same as in binding operation, at the beginning the Routing Key is split by dots. Again, let’s call it Words list [ Word | RestW ] = Words. The process starts with root node. Then the algorithm of discovering bindings is recursive exploration of the tree in three ways:

  • - Look for # in child nodes. If node is found, it is considered as new root and new scans are started with all remaining parts of Words, e.g: if Words is [a,b,c], then start searching again with [a,b,c], [b,c], [c] and [].
  • - Look for * in childs nodes. If node is found, continue with found node as a root and RestW.
  • - Look for Word in child nodes. If node is found, continue with found node as a root and RestW.

Exploring for all cases is finished when Words is exhausted. At the end there is one more step, which is looking for extra child nodes connected through hash #. It has to be made, because # wildcard stands for “zero or more”. So here is example of the searching algorithm. Let the Topic Exchange have the following bindings: floor_1.*.air_quality, floor_1.bedroom.air_quality, floor_1.bathroom.temperature. Let’s examine the routing for message published with the Routing Key floor_1.bedroom.air_quality, which is going to match all bindings. Here is trie representation, where current node is marked as blue and number on the node represents number of bindings.

First step is to find if there is hash # child node, but it is not present. Then, asterisk * child node is queried, but it is also not present. Finally algorithm finds node, matching the head of Words list - floor_1:

Now algorithm considered blue node as a new root and starting again. Again, there is no hash # child, but asterisk is found. Then head of the Words list is consumed and algorithm moves down:

Here there is only one option available - air_quality:

Words list is exhausted, so current node is result. There is one extra step made - hash # child node has to be queried again, because it also accepts empty lists. However it is not found, so only the current blue node is considered to be a result. Let’s mark found node with a green and get back to previous node:

Node was found using asterisk, but there is one step left. It has to be checked if there is bedroom child node. And actually there is one:

There is one word left and the child node is present:

Words list is empty again, so current is result:

The final step is to query for bindings associated with found bindings. According to the numbers on the found nodes, there are two bindings. They are final result of route/2 function.

Now let’s consider another example, with hash bindings present. There are three bindings present: #.air_quality , floor_1.# and floor_1.bedroom.air_quality.#. Again the floor_1.bedroom.air_quality Routing Key will be used:

Now we have hash node found. Then, algorithm goes down to that node with all available Routing Key parts:

Let’s emphasise this again: current node was reached via # edge, so the algorithm visits current blue node 4 times with different Words lists. They are presented on the figure. One of the Words list is empty one [], so this node is also appended to the results. There is no floor_1 or bedroom edge going out of this node, but there is air_quality one. So algorithm goes to the leaf node using the third Words list. Then:

Current Words list is empty, so this node is also the result of the search. There are no hash child nodes, so the current branch is finished. Algorithm goes back to the root node:

The only option to go down is using head of Words list:

Again there is hash child node, so it needs to be visited with all tails of the Words list. Three times in this case:

One of the Words lists is empty, so current blue node is appended to the result list. As there are no child nodes, the algorithm goes back:

Now the algorithm will go down two times consuming all remaining words. Let’s jump directly to it:

The Words list is empty, so the current node is also part of the result. However there is also a hash # child node. According to the algorithm, if any node is considered as a result, also its child hash nodes are matched. So finally there are 5 nodes found:

The final step is to find bindings associated with found nodes. The final results of route/2 function are all 3 bindings.

In term of complexity it quite hard to estimate it precisely. Algorithm does 3 queries for each node. The # nodes results in duplication of query path, as it starts the whole algorithm with all remaining parts of the Words. However all operations are depending on two factors - the Words list length and the total number of nodes existing in the table.

So assuming the worst case, where the bindings are: #, #.#, #.#.# ... k*#, we can see that each level will run with all possible combinations of Words, some of them will be visited many times with exactly the same Words. Then, the first node is visited n times, seconds is visited sum(1,n), third sum(1,sum(1,n)) and so on. We can rewrite it as:

The total number of operations is k1+k2+…+kk. When this recursive equation it unwrapped, every new level contains two times more multiplications than the previous one. The level k will contain 2k multiplications. It will be dominant in terms of complexity, so we can bound the complexity by O(2k*n*log(m)), where k is maximum trie depth, n is the length of the Words and m is the total number of nodes.

However, above example is extreme and bindings like #.#.# make no sense. Then, the average complexity would be close to O(nlog(m)), because it makes no sense to put two subsequent hash # parts of the key. The overhead introduced by single hash nodes should not be significant, because in such case the traversing trie with different Words stops after the hash edge.

Evaluation

This section will cover the performance of Topic Exchange. The experiments are going to capture the characteristics of Topic Exchange under different conditions.

Synthetic tests

Two experiments will be presented, in order to experimentally confirm of reject assumptions made in previous Routing Operation section:

  • First, the relation between the number of bindings and routing operation time. The bindings are fixed and the routing key length is adjusted. The linear dependency is expected.
  • Secondly, the routing key length is fixed and the number of bindings is varying.

Both experiments are performed under following conditions:

  • - Tests are made on single RabbitMQ node.
  • - Routing time is measured by checking time of evaluation rabbit_exchange_type_topic:route/2.
  • - Measurements are made 50 times and average results are presented on the figures.
  • - The Bindings Keys are random, ensuring that there are no two subsequent hashes in any Binding Key.
  • - The Binding Key part has 70% chance to be a word, 20% chance to be an asterisk and 10% to be a hash.
  • - The Routing Keys are created from existing Binding Keys - For example the Routing Key with length n, will be created from existing Binding Key with the length n. Any hashes or asterisks, are replaced by random strings. It ensures that operation must traverse through at least n levels of trie.

Above figure presents the results of three experiments. Now let’s slightly modify conditions, in order to visualize the impact of # hash key in Trie structure. There is only one binding added, which is just two subsequent hashes #.#. Then, the performance looks like:

The red curve bends, this was actually predicted. When there are more # bindings on the query path, the relation between Routing Key length and query time is no longer linear. This effect is also observable in 10k bindings series - the green curve also slightly bends. It can be explained in the same way - there are more Bindings Keys starting with #, this increases query time for all queries.

Let’s check it in RabbitMQ Management UI:

Actually there are around 50 such bindings. Let’s replace them, which should result in more linear relation and will give an overview of hash impact to performance:

Indeed the time of finding routes is improved. Now let’s examine the number of bindings impact on query time. As it was explained in previous section, the logarithmic relation is expected:

This example also follows the expected behaviour. All bindings are stored in a single Mnesia table. Querying for any node has its own complexity. While there are more entries in the table, the query time grows. As the table has ordered_set type, the query time has logarithmic complexity, what is actually observed.

Summing up, the previous experiments aligns to the theory from previous section. The expectations about impact of RK length and number of bindings to routing operation time was confirmed. The huge impact of hash # wildcard have been also confirmed and the scale of it was presented.

Real world example

Two previous examples were measuring time of single query. While it is still valuable, it does not give real-life experience. Test is synthetic and focuses on single query. But does Topic Exchange performance overhead is also observable when overall performance of RabbitMQ is taken into account?

This chapter will present performance evaluation of RabbitMQ integration to MongooseIM. MongooseIM is Erlang Solutions’ highly scalable, instant messaging server. The RabbitMQ component in MongooseIM simply reports each users’ activity, which may be:

  • - User became online/offline
  • - User sent/received message

Only sent/received messages will be discussed in this case. The Routing Key of the message activity follows simple pattern <username>.{sent, received}.

In order to evaluate the performance of the component, there was a load test designed. There were simulated XMPP clients connecting to the MongooseIM server. The simulated clients were exchanging messages with each other. Each message generates events, which are published to RabbitMQ. Then, there are a number of AMQP clients, connecting to RabbitMQ, to consume generated events.

This is the outline of the experiment’s architecture:

For the purpose of this post only results which are directly connected to Topic performance will be covered. Let’s define the key performance indicator as a Time To Delivery as an amount of time between message being sent by XMPP user and being received by Consumer for RabbitMQ’s queue. This value will be presented on future figures.

Tests conditions were as follows:

  • - 90k XMPP users
  • - 10k AMQP consumers
  • - ~ 4,8k payloads/s from MongooseIM to RabbitMQ
  • - Payload size ~120B
  • - topic exchange with Binding Keys like user_N.*
  • - 20k bindings in total

In this case performance is presented on the following figure. It shows the 95th, 99th percentiles of Time To Delivery, as well as maximum observed Time To Delivery in given time window.

The latter test had similar condition. The only difference was different Exchange type:

  • - 90k XMPP users
  • - 10k AMQP consumers
  • - ~ 4,8k payloads/s from MongooseIM to RabbitMQ
  • - Payload size ~120B
  • - direct exchange with Binding Keys like user_N.chat_msg_sent, user_N.chat_msg_recv
  • - 40k bindings in total

Under those condition performance was better, which is illustrated on following figure.

While previous section showed performance characteristics of Topic Exchange, those examples provides overview on a bigger scale. Both tests had identical characteristics apart from Topic type (and consequently number of bindings). However the difference in performance is significant in favor of the Direct Exchange. It allowed to effectively decrease the Time To Delivery, which is a factor of efficiency in case of the presented tests.

Summary

This post covered a few areas, which are the Topic Exchange internals, brief overview of its implementation, theoretical overhead introduced by traversing the trie structure, as well as some performance evaluation. As it was observed, Topic Exchange is not the fastest one and there are many factors which may influence its performance. However it is not true that Topic Exchanges are slow. In fact they are really fast in typical RabbitMQ usages. Test conditions were specific. If there are a few bindings or the trie depth is not deep, usually the Topic Exchange overhead is negligible. Still, it is important to understand the underlying mechanism, as example with MongooseIM’s RabbitMQ component presented - using different Exchange type resulted in much better performance.

  1. Get some expert help or training on RabbitMQ
  2. Check RabbitMQ and MongooseIM demo
  3. Find out more about MongooseIM
Go back to the blog

Tags: RabbitMQ
×

Thank you for your message

We sent you a confirmation email to let you know we received it. One of our colleagues will get in touch shortly.
Have a nice day!