XMPP Protocol Use-Cases and Guide | Erlang Solution blog
by Piotr Nosek
Who will find this interesting
If you’re considering XMPP for your project but you are unsure if it can provide the functionality you need, you’ll eventually end up at the official list of XMPP extensions (XEPs).
I’m pretty sure you’ll be quite intimidated by such a long list of extensions. In some cases it will be pretty easy to find what you need. If you look for PubSub functionality, you’ll quickly notice “Publish-Subscribe”. Sometimes it’s not so obvious though. XMPP developers already know that in order to synchronise outgoing messages between several devices, they have to enable “Message Carbons”. Not very intuitive, is it?
The aim of this blog post is to guide you towards proper XMPP technologies and solutions, given your specific use cases. I’ve worked with and deployed solutions powered by XMPP, such as MongooseIM, for years; so let me be your personal Professor Oak, providing a perfect “companion(s)” to work with and begin your journey in the XMPP world. There are almost 400 XEPs, will you catch them all? ;)
The length of this article is due not to the complexity of the descriptions but to the number of use cases and features. :)
All numbers and information on implementation status are valid for March 2017.
What can you expect here?
For every use case, I will list XMPP features you definitely should consider using. Each one of them will be briefly described. The goal here is to understand the usefulness without reading the whole specification. Besides that, each item will include the name of the MongooseIM module providing the discussed extension and example client implementations.
What you won’t find in this post
This post won’t cover any XMPP basics. It assumes you either know them already (what JIDs, C2S, S2S, IQs, stanzas, streams etc. are) or you intend to learn them from some other guide, like the excellent (iOS) tutorial written by Andres Canal (Part 1, Part 2). It’s more of a cookbook, not Cooking For Dummies.
1. I’m creating …
1.1 … a mobile application.
1.2 … a desktop application.
1.3 … a web application.
1.4 … an application that just can’t speak XMPP.
2. I need my application to …
2.1 … show message status like Facebook does.
2.2 … provide message archive to end users.
2.2.1 I’d like to have a full text search feature.
2.3 … display inbox (a list of conversations with unread count and a last message).
2.4 … allow file transfers and media sharing between users.
2.4.2 File Upload
2.5 … support group chats …
2.5.1 … and I need precise presence tracking in each group.
2.5.2 … and I don’t need to broadcast presence information in each group.
2.6 … be compatible with other public XMPP setups.
2.7 … present the same view of each conversation on every user’s device.
2.8 … allow users to block each other.
2.9 … support end-to-end encryption.
2.10 … be a part of Internet of Things.
2.11 … receive push notifications.
2.12 … publish messages to groups of subscribers.
1. Creating …
Before we proceed to more specific requirements, it’s important to identify crucial standards based on your application type.
1.1 … a mobile application.
Smartphones are omnipresent nowadays. It’s a fact. The whole software market considered, mobile apps are an important medium between various companies and their customers. Some of them are the actual products (games, communicators, car navigations, etc.), not only a “channel”. If you’re going to develop a mobile application, you will need…
XEP-0198 Stream Management
It’s an extension that actually provides two features. One of them is stanza delivery confirmation (on both the server and the client side), which allows early detection of broken connections or a malfunctioning network layer. The other one is stream resumption. It makes reconnection faster by reducing the round-trip count, and it relieves the client of fetching the message archive, because pending, unacknowledged messages will be retransmitted from the server’s buffer.
It is enabled by default in MongooseIM and supported by major client libs like Smack or XMPPFramework. From a client developer perspective, it’s pretty transparent because the whole extension is enabled with a single flag or method call.
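To make the acknowledgement part a bit more tangible, here is a minimal Python sketch of the bookkeeping a client library does for you. The class and method names are hypothetical (they are not taken from Smack or XMPPFramework), and the sketch ignores the counter wrap-around described in the XEP. The only protocol detail it relies on is that an ack carries a counter `h` with the number of stanzas the other side has handled so far.

```python
from collections import deque


class AckTracker:
    """Keeps outgoing stanzas until the server acknowledges them (hypothetical helper)."""

    def __init__(self):
        self.acked_count = 0    # last 'h' value received from the server
        self.unacked = deque()  # stanzas sent but not confirmed yet

    def on_send(self, stanza):
        # Every sent stanza waits in the queue until an ack covers it.
        self.unacked.append(stanza)

    def on_ack(self, h):
        """Handle an ack with counter h: the server has handled h stanzas in total."""
        newly_acked = h - self.acked_count
        if newly_acked < 0 or newly_acked > len(self.unacked):
            raise ValueError("ack counter out of range")
        confirmed = [self.unacked.popleft() for _ in range(newly_acked)]
        self.acked_count = h
        return confirmed

    def pending_after_resume(self, h):
        """On resumption the server reports its h; whatever is left must be resent."""
        self.on_ack(h)
        return list(self.unacked)
```

Stanzas returned by `on_ack` can be marked as "acknowledged by the server" in the UI, while the result of `pending_after_resume` is what has to be retransmitted after a reconnect.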
MUC Light, MIX, Push notifications, HTTP File Upload
These extensions are especially useful in the mobile environment. Why? With MUC Light and MIX you gain control over presence broadcasting - you can spare your users frequent radio wakeups and bandwidth usage. These extensions are a significant improvement over traditional presence-driven group chats.
Virtually every app on our smartphones uses push notifications. Some are useful and some are just annoying commercials. It doesn’t matter - it’s almost certain you’ll want them integrated with your XMPP service.
HTTP File Upload allows asynchronous media sharing, which is much more convenient in the case of group chats and doesn’t require both parties to stay online during the transfer.
These are just brief summaries. You can find more details further in this post.
1.2. … a desktop application.
Despite mobile phones’ expansion and software products exclusive to them (Instagram, Snapchat, Tinder, etc.), nobody can deny the comfort of a UI operated with a mouse, keyboard, or tablet. Some apps simply require processing power that portable devices can’t provide. If your code is going to be executed on desktop PCs and laptops, you’ll appreciate…
There are no extensions that are strictly essential for desktop apps. Everything depends on specific applications. Just bear in mind that the standards important for mobile apps are generally useful for desktop ones too, only less critical.
1.3. … a web application.
As the days of heavy browser incompatibility (thank you, standardisation!) and Flash technology abuse are long gone, web applications are a great way to provide cross-platform solutions. It’s not only easier to reach more platforms but also to ensure the users are always running the most up-to-date version.
If you’re a web developer, you’re going to connect to the XMPP server via BOSH or Websockets.
The Websockets technology allows upgrading an HTTP connection to an asynchronous, full-duplex, binary one (a bit of a simplification, but that’s the essence). It means that XMPP stanzas can be exchanged almost as efficiently as over a raw TCP connection (Websockets add only a small overhead: an integer carrying the packet size). It’s the recommended protocol for single-page apps.
Note: You can combine Stream Management’s resumption with Websockets, although it will still be slower than BOSH’s session pause.
Warning: Websockets are not implemented by old browsers. If you have to support any outdated clients, take a look at this table first.
BOSH is defined in XEP-0124: Bidirectional-streams Over Synchronous HTTP (BOSH) and XEP-0206: XMPP Over BOSH. This protocol encapsulates XMPP stanzas in HTTP requests. It also simulates asynchronous, bidirectional communication by issuing long polling requests from the client to the server to retrieve live data. What does it mean in practical terms?
The protocol is pretty verbose though, so unless you need BOSH-specific features such as session pause, go for Websockets.
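To illustrate what "encapsulating stanzas in HTTP requests" means, here is a rough Python sketch that wraps stanzas in the BOSH `<body/>` element. `BoshSession` is a hypothetical helper, not a real library class. It assumes the session has already been created, so the server-assigned `sid` and the client-chosen initial `rid` are known. The HTTP POST itself is left out.

```python
import xml.etree.ElementTree as ET

BOSH_NS = "http://jabber.org/protocol/httpbind"


class BoshSession:
    """Builds request bodies for an already established BOSH session (sketch)."""

    def __init__(self, sid, first_rid):
        self.sid = sid
        self.rid = first_rid  # request ID, incremented with every request

    def wrap(self, stanzas=()):
        """Return the XML payload of the next HTTP POST; no stanzas means a plain poll."""
        body = ET.Element("body", {"xmlns": BOSH_NS, "sid": self.sid, "rid": str(self.rid)})
        for stanza in stanzas:
            body.append(stanza)
        self.rid += 1
        return ET.tostring(body, encoding="unicode")


session = BoshSession(sid="abc123", first_rid=1000)
message = ET.Element("message", {"xmlns": "jabber:client", "to": "bob@example.com"})
ET.SubElement(message, "body").text = "Hello over HTTP"
print(session.wrap([message]))  # a request carrying one stanza
print(session.wrap())           # an empty long-polling request
```

Every request, even an empty one, carries the wrapper with `sid` and `rid`, which is where the verbosity comes from.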
1.4. … an application that just can’t speak XMPP.
You probably think that I’m crazy; why use XMPP with XMPP-less clients? Let’s change the way we think about XMPP for a moment. Stop considering XML the only input data format the XMPP server accepts. What if I told you that it’s possible to restrict XML to the server’s routing core and just make REST calls from any application? Tempting?
2. I need my application to …
Now we continue to more specific use cases.
2.1. … show message status like Facebook does.
By message status we mean the following states (plus live notifications):
- (1) Not sent to server yet.
- (2) Acknowledged by the server.
- (3) Delivered to the recipient.
- (4) Displayed by the recipient.
- (5) User is composing a message.
- (6) User has stopped composing a message.
(1) and (2) are handled by Stream Management. It’s pretty obvious - before receiving an ack from the server, you are in (1); an ack confirms the message has entered state (2).
We can deal with (3) and (4) by using XEP-0333: Chat Markers. These are special stanzas sent by a recipient to the original sender. There are dedicated markers for received and displayed events.
(5) and (6) are provided by XEP-0085: Chat State Notifications. It is up to the client to send updates like <composing/> or <paused/> to the interlocutor.
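As a rough illustration, the two kinds of stanzas could be built like this in Python. The function names are hypothetical; the namespaces and element names are the ones I know from XEP-0333 and XEP-0085, so double-check them against the versions your server and client library support.

```python
import xml.etree.ElementTree as ET

MARKERS_NS = "urn:xmpp:chat-markers:0"
CHATSTATES_NS = "http://jabber.org/protocol/chatstates"


def chat_marker(to, marker, message_id):
    """Build a Chat Marker for states (3) and (4): 'received' or 'displayed'."""
    if marker not in ("received", "displayed"):
        raise ValueError("unsupported marker: " + marker)
    msg = ET.Element("message", {"to": to})
    # The id attribute points at the message being marked.
    ET.SubElement(msg, marker, {"xmlns": MARKERS_NS, "id": message_id})
    return ET.tostring(msg, encoding="unicode")


def chat_state(to, state):
    """Build a Chat State Notification for states (5) and (6)."""
    if state not in ("active", "composing", "paused", "inactive", "gone"):
        raise ValueError("unsupported chat state: " + state)
    msg = ET.Element("message", {"to": to, "type": "chat"})
    ET.SubElement(msg, state, {"xmlns": CHATSTATES_NS})
    return ET.tostring(msg, encoding="unicode")


print(chat_marker("alice@example.com/phone", "displayed", "msg-42"))
print(chat_state("alice@example.com/phone", "composing"))
```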
2.2. … provide message archive to end users.
Virtually every modern chat application maintains conversation history both for 1-1 communication and group chats. It can remind you of a promise you’ve made, be evidence in a divorce case, or help in a police investigation.
XMPP provides two protocols for accessing message archives. The older one, XEP-0136 Message Archiving, is used by hardly anyone, because it’s difficult to implement and overloaded with features. It has been superseded by the more modern XEP-0313 Message Archive Management, which is the current standard.
There is one caveat though - its syntax changed significantly between versions, so it’s common for libraries and servers to explicitly state which versions they support. These are 0.2, 0.3, 0.4(.1) and 0.5. MongooseIM supports all of them in the mod_mam module. If you choose another server, make sure its MAM implementation is compatible with your client library. Smack and XMPPFramework use the 0.4 syntax.
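For orientation, here is a minimal Python sketch of an archive query in the data-form-based syntax used by the later MAM versions. `mam_query` is a hypothetical helper. The namespace is a parameter precisely because it differs between versions; the default `urn:xmpp:mam:1` is my assumption, so check which one your server advertises.

```python
import xml.etree.ElementTree as ET

DATA_FORM_NS = "jabber:x:data"
RSM_NS = "http://jabber.org/protocol/rsm"


def _add_field(form, var, value, hidden=False):
    attrs = {"var": var}
    if hidden:
        attrs["type"] = "hidden"
    field = ET.SubElement(form, "field", attrs)
    ET.SubElement(field, "value").text = value


def mam_query(iq_id, with_jid=None, start=None, max_items=None,
              mam_ns="urn:xmpp:mam:1"):
    """Build an IQ asking the archive for messages matching the given filters."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    query = ET.SubElement(iq, "query", {"xmlns": mam_ns})
    form = ET.SubElement(query, "x", {"xmlns": DATA_FORM_NS, "type": "submit"})
    _add_field(form, "FORM_TYPE", mam_ns, hidden=True)
    if with_jid is not None:
        _add_field(form, "with", with_jid)   # only the conversation with this JID
    if start is not None:
        _add_field(form, "start", start)     # only messages after this timestamp
    if max_items is not None:
        # Paging is delegated to Result Set Management.
        rsm = ET.SubElement(query, "set", {"xmlns": RSM_NS})
        ET.SubElement(rsm, "max").text = str(max_items)
    return ET.tostring(iq, encoding="unicode")


print(mam_query("q1", with_jid="juliet@example.com",
                start="2017-03-01T00:00:00Z", max_items=20))
```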
2.2.1. I’d like to have a full text search feature.
Although standard Message Archive Management doesn’t specify any queries for full text search, it remains flexible enough to create such requests on top of the existing ones.
In MongooseIM this feature is still in an experimental phase and has recently been merged into the master branch. It’s not supported in any client library yet, so you have to construct a custom MAM query to do full text searches. Take a look at the PR description; it’s not that difficult. :)
2.3. … display inbox (a list of conversations with unread count and a last message).
Unfortunately there are no open solutions providing this feature. XMPP community is in the process of discussing and creating the specification of Inbox functionality. Erlang Solutions is designing a XEP proposal, which you can view here.
A quasi-inbox is available as a part of experimental standard Bind 2.0. It doesn’t cover all possible use-cases but a list of unread messages is what you actually need for optimal UX after establishing a connection. This feature is already under development in MongooseIM project.
In the meantime, you can build an inbox view by persisting the last known archived message ID or timestamp and querying Message Archive Management for all messages that came later. When you have fetched them all, you can build an inbox. Unfortunately this is not very efficient and that’s why the community needs a new standard.
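The client-side part of this workaround can be sketched in a few lines of Python. Everything here is hypothetical: `ArchivedMessage` stands for whatever your library returns from a MAM query, and I assume each message carries a timestamp you can compare with the persisted one.

```python
from dataclasses import dataclass


@dataclass
class ArchivedMessage:
    timestamp: int   # e.g. seconds since epoch, taken from the archive
    peer: str        # bare JID of the other party (or the room)
    body: str
    incoming: bool   # False for messages sent by this user


def build_inbox(messages, last_seen):
    """Group messages newer than last_seen by conversation."""
    inbox = {}
    for msg in sorted(messages, key=lambda m: m.timestamp):
        if msg.timestamp <= last_seen:
            continue  # already known to this device
        entry = inbox.setdefault(msg.peer, {"unread": 0, "last": None})
        if msg.incoming:
            entry["unread"] += 1
        entry["last"] = msg.body  # messages are sorted, so the newest wins
    return inbox


fetched = [
    ArchivedMessage(10, "bob@example.com", "hi", True),
    ArchivedMessage(20, "bob@example.com", "you there?", True),
    ArchivedMessage(30, "carol@example.com", "see you", False),
]
print(build_inbox(fetched, last_seen=5))
```

The inefficiency is easy to spot: the whole backlog has to be downloaded just to compute one counter and one message per conversation.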
2.4. … allow file transfers and media sharing between users.
Almost everyone loves to share cat pictures and every modern IM solution provides means to do this. Various file transfer techniques in the XMPP world can be grouped in two categories: P2P connections and file upload.
The former involves establishing a direct connection between two clients, sometimes with a bit of help from a TURN server. It ensures that data won’t get stored on any intermediate servers. Obviously, it requires less effort from the service provider because it’s easier and cheaper to set up a TURN service than to maintain a proper media server (or pay for storage in the cloud).
File upload is much more efficient when sharing media with a group. It doesn’t require both parties to remain online for the transfer duration.
Now, you DO have a choice here. There are a couple of XEPs, describing various P2P transfer initiation methods. XEP-0047 In-Band Bytestreams (IBB) is guaranteed to work in every network, because it sends data (Base64-encoded) via IQs. So if you can reach the XMPP service, you can transfer files. It may be slow and not very convenient but it will work.
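Here is a minimal Python sketch of the data phase of IBB, i.e. cutting a payload into Base64-encoded blocks wrapped in IQs. The function name is hypothetical, the session negotiation (opening and closing the bytestream) is omitted, and the default block size of 4096 bytes is just an assumption for the example.

```python
import base64
import xml.etree.ElementTree as ET

IBB_NS = "http://jabber.org/protocol/ibb"


def ibb_data_stanzas(payload, sid, to, block_size=4096):
    """Yield one IQ per block of the payload, numbered with a sequence counter."""
    for seq, offset in enumerate(range(0, len(payload), block_size)):
        chunk = payload[offset:offset + block_size]
        iq = ET.Element("iq", {"type": "set", "to": to, "id": "ibb-%d" % seq})
        # The sequence number is a 16-bit counter, so it wraps around at 65536.
        data = ET.SubElement(iq, "data",
                             {"xmlns": IBB_NS, "sid": sid, "seq": str(seq % 65536)})
        data.text = base64.b64encode(chunk).decode("ascii")
        yield ET.tostring(iq, encoding="unicode")


picture = bytes(range(256)) * 40  # stand-in for a 10 KiB cat picture
for stanza in ibb_data_stanzas(picture, sid="session-1", to="bob@example.com/phone"):
    print(len(stanza), "characters on the wire")
```

Base64 inflates the data by roughly a third and every block waits for an IQ result, which explains the "slow" part.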
Let’s carry on. You can transfer media via bytestreams external to XMPP. The P2P session is negotiated via XMPP but it’s only the “signalling” part. There are quite a few XEPs describing various negotiation and transmission protocols, so I will highlight specific implementations rather than listing all of the names which would only confuse readers who just want to send some bytes.
- XMPPFramework: Look for XMPPOutgoingFileTransfer. It supports SOCKS5 and In-Band Bytestreams.
- Smack: Everything begins with FileTransferManager. It supports SOCKS5 and In-Band Bytestreams as well.
2.4.2. File Upload
Unless you already have a dedicated media server that exposes an API to perform uploads and downloads, you should definitely take a look at XEP-0363 HTTP File Upload. It defines standard stanzas to request upload slots and respective download links. It is the XMPP server’s responsibility to allocate the slots and return the links to the client.
Unfortunately this extension is not widely supported yet. You can find it in XMPPFramework but not in Smack yet. In the case of MongooseIM, it’s already available with Amazon S3 backend (with more storage plugins to come!).
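To give an idea of what "requesting a slot" looks like, here is a hedged Python sketch. The helper name is hypothetical, and the namespace and the attribute-based layout reflect my understanding of a later revision of the XEP (earlier revisions used a different namespace and child elements), so treat both as assumptions and verify them against the version your server implements.

```python
import xml.etree.ElementTree as ET


def upload_slot_request(upload_service, filename, size, content_type=None,
                        ns="urn:xmpp:http:upload:0", iq_id="slot-1"):
    """Build an IQ asking the upload service for an upload slot."""
    iq = ET.Element("iq", {"type": "get", "to": upload_service, "id": iq_id})
    attrs = {"xmlns": ns, "filename": filename, "size": str(size)}
    if content_type is not None:
        attrs["content-type"] = content_type
    ET.SubElement(iq, "request", attrs)
    return ET.tostring(iq, encoding="unicode")


print(upload_slot_request("upload.example.com", "cat.jpg", 23456, "image/jpeg"))
```

The response carries an upload URL and a download URL; the client then uploads the file with a plain HTTP PUT and sends the download link to the recipient in an ordinary message.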
2.5. … support group chats …
A couple of years ago it was really simple - there was only one kind of group chat supported in the XMPP world. Today we have three standards, two of them being maintained by the XSF and one published by Erlang Solutions. MIX (XEP-0369) doesn’t have any implementations yet and as a standard it changes very frequently, so it is not described in this post.
2.5.1. … and I need precise presence tracking in each group.
If you need an IRC-like experience where users have certain roles in a room and client disconnection triggers leaving the room, then classic XEP-0045 Multi-User Chat will work for you. It has its disadvantages (frequent presence broadcasts may impact UX and consume processing power or connection throughput) but fits the use case where accurate presence information is important. It is provided by MongooseIM’s mod_muc (other major servers implement it as well) and is supported by all mainstream client libs.
2.5.2. … and I don’t need to broadcast presence information in each group.
Erlang Solutions’ Multi-User Chat Light is a protocol derived from real world use cases, where groups don’t care about presence and the full member list is always available to room members. It makes some strong assumptions (like only 2 types of affiliation or rooms being joinable only by invite) but is designed to reduce round-trips, expose a powerful API (e.g. room creation + configuration + adding new members in one request) and be easy to work with. Check it out and see if it fits your application. The server implementation is currently exclusive to MongooseIM (mod_muc_light) and respective plugins are available in Smack and XMPPFramework.
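The "one request" claim is easier to appreciate with an example. Below is a Python sketch of a room creation IQ as I remember it from the MUC Light specification. The helper is hypothetical, and the namespace and element names are my recollection, so please verify them against the current document before relying on them.

```python
import xml.etree.ElementTree as ET

MUCLIGHT_CREATE_NS = "urn:xmpp:muclight:0#create"


def create_room_iq(room_jid, room_name, members, iq_id="create-1"):
    """Create a room, configure its name and add members in a single IQ."""
    iq = ET.Element("iq", {"type": "set", "to": room_jid, "id": iq_id})
    query = ET.SubElement(iq, "query", {"xmlns": MUCLIGHT_CREATE_NS})
    config = ET.SubElement(query, "configuration")
    ET.SubElement(config, "roomname").text = room_name
    occupants = ET.SubElement(query, "occupants")
    for jid in members:
        ET.SubElement(occupants, "user", {"affiliation": "member"}).text = jid
    return ET.tostring(iq, encoding="unicode")


print(create_room_iq("trip@muclight.example.com", "Weekend trip",
                     ["bob@example.com", "carol@example.com"]))
```

With classic MUC the same result takes several round-trips: joining, submitting the configuration form and inviting or adding each member.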
2.6. … be compatible with other public XMPP setups.
Even some proprietary installations integrate with the open XMPP world (like GTalk and Facebook did at some point), so if this is your use case as well, the first important thing to remember is that no custom stanzas may leave your cluster. By custom I mean anything that is not covered by an XSF-approved XEP. Additionally, you will benefit a lot from using the XEP-0030 Service Discovery protocol, because you can never be sure what feature set is supported on the other end. It is used to query both clients and servers, and virtually every client and server supports it. In case of MongooseIM, the base module is
2.7. … present the same view of each conversation on every user’s device.
I use Facebook Messenger on multiple devices and I really expect it to display the same shopping list I got from my wife on both my desktop and my mobile phone. It usually breaks message order but anyway - at least the list is there.
The problem is actually a bit more complex, because you have to take care of synchronising both online and offline devices.
Online devices can ask the server to forward all incoming/outgoing messages, even if they originate from or are addressed to some other resource of the same user. This is achieved by enabling XEP-0280 Message Carbons. On the client side it’s easy - just enable the feature after authenticating and the server will do the rest. It’s supported by MongooseIM in the mod_carboncopy module. You can find respective implementations in Smack, XMPPFramework, Stanza.io and many others, since it’s a very simple, yet powerful extension.
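For the curious, this is roughly what a library does under the hood, sketched in Python with hypothetical function names: one IQ to enable the feature and a bit of unwrapping for every forwarded copy. A real client should also check that the wrapper comes from the user's own bare JID, which this sketch skips.

```python
import xml.etree.ElementTree as ET

CARBONS_NS = "urn:xmpp:carbons:2"
FORWARD_NS = "urn:xmpp:forward:0"
CLIENT_NS = "jabber:client"


def enable_carbons(iq_id="carbons-1"):
    """Build the IQ that switches Message Carbons on for this connection."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    ET.SubElement(iq, "enable", {"xmlns": CARBONS_NS})
    return ET.tostring(iq, encoding="unicode")


def unwrap_carbon(stanza_xml):
    """Return (direction, inner message) for a carbon copy, or None otherwise."""
    root = ET.fromstring(stanza_xml)
    for direction in ("received", "sent"):
        wrapper = root.find("{%s}%s" % (CARBONS_NS, direction))
        if wrapper is None:
            continue
        forwarded = wrapper.find("{%s}forwarded" % FORWARD_NS)
        if forwarded is None:
            continue
        inner = forwarded.find("{%s}message" % CLIENT_NS)
        if inner is not None:
            return direction, inner
    return None
```

A copy tagged as "sent" is a message the user wrote on another device, so it should be rendered on the outgoing side of the conversation.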
If you want to fetch everything that happened while a specific device was offline for a while, just query XEP-0313 Message Archive Management (see “… provide message archive to end users.” section).
2.8. … allow users to block each other.
You just can’t stand your neighbour nagging you via IM to turn down the volume while Kirk Hammett is performing his great solo? Block him. Now. XMPP can help you with it. In two ways actually.
Yes, XMPP features two standards that deal with blocking: XEP-0016 Privacy Lists and the simpler XEP-0191 Blocking Command. The former allows users to create pretty precise privacy rules, like “don’t send outgoing presences to JID X” or “accept IQs only from JIDs in my roster”. If you need such fine-grained control, take a look at MongooseIM’s mod_privacy. On the client side it is supported by the likes of Smack and XMPPFramework.
Blocking Command is much simpler but most setups will find it sufficient. When a client blocks a JID, no stanza will be routed from the blockee to the blocker. Period. MongooseIM (mod_blocking), Smack and XMPPFramework have it.
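Just to show how little there is to it, here is a small Python sketch of the Blocking Command stanzas. The helper name is hypothetical; the `urn:xmpp:blocking` namespace and the `block`/`unblock` elements come from XEP-0191.

```python
import xml.etree.ElementTree as ET

BLOCKING_NS = "urn:xmpp:blocking"


def blocking_iq(action, jids, iq_id="block-1"):
    """Build a block or unblock request for the given JIDs."""
    if action not in ("block", "unblock"):
        raise ValueError("action must be 'block' or 'unblock'")
    if action == "block" and not jids:
        raise ValueError("a block request needs at least one JID")
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    command = ET.SubElement(iq, action, {"xmlns": BLOCKING_NS})
    for jid in jids:
        ET.SubElement(command, "item", {"jid": jid})
    return ET.tostring(iq, encoding="unicode")


print(blocking_iq("block", ["noisy.neighbour@example.com"]))
print(blocking_iq("unblock", []))  # an empty unblock lifts all blocks
```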
2.9. … support end-to-end encryption.
When Alice wants to send a message to Bob… no, we’ve all probably seen this classic example too many times already. :)
There is no “one size fits all” when it comes to E2E encryption. The first tradeoff you’ll have to make is to decide whether you want a user’s new devices to be able to decrypt old messages or you prefer to have the property of forward secrecy. For a full comparison between available encryption methods, let’s take a look at the table published by the OMEMO authors:
|Feature|Legacy Open PGP|Open PGP|OTR|OMEMO|
|---|---|---|---|---|
|Server Side Archive|Yes|Yes|No|No|
|Per Message Overhead|High|High|Low|Medium|
It’s difficult to find an open library that supports any of these methods. The Gajim communicator has an OMEMO plugin. Smack and XMPPFramework don’t support E2E encryption in their upstream versions. If you’re going to use E2E encryption in your application, most probably you’ll have to implement it on your own. The good thing is that there are standards you can base your code on.
2.10. … be a part of Internet of Things.
We are a peculiar bunch. We use semiconductors to build machines that do heavy number crunching for us, deliver messages in a blink of an eye and control robotic arms with precision far beyond ours. A desire has awoken in us to go even deeper and augment everything with electronics. To let everything communicate with each other.
If you’re designing a fridge microcontroller that is supposed to fetch results from bathroom scales and lock the door for 8h for every excessive BMI point, you’ll need…
- XEP-0323 Internet of Things - Sensor Data
- XEP-0324 Internet of Things - Provisioning
- XEP-0325 Internet of Things - Control
- XEP-0326 Internet of Things - Concentrators
- XEP-0347 Internet of Things - Discovery
Unfortunately there are no public implementations of these standards. I wish it was easier but it seems you just can’t avoid reading these XEPs, picking the most suitable parts and creating your own implementation.
To find out more and become an active member of the XMPP IoT community, check out the IoT Special Interest Group.
2.11. … receive push notifications.
Push Notifications are (usually) doing a great service to mobile devices’ battery life. It is great indeed that a single TCP connection is maintained by the OS, while apps can remain hibernated in the background. It is natural for every chat application to deliver notifications to the end user, even when the smartphone is resting in a pocket. How does XMPP cooperate with popular services like APNS or GCM?
Although it’s not difficult to find XEP-0357 Push Notifications, it deserves some explanation. This specification is very generic. It assumes the existence of another XMPP-enabled “App server” that handles push notifications further. Although implementations can be found (e.g. in MongooseIM or the Prosody server), it is very common for commercial installations to use custom protocols to provide push tokens and send PN packets directly to the respective services (APNS, GCM, SNS…).
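On the client side, the standard flow boils down to telling your own server where the "App server" lives. A minimal Python sketch, with a hypothetical helper name and made-up JID and node values, could look like this:

```python
import xml.etree.ElementTree as ET

PUSH_NS = "urn:xmpp:push:0"


def enable_push(app_server_jid, node, iq_id="push-1"):
    """Ask the user's server to forward notifications to the given app server node."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    ET.SubElement(iq, "enable", {"xmlns": PUSH_NS, "jid": app_server_jid, "node": node})
    return ET.tostring(iq, encoding="unicode")


# Both values are placeholders; in practice the app server hands them out
# when the device registers its APNS/GCM token with it.
print(enable_push("push.example.com", "device-node-1234"))
```

How the device token reaches the app server in the first place is outside the scope of the XEP, which is one of the reasons custom protocols are so common here.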
2.12. … publish messages to groups of subscribers.
Publish-subscribe is an ancient and extremely useful pattern. XMPP got its own PubSub specification quite early (first versions were published in 2003) and even though the protocol is pretty verbose (and a bit complicated), for the basic usage you’ll need to learn only the most important aspects: there are nodes in the PubSub service where publishers can publish data. Nodes can group other nodes or remain plain leaves. That’s the whole story. The rest is about configuration, access control, etc.
XEP-0060 Publish-Subscribe is implemented in every major piece of XMPP software. In case of MongooseIM, it’s handled by mod_pubsub. You can find it in popular client libraries as well: Smack, XMPPFramework or Stanza.io.
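The basic usage mentioned above fits into two stanzas: one to subscribe to a node and one to publish an item to it. Here is a Python sketch with hypothetical helper names; the payload namespace `urn:example:news` is made up for the example.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"


def subscribe_iq(service, node, subscriber_jid, iq_id="sub-1"):
    """Subscribe the given JID to a node of the PubSub service."""
    iq = ET.Element("iq", {"type": "set", "to": service, "id": iq_id})
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    ET.SubElement(pubsub, "subscribe", {"node": node, "jid": subscriber_jid})
    return ET.tostring(iq, encoding="unicode")


def publish_iq(service, node, payload, item_id=None, iq_id="pub-1"):
    """Publish one item (any XML payload) to a node."""
    iq = ET.Element("iq", {"type": "set", "to": service, "id": iq_id})
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item", {"id": item_id} if item_id else {})
    item.append(payload)
    return ET.tostring(iq, encoding="unicode")


headline = ET.Element("headline", {"xmlns": "urn:example:news"})
headline.text = "XMPP is your friend"
print(subscribe_iq("pubsub.example.com", "news", "alice@example.com"))
print(publish_iq("pubsub.example.com", "news", headline, item_id="1"))
```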
Now, if you feel brave enough, you can dive into this looong, official list of XEPs. These documents are designed to provide precise information for lib and server developers. In your daily routine you probably won’t care about every server-side edge case or whether some information is encapsulated in element X or Y. These are perfect reference guides for when you’re stuck somewhere or need to tap into a lib’s plugin internals.
I’d recommend one more intermediate step though. Browse servers’ and clients’ feature lists published by virtually every project on their webpages. They usually skip minor items (or enumerate them in separate, more detailed docs) and highlight the features that may be important to you. This way you’ll expand your XMPP vocabulary and knowledge, while postponing the stage where reading XEPs is unavoidable, thus making the learning curve less steep.
It won’t be long until you realise that XMPP is your friend, I promise. :)
Stay tuned for Part 2! In the meantime:
- Brush up on your XMPP basics with our guide to building an iOS app from scratch using XMPPFramework (parts 1 and 2)
- Learn more about MongooseIM, our XMPP based open source mobile messaging platform.