Erlang

MongooseIM 3.4: Designed with privacy in mind

by Mateusz Bartkowiak

Let’s face it. We are living in an age where all technology players gather and process huge piles of user data, starting from our behavioural patterns and finishing on our location data. Hence, we receive personalized emails from online retail stores we have visited for just a second or personalized ads of stores in our vicinity that are displayed in our social media streams.

Consequently, more and more people become aware of how their privacy could be at risk with all of that data collection. In turn, the European Union strived to fight for consumer rights by implementing privacy guidelines in the form of the General Data Protection Regulation (GDPR), which governs how consumer data can be handled by third parties. In fact, over 200,000 privacy violation cases have been filed during the course of the last year followed with over €56m in fines for data breaches. Therefore, the stakes are high for all messaging service providers out there.

You might wonder: “Why should this matter to me? After all, my company is not in Europe.” Well, if any of the users of your messaging service are located in the EU, you are affected by GDPR as if you would host your service right there. Feeling uneasy? Don’t worry, MongooseIM team has got you covered. Please welcome MongooseIM release 3.4 that brings us full GDPR compliance.

Privacy by Design

A new concept has been defined with the dawn of GDPR - privacy by default. It is assumed that the software solution being used follows the principles of minimising and limiting, hiding and protecting, separating, aggregating, as well as, providing privacy by default.

Minimise and limit

The minimise and limit principle regards the amount of personal data gathered by a service. The general principle here is to take only the bare minimum required for a service to run instead of saving unnecessary data just in case. If more data is taken out, the unnecessary part should be deleted. Luckily, MongooseIM is using only the bare minimum of personal data provided by the users and relies on the users themselves to provide more if they wish to - e.g. by filling out the roster information. Moreover, since it is implementing XMPP and is open source, everybody has an insight as to how the data is processed.

Hide and protect

The hide and protect principle refers to the fact that user data should not be made public and should be hidden from plain view disallowing third parties to identify users through personal data or its interrelation. We have tackled that by handling the creation of JIDs and having recommendations regarding log collection and archiving.

What is this all about? See, JIDs are the central and focal point of MongooseIM operation as they are user unique identifiers in the system. As long as the JID does not contain any personally identifiable information like a name or a telephone number, the JID is far more than pseudo-anonymous and cannot be linked to the individual it represents. This is why one should refrain from putting personally identifiable information in JIDs. For that reason, our 3.4 release includes a mechanism that allows automatic user creation with random JIDs that you can invoke by typing ‘register’ in the console. Specific JIDs are created by intentionally invoking a different command (register_identified).

Still, it is possible that MongooseIM logs contain personally identifiable information such as IP addresses that could correlate to JIDs. Even though the JID is anonymous, an IP address next to a JID might lead to the person behind it through correlation. That is why we recommend that installations with privacy in mind have their log level set to at least ‘warning’ level in order to avoid breaches of privacy while still maintaining log usability.

Separate and aggregate

The separate principle boils down to partitioning user data into chunks rather than keeping them in a monolithic DB. Each chunk should contain only the necessary private data for its own functioning. Such a separation creates issues when trying to identify a person through correlation as the data is scattered and isolated - hence the popularity of microservices. Since MongooseIM is an XMPP server written in Erlang, it is naturally partitioned into modules that have their own storage backends. In this way, private data is separated by default in MongooseIM and can be also handled individually - e.g. by deleting all the private data relating to one function.

The aggregation principle refers to the fact that all data should be processed in an aggregated manner and not in one focused on detailed personal cases. For instance, behavioural patterns should be representative of a concrete, not identifiable cohort rather than of a certain Rick Sanchez or Morty Smith. All the usage data being processed by MongooseIM is devoid of any personally identifiable traits and instead tracks metrics relevant to the health of the server. The same can be said for WombatOAM if you pair it with MongooseIM. Therefore, aggregation is supported by default.

Privacy by default

It is assumed that the user should be offered the highest degree of privacy by default. This is highly dependant on your own implementation of the service running on top of MongooseIM. However, if you follow our recommendations laid out in this post, you can be sure you implement it well on the backend side, as we do not differentiate between the levels of privacy being offered.

The Right of Access

According to GDPR, each user has the right of access to their own data that’s being kept by a service provider. That data includes not only personal data provided by the user but also all the derivate data generated by MongooseIM on its basis. That includes data held in mod_vcard, mod_roster, mod_mam, mod_offline, mod_pubsub, mod_private, mod_inbox, and logs. If we add a range of PubSub backends and MAM backends to the fray, one can see it gets complicated.

With MongooseIM 3.4 we have put a lot of effort in order to make the retrieval as painless as possible for system administrators that oversee the day to day operations. That is why we have developed a mechanism you can start by executing the retrieve_personal_data command in order to collect all the personal and derivative data belonging to a user behind a specific JID. The command will execute for all the modules no matter if they are enabled or disabled. Then, all the relevant data is extracted per module and is returned to the user in the form of an archive.

In order to facilitate the data collection, we have changed the schemas for all of our MAM backends. This has been done to allow a swift data extraction since up till now it was very inefficient and resource hungry to run such a query. Of course, we have prepared migration strategies for the affected backends.

The Right to be Forgotten

The right to be forgotten is another one that goes alongside the right of access. Each user has the right to remove their footprint from the service. Since we know retrieval from all the modules listed above is problematic, removal is even worse.

We have implemented a mechanism that removes the user account leaving behind only the JID. You can run it by executing the “unregister” command. All of the private data not shared with other users is deleted from the system. In contrast, all of the private data that is shared with other users - e.g. group chats messages or PubSub flat nodes - is left intact as the content is not only owned by one party.

Logs are not a part of this action. If the log levels are set at least to ‘warning’, there is no personal data that can be tied to the JIDs in the first place so there is no need for removal.

Final Words on GDPR

The elements above make MongooseIM fully compliant with the current GDPR. However, you have to remember that this is only a piece of the puzzle. Since MongooseIM is a backend to a service there are other considerations that have to be fulfilled in order for the entire service to be GDPR compliant. Some of these considerations include process-oriented requirements of informing, enforcing, controlling, and demonstrating that have to be taken into consideration during service design.

Changelog

Please feel free to read the detailed changelog. Here, you can find a full list of source code changes and useful links.

Test our work on MongooseIM 3.4 and share your feedback

Help us improve MongooseIM:

  1. Star our repo: esl/MongooseIM
  2. Report issues: esl/MongooseIM/issues
  3. Share your thoughts via Twitter
  4. Download Docker image with new release.
  5. Sign up to our dedicated mailing list to stay up to date about MongooseIM, messaging innovations and industry news.
  6. Check out our MongooseIM product page for more information on the MongooseIM platform.
Go back to the blog

×

Thank you for your message

We sent you a confirmation email to let you know we received it. One of our colleagues will get in touch shortly.
Have a nice day!