RabbitMQ Summit talk recap: Developing RabbitMQ plugins in Elixir - Matteo Cafasso

RabbitMQ Summit 2018 was a one day conference which brought light to RabbitMQ from a number of angles. Among others, Matteo Cafasso tried to encourage more RabbitMQ plugin development in Elixir, from the community.

RabbitMQ comprises of a state of the art plugin architecture, allowing it to be highly customizable and extensible to meet various requirements which may not be supported by an "out-of-the-box" broker installation. This presentation gives an overview of RabbitMQ's plugin infrastructure and discusses some useful, existing plugins with an overview of developing custom RabbitMQ plugins in Elixir. With Elixir being a language fully compatible with the Erlang Virtual Machine and growing in popularity, the RabbitMQ core team also adopted it in developing the next generation CLI-tools found in the latest release series, 3.7.x; a strong indication on the direction and future of RabbitMQ's product development. Outcome of this presentation looks to encourage more RabbitMQ plugin development in Elixir from the community!

Slides from this lecture

Developing RabbitMQ plugins in Elixir

My name is Matteo Cafasso. And today I am going to talk about RabbitMQ plugins; what are they, how to use them and especially how to develop them. And I hope that at the end of this talk it will be much clearer how simple and fun it is to develop RabbitMQ plugins and how you can employ them to enhance the capability of our favorite broker, as well as our architectures.

Before we get started I just want to introduce myself. I am a software engineer working in F-Secure Corporation. There I am focusing on malware analysis automation and especially in behavioral or dynamic malware analysis. This is the process in malware analysis where we are executing the object in order to understand if it’s malware or not. I know that doesn’t sound brilliant. But, it’s a very effective way to detect malware nowadays.

F-Secure is a 30+ years old cyber security company that is developing several solutions to protect our customers. Among them we have the more traditional EPP endpoint protection solution (also known as Antivirus). Then we arrange throughout IOT security devices and such, ending more on to the enterprise level solution consisting of EDR endpoint detection and response. And since (more or less) from the year 2005, F-Secure has been heavily relying on cloud platforms and technologies, in order to deliver protection to its end customers.

Every day, our backends ingest a huge amount of data, which consists usually in samples files or URLs – which we need to quickly categorize, in order to come up with a verdict: is it malicious or is it safe. And more generic metadata – mostly consisting of events. So, process are created and terminated, files are created, memory or memory injection DLL is loaded in memory and such.

In our backend we heavily rely, especially in the past – on RabbitMQ to provide messaging as a messaging system. We use it mostly as a task distribution mechanism and also as a tool (basically as a pipeline) to convey our events to these different data enrichment tools and feature extractions that eventually terminate on different AI technologies such as machine learning models and expert systems.

We started adopting RabbitMQ, I believe – in 2010. The reasons why we chose such a technology among the available - were mainly two.

Reason number 1 - RabbitMQ is a centralized broker

Rabbit is a centralized broker and in our architecture this is very important. Because our architecture is mostly consisted of stateful architecture made of a lot of stateless microservices. And in such complex architecture, having a centralized communication system – which we can actually look at, really helps us understanding how our automation is doing. It helps a lot when we need to deal with problems and escalations. These problems and escalations might sometimes be the reason for self-inflicted pain of code error and configuration error. And many times they, as our backends, are heavily and tightly coupled with what happens out there on the internet. It might be due to some phenomena out there on the internet - such as an aggressive malware spreading or a particularly big Windows update. I sometimes fail to see the difference between a malware escalation and a Windows update, but that is a personal opinion.

And, whenever these kind of things happen, we quickly learn that – one of the first system to show symptoms was RabbitMQ. What we would observe is a huge spike in the size of our queues: that the size would literary sky rock to one more queues. And this really helped us becoming more responsive towards problems. At also identifying what was the suffering and which were the suffering points within our architecture.

At a certain point we grew so sensitive to RabbitMQ queues that, one of the best ways to actually trigger a panic attack to one of our colleagues simply consisted on walking to his workstation, to his workplace and you would walk there and say like: "Hey, do you remember that service you once developed? Yeah, that one. There are 2 million entries in a RabbitMQ queue… Do you think that everything is alright?" So, that was the best way to actually ruin someone’s day.

The bottom line is that sometimes, somewhere, Rabbit queues matter in people’s lives.

Reason number 2: RabbitMQ offers a plugin based infrastructure

Another reason why we chose RabbitMQ was due to the amount of features that the broker already back then supported. This is thanks to its plugin based infrastructure or architecture.

RabbitMQ plugins, by Matteo Cafasso

In this slide I try to condensate a few of those plugins which I managed to find on the internet. Some of them come from the RabbitMQ website, while others were collected from Github or Gitlab. There might be some deprecated or unmaintained plugin there. This is just to give an idea of the amount of features that the broker supports through plugins. And I have highlighted in red those which are known as core or tier 1 plugins… these are the plugins that get maintained and developed by the core team and get shipped all together with a broker once installed.

How to install RabbitMQ plugins

How to install a RabbitMQ plugin is pretty simple. They come as EZ files, they are Erlang archive files. We just need to grab it and drop it in the default plugins folder which varies across installations so you can check the documentation.

And then you can use the Rabbit plugin tool in order to enable / disable them and interact with them.

Demonstration: The RabbitMQ management plugin

This is one example of a Rabbit plugin and the most known one is the management plugin. It simply consists on a web server running alongside with a broker and interacting/interfacing with it.

It gathers metrics so that the user can actually monitor the state and the status of the broker; we can see how many nodes are composing the cluster; how are the queues and exchanges doing; how many messages are coming in and out. And we can also use it for interacting with a broker – so we can create objects, publish messages and so on. It comes also with a rest API – which are visible back here.

This can be used for interacting with a broker in a programmatic way. You can build your own scripts and use them to interface with a broker. Now, it's undeniable that all these plugins was one of the main selling points back when we were choosing which broker technology to use.

In order to understand this to the full extent we need to understand what we were doing before in F-Secure. We were using an in-house built system called Lamb (yet another animal). It was a peer to peer message distribution system - and it was a very fairly good distribution system.

To tell you how good it was. Back in 2012 the main motto at the time was - we needed to kill the Lamb. And not because we are brutes, but because RabbitMQ was taking over. As we speak, it’s almost 2019… and I know that there are pretty many Lamb instances still running wildly across our backends. So, this is how difficult it is sometimes, to get rid of things which work when you’re dealing with live backends and live architectures.

One of the reasons why this was so wow for us when we first brought it up is like, we can really see what is happening – is that Lamb did not provide any UI back then. The reason is pretty simple. It was maintained by our few people that were not any free resources to build a UI for interacting with a messaging system. This is one of the reasons why I believe RabbitMQ has been so successful.

I am personally a fan of plugin based architecture because I find it is very effective way to deal especially with open source projects. What we usually observe and what we learned as open source developers is that if our technology gets popular, all of a sudden we get overwhelmed by a huge amount of feature requests.

And the more traditional way to actually tackle this problem is picking every feature and building it in our technology. This usually leads to the size of our core team growing. But, this only scales up to a certain point, because at some point the human overhead becomes very difficult to manage.

Another strategy that we can do is providing well defined and well documented interfaces and let the community of users actually use these interfaces – to build their value on top of our solutions. And, all of a sudden users become contributors. And I find this very interesting because it enables the core team to remain small and agile and focus on what really matters: security, code quality, performance, scalability and documentation. These are all the things which a core team should really tackle. And this is another reasons why I think RabbitMQ has achieved such a longevity as an open source project and why it is so successful.

Developing a RabbitMQ plugin

What I would like to do with you today is developing a RabbitMQ plugin. It’s going to be very simple; it’s going to be a Hello World plugin.

But, before we get started we should understand when it makes sense to develop a Rabbit plugin, right? What usually happens when we adopt a technology within architecture is that hopefully this technology starts producing value from the get-go; and the more we use it within our systems the more value it produces. Until we come up with a use case where this technology is not a good fit anymore. So, what we can do then is choose another technology and run it alongside with the current technology we are using. Or, we try to solve the problem by developing.

A common mistake that we usually do as developers is that when we come up with something – we build it on top of the solution, rather than building it with or within the solution. In an open source world I think this is a very, very bad mistake. So, what we did in F-Secure back then, as I said before... we were using RabbitMQ as a task distribution service and we immediately realized we need priority supports over our tasks. Back then, this was a feature that RabbitMQ did not provide.

What we did was, to build our prioritization protocol. It consisted of taking each and every queue – and splitting in multiple queues. Then each of these sub-queues would consist in a priority level. And then, we would build libraries which – on the publisher and the consumer side, would be aware of this specific topology and they would use it.

Basically we were taking a responsibility (a feature) and spreading it all across our architecture. And of course, as you might guess – this became a maintenance nightmare on a long run because each and every node was supposed to play the game within the rules; otherwise, things would break very badly.

A much cheaper and a much affordable solution would have been to actually build this feature within the broker; first step as a plugin and maybe pushing it to become a core feature. And luckily, someone else realized the same so, later on a RabbitMQ plugin implementing priority queues came up and now, this part is a core feature of the broker.

RabbitMQ plugins are developed in Erlang

You might have heard that a Rabbit is developed in Erlang. I wish I had the time to actually go through some of the Erlang design principles, but due to time reason – we can’t really get there. What I did instead was providing what I believe is a good description of what a Rabbit plugin is from a development perspective. And I put good care in placing in the slide – all those keywords which I think any developer who is not familiar with Erlang, should look at.

What a RabbitMQ plugin is, is an Erlang application consisting of one or more processes (supervised processes) which would interface with a broker either via message passing (and with messages here I mean Erlang messages, not RabbitMQ messages) or by implementing behaviors within modules and registering these modules within the broker, so that the broker could actually use our interfaces to extend its capabilities.

RabbitMQ plugins Requirements

Erlang
Elixir
Git
Make
Zip

First we need to set up our development environment. As you can see, the dependencies are few and the main most important one is of course – Make. So, Rabbit uses the "make" tool for building its platform and more precisely, they recently (I believe it was Version 3.5) moved to these make tools called Erlang.mk, that is a very powerful build tool for Erlang environments.

What we will start with of course, is our Make file – which we place in the root folder.

As you can see, it’s pretty simple. We just define the project name and few dependencies. Ignore the gibberish in the middle, just copy – paste that. Trust me. I am an engineer and I know what I am doing. So, just copy that, it will work. And at the end, we need to include to external make files – where the actual heavy lifting part of the building happens, the RabbitMQ components make file and Erlang.mk make file. You can find them in the projects (in the links). Just copy them alongside with your "make" file and you are pretty much done. You start with a project with three make files… and that’s it.

We will first run make and this will ofcourse aggregate and fetch the dependencies and build them for us. This is the first step that you always need to do when we start with Rabbit plugins. Then, once that is done, what we can do is run the command make run-broker and it will actually compile our code and fire up a broker for us. And if we provide any plugins, these plugins will be actually started alongside with the broker. So, we can really start playing with a code - interacting with a broker, and to see what goes wrong and what doesn’t.

I guess you know what make tests does. Lastly, when we are confident about our piece of art and we want to distribute it to the people... what we can do is make dist and it will provide one of these .ez-files I mentioned before. And you could just give that to your end-users.

Elixir mix.exs

Elixir mix, by Matteo Cafasso

There are a few more steps we need to take if we are building our plugins in a different language called Elixir. Elixir is a fairly new and fairly modern language – fully compatible with Erlang, because it builds a virtual machine on top of the Erlang. At the time I wanted to give it a try and this is one of the main reasons I’m giving this talk, because the main challenge I faced back then was how to make the build tools to cooperate together.

Elixir - like many modern languages provides good things from the get-go. Amongst them, they provide our own build tool, so that you don’t need to Google: which build shall I use or how do I run tests. Everything is done with a tool called mix. When we use mix, the first thing we usually do is providing a mix.exs file, which is just a script describing our application.

So, my challenge back then was, how do I make these two things – the erlang.mk and the mix – work together. It turns out it is pretty simple once you know what to do. As always, it took me quite a few times. That is just a part of the mix script. I didn’t include it at all.

All we need to do… all the magic happens in the deps() function in there. We just need to instruct "mix" to ignore all those dependencies which the erlang.mk takes care of like for example RabbitMQ. We do that by passing two keywords:

The path where we specify where we can find the dependencies. Usually they end up under the deps folder.
And then the compile keyword that includes a command which will be issued in order to build the source code. Here we just need to pass a "noop" there. I am using the colon which typically stands for "no op" in bash, but any command which would not produce any outcome works as well.

Make file with Mix rule

Make file with Mix rule, by Matteo Cafasso

Next step, we need to slightly modify our "make" file. The only thing we change is in the middle. We have an app rule and in there, we call the "mix command". $(MIX) deps.get will actually fetch all the dependencies while ignoring the RabbitMQ dependencies. And then the deps.compile will do the same. It compiles them skipping RabbitMQ dependencies and actually compiles our application.

With Elixir we are now done with the bureaucracy. We can jump into real code. But then, the next question is like, how do we actually glue our code with the RabbitMQ. As I said before, usually a plugin is an Erlang application. So, we need to find a way to start our application, altogether with the rest of the RabbitMQ components. It turns out that that is pretty well documented on how to do that. I’ll try to show you an image.

It turns out Erlang boot strap is a real deal. It’s quite complicated as you might see. Don’t get overwhelmed by this picture. I couldn’t fit it in the slide either. The idea is you read this from top down and you can consider what happens as a timeline, or also as a dependency graph. RabbitMQ splits each and every boot procedure in a boot step, which can be seen as a node in a graph. Each boot step brings up that specific resource you are in need of.

Here you can see where all the exchanges are brought up. Usually a boost step requires a parent node, which is what should be run right before my boot step and optionally enable a node. This way, we can actually ensure that our plugin has the time to start up, before the rest of the components come up and RabbitMQ starts serving requests. Because the RabbitMQ boot procedure is synchronous and concurrent.

The RabbitMQ plugin: Hello World

RabbitMQ plugin Hello World, by Matteo Cafasso

So, this is all we need to do code wise, this is an Elixir code base – but in Erlang it would be pretty similar. All we need to do is to register a module attribute called @rabbit_boot_step. This way we inform the broker that this actual module can include instructions on how to fire up our application.

The Rabbit boot step describes what needs to be done. We need to include the module. In this case its the Elixir module itself and then a keyword list with a description of mfa- module function argument. This is the entry point. This is where we specify how to fire up our application. In here we start our processes, we bring up any database schema we need and we connect with anything we need. Last mandatory keyword is the requires keyword, where we actually place ourselves in the dependency tree (which I showed before).

We can optionally also pass an "enables key" so that we can make sure that our plugin (our "boots" procedure) is a blocking procedure. So that the broker does not proceed until we are not done.

As you might guess, we just print "Hello world" in this case, just to keep it simple. Let me show you this very quickly. What I will do is stoping the Rabbit broker that I was running beforehand. And, as I said before all we need to run is make run broker.

Hopefully what happens is that everything is compiling and at the end we see our little bunny ASCII code in here, meaning that the broker is up and running and our "Hello world" is successfully printed. So, we are basically done. Our plugin is build. At the end we can see that the bootstrap is completed with one plugins, basically telling us that our plugin was successfully loaded within the broker.

Expanding on our RabbitMQ plugin - Where to go next

Unfortunately, they are not much more to read in terms of documentation. The source is your friend. Out there, there are plenty of RabbitMQ plugins that you can look at, in order to better understand how to do things.

You can either copy - paste them, or try to read and understand them. One repository which will become pretty relevant already from the beginning is the rabbitmq-common repository. That is where all the common data structures and behaviors or interfaces are actually provided.

Amongst the behavior, some honorable mentions:

The rabbit_exchange_type: that is the behavior you would implement, if you want to provide your own exchange.
rabbit_backing queue. Again, this is the behavior you would implement in case you want to provide your own queue customization. On top of that, now a bit of a disclaimer - RabbitMQ queues is where most of the real deal happens. So, don’t look at these first, as long as you don’t want to end up in tears. This is a pretty hard behavior to look at and I am still learning a lot on from it.
And the last rabbit_authz_backend is a behavior that you would implement if you want to add a different authentication and authorization mechanisms.

RabbitMQ plugins - Case study and Conclusions

So, to conclude this talk. I would like to share with you a personal experience when it comes to RabbitMQ plugins. What I was trying to tackle back then was a common problem, especially within our backend, consisted of duplicated messages. And with duplicated messages, I don’t mean those messages which are duplicated within the broker, due to network errors or any sort of internal error. But, rather those messages which from a brokers perspective are unique. They come with their unique identifiers and timestamps, but from an application or an architectural point of view, they are not.

Usually, to each message corresponds a side effect. It could consist in a little bit of computation or, it could be a more critical operation such as - sending an email, or changing some value within a database. And of course, when we see duplicates there will be duplicate side effects. Meaning there might be some very big implication within our architecture.

To give a concrete example. Let’s think about an e-commerce website where the UI – for some reason, freezes or gets sluggish, and our customers nervously are clicking on the buy button multiple times. From the broker perspective these are all different unique messages. But, from our business case, we probably don’t want to end up charging our customer multiple times and shipping multiple identical products.

The way this is usually tackled is that on the consumer side, we add a little bit of logic, usually relying on a local or distributed cache. So, anytime we receive a message on the consumer we check the local cache and we decide: do we actually process this message or not.

Now, this is not inherently a wrong approach but again, as I said before - what we end up doing is grabbing a responsibility feature and scatter it across the whole architecture. So, this is a difficult thing to get right. There are many, many corner cases to take care of and also, raise conditions - which usually are very tricky because they show up later on in your architecture and you end up wasting a lot of time investigating.

What I asked myself back then was, as the broker is actually the central point (where all the messages are passing through) why not just implement a similar mechanism? Drop it inside an Erlang process running alongside with a RabbitMQ broker. Then using the behavior and subscribe to the events which happened within a broker. So that I get called every time a new message either, gets on my exchange or, in the queue. And then I check, should this message actually move forward or not?

RabbitMQ message deduplication plugin

And this is what I did. I built a deduplication plugin. I think this is one of the first functional Elixir implementation that you can find out there. It’s pretty simple. All you need is to provide an "x-deduplication-header’ and its content will be checked when you publish it within the broker, in a deduplication exchange or a queue. And according to your rulesa decision will be made: should these messages continue its route or not. We can deduplicate either on the exchange or on the queue. We deduplicate on the exchange if we want to drop our messages early on and we want to drop our messages in our temporal fashion. In other way, we want to say that once you see this message again in the next 5 seconds, minutes or hours – just drop it; I want to see this message only once within this time window.

And if we deduplicates on the queue in a little bit different manner – so, once the header comes in, we check is any other message having the same header in the queue? If so, we don’t publish it within the queue.

So, let me demo these for you. I will use the already running Rabbit broker. I don’t need to do very much more on top of that. What I am about to do here is just running a script which installs the management plugin and a deed application plugin and then bring up some resources, so that we can actually start our demo.

Now, if we go back to the management plugin, we might need to refresh it because we killed the previous broker and once we look at the exchanges we can see that there is a new deduplication-exchange of type x-message-deduplication.

So, when we create an exchange, one of the mandatory argument you need to give is the cache size because, of course we cannot deduplicate everything forever. And then we can provide an optional cache TTL. This is how we specify our temporal windows. It’s optional, so it can be virtually forever, or as long as the cache doesn’t rotate our headers. And in this case, it is a millisecond. So, what we are doing is using a temporal window of 10 seconds. We bound this actual exchange to a test queue which should be empty as we can see.

We will put a value test and what we do is publishing this first message which should go through and then, we quickly publish a second one and we should be informed that the message didn’t go through.

So, the first message gets successfully published and if we get quickly enough - the second message got actually published, but did not move through. If we inspect the queue, we will actually see that we’ve got only 1 message in our queue so, the demo was successful.

As I said, it’s a 10 second TTL so hopefully, 10 seconds already passed. So, let’s try again. We put again our test and now this message should go through because actually its TTL should have expired. So, if we try to publish the message again, our message now went through. This is how we can say that I don’t want to see the same message for the next 10 seconds in my architecture. If we now look at the queue, we have 2 messages as shown.

For those who loves performance as I have a bit of time, let me show you something more. This is a Python script and what we will simply do - is hammer the broker with 10.000 messages all having the same deduplication messages. A different one than the one I showed on the broker on the management console. And as you can see, it’s pretty fast and if we look now at the exchanges, we should see it. Pretty good. We’ve had thousand messages going through in my small laptop which is pretty good.

Hopefully, as expected - our queue end up with only 3 messages. I am showing here only the deduplication level. As you might guess, the queues work pretty similarly, so it doesn’t really make sense to demo it.

Conclusions of building plugins for RabbitMQ

Getting started is easy
RabbitMQ plugins tend to be simple
Two supported languages: Erlang and Elixir
Excellent learning experience

We can move to the conclusions for this talk. I hope it was clear for you how to get started. It’s really easy once you know how to do it. This is why I wanted to give this talk because I find that - the first big barrier for a developer when trying to develop on top of the new technology is: how do I get started? All the bureaucracy like, how do I build, how do I find the resources and the rest.

I hope that has been clear. How simple it is to actually get up and running and start playing with the code which is usually the more interesting part.

If you look on the web, you will realize that most of the RabbitMQ plugins are pretty simple. They consist of very, very few lines of Erlang code wrapped within a process. And, what Erlang does via the supervision sort of behavior – is ensuring that your process will be restarted if it’s crashing. So, you don’t really need to fear at the beginning. If your plugin crashes or, has any problem it usually does not end up hindering the rest of the broker infrastructure.

I really like the fact that there are two supported languages: Erlang and Elixir. I think that this will actually extend the amount of developers which could potentially become contributors. Especially Elixir, it’s a more modern language that comes with really nice tutorials and documentation. It was built by people who were really inspired by Ruby and Python. So, if you are familiar with those languages, you should find a lot of analogies on top of the Elixir language. And it’s actually fully compatible with Erlang, so you can reuse everything that Erlang already did beforehand.

On the last point I would like to make one note. I think it was a very good learning experience for me but I would like to add one note regarding Erlang specifically. When I talk to people about developing RabbitMQ plugins for example, the usual answer is that: "Yeah, but… it’s built in Erlang, right?" Let me tell you that in the past I started quite few languages, because I like languages. I am a curious guy so I like to look at those things. And I need to say that Erlang has been the only language (or, the language that has taught me the most, beyond the syntax and the semantics itself.

Erlang is a language built by some very brilliant guys which were focusing on a very hard problem, distributed applications, scalable over very big and problematic networks. And while studying Erlang, what you realize is that in there you will find a lot of those architectural patterns which we usually end up using within our distributed architecture. They are there, in the code and you will learn even more patterns; even more corner cases to tackle.

So, I need to say that as a developer and as an architect: Erlang might be not the language you end up using on your daily life at work. I have never written a single line of Erlang at work. But, it’s still one of those languages that will bring you the most in your daily work - beyond the simple syntax.

That should be everything from my side. I’ll just include a few references.