In this talk from RabbitMQ Summit 2019 we listen to Alex Thomas from Oliver Wyman.

Almost every publish-subscribe technology embraces a model in which different kinds of message are published on distinct topics. Unfortunately, this loses the relative order of events across those topics. Alex shows how RabbitMQ offers a way to implement the equivalent of topics without losing track of the overall order of messages, using the "source-oriented exchanges" pattern.

Short biography
Alex Thomas (GitHub) is an industry veteran with 30 years' experience in IT project development, delivery and problem-solving. As a tech lead at Oliver Wyman Digital, he focuses on system integration, data analytics, and service and infrastructure design. Recent projects include developing a core banking system for a UK challenger bank and leading integration systems development at a European energy provider.

Using the source-oriented exchanges pattern to keep events in order

My name is Alex Thomas, and I'm from Oliver Wyman. A bit of history: the group I'm with was originally LShift, and then we became part of this management consultancy. It was LShift that was behind the original RabbitMQ work, some years ago now.

Talking about going back in time, the publish-subscribe pattern has been around for a long time. I think Wikipedia says it all started in 1987 or so, but you can trace it back a fair bit beyond that.

I think I come in about here, with the Stratus machine. That counted as a minicomputer in those days. It had a very nice publish-subscribe messaging implementation: fully hardware-fault-tolerant, transactional, and tightly integrated with the file system. Quite a neat implementation, and it survived in some niche applications for many, many years.

Channels

So, during all this time, a lot of advice has come out about how to do publish-subscribe right. I'm going to review it, and we'll evaluate it and see what you think.

So, one of the architecture books recognizes that we've got publish-subscribe channels, and we send different types of information down those different channels. Pretty straightforward.

Classes

The Wikipedia entry, the one that was a bit dodgy on the date, suggests that messages are characterized into types or classes, whatever your message is. It's down to the subscriber to identify the types it wants to subscribe to.

Topics

JMS is a quite well-known topic-oriented mechanism. You define a content-oriented hierarchy and, obviously, your consumers subscribe to a subset of that hierarchy, but each element of the hierarchy, each leaf node, is itself a channel. This is how products like TIBCO, MQSeries and so on have been working for many, many years.

Common Practice

So, again, I think we're getting a fairly consistent message from these sources over the years: they want your message streams chopped up either by message type or by subject. If you have an order and then an order cancellation, you might regard those as two different message types and put them on two queues. Or you might say, "Well, no. We're going to be more subject-oriented. These events clearly refer to the same logical entity, so we're going to put them down the same stream." Either way, one of those two models is what you end up with.

Book flight

So, in a typical subject area, we've got a source of activity, here a customer-facing portal, and downstream systems that are tuned in to the events the customer generates. The customer may set up some preferences and then perform a transaction. If we were following the earlier advice, we might have topic- or subject-oriented streams here: one stream of events relating to customer preferences and customer activities (CRM stuff and so on), and another relating to purchase transactions. Quite a straightforward implementation, you would think.

Outcomes

Hopefully, the outcome is that these three events come through and are picked up by the downstream systems. Maybe the preferences go to the CRM system, the final booking system picks them up from there, the meal allocation is made, and the seat is reserved.

But, looking back, nothing here actually guarantees that. These are two separate streams, and the downstream systems could process the events coming down them completely without regard to their original order back in the portal. If you're lucky, you process the preferences first. Or you might process the reservation and then apply the preferences, in which case you're on the unhappy path with a disappointed customer. This is all down to pushing the structure towards topics as channels.
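To make the failure mode concrete, here is a toy in-memory model in Python (not RabbitMQ code) of the content-oriented layout: one queue per topic, drained by consumers that get to the queues in whatever order they happen to. The event names and portal sequence numbers are hypothetical.

```python
from collections import deque

# Hypothetical events: (sequence number at the portal, topic, payload)
EVENTS = [
    (1, "preferences", "meal=vegetarian"),
    (2, "preferences", "seat=aisle"),
    (3, "purchases", "book flight LHR-JFK"),
]

def publish_by_topic(events):
    """Content-oriented model: one independent queue per topic."""
    queues = {"preferences": deque(), "purchases": deque()}
    for event in events:
        queues[event[1]].append(event)
    return queues

def consume(queues, topic_order):
    """Drain the queues in the order the consumers happen to get to them."""
    processed = []
    for topic in topic_order:
        while queues[topic]:
            processed.append(queues[topic].popleft())
    return processed

lucky = consume(publish_by_topic(EVENTS), ["preferences", "purchases"])
unlucky = consume(publish_by_topic(EVENTS), ["purchases", "preferences"])
print([seq for seq, _, _ in lucky])    # [1, 2, 3]
print([seq for seq, _, _ in unlucky])  # [3, 1, 2]: booked before preferences
```

Each queue is individually FIFO; the loss of order happens only across queues, which is exactly the gap between the two streams.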

Solution 1

So, here are some proposed solutions to this sort of structure. Microsoft suggests making your consumer-side processes tolerant of duplicate messages, so that they can receive the same message twice.

Anybody? Any thoughts on that suggestion? Would that help? Is that going to keep my messages in order?

Participant: No. It guarantees how a message is handled once it's received, but there's no guarantee about the order in which it actually is received.

No. I mean, there's no linkage there at all. If you knew you were missing the preferences, re-requested them, and were tolerant of receiving them again, then I guess you would have some sort of solution. But simply saying that each message, in isolation, can be processed idempotently doesn't really help. You're still processing things out of order.
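A small sketch of that point, assuming a hypothetical booking consumer that deduplicates by message id: the redelivered preference is correctly ignored, but a purchase that overtakes the preference is still booked without it.

```python
class IdempotentBooking:
    """Hypothetical consumer that ignores redeliveries by remembering message ids."""

    def __init__(self):
        self.seen = set()
        self.preferences = {}
        self.bookings = []

    def handle(self, msg_id, kind, payload):
        if msg_id in self.seen:
            return  # duplicate: safe to ignore, this is the idempotency part
        self.seen.add(msg_id)
        if kind == "preference":
            key, value = payload.split("=")
            self.preferences[key] = value
        elif kind == "purchase":
            # The booking snapshots whatever preferences are known *now*.
            self.bookings.append((payload, dict(self.preferences)))

consumer = IdempotentBooking()
# The purchase arrives before the preference, then the preference is redelivered.
for msg in [(2, "purchase", "LHR-JFK"),
            (1, "preference", "meal=vegetarian"),
            (1, "preference", "meal=vegetarian")]:
    consumer.handle(*msg)
print(consumer.bookings)  # [('LHR-JFK', {})]: booked without the meal preference
```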

Solution 2

This is something we've seen at several customer sites. An original activity, such as a financial trade, kicks off a whole load of events, which are published over different channels. Downstream, a resequencer pools together the events that come down these channels, waits for them all to appear, and then assembles the final update to the downstream system.

Any problems with the resequencer solution? There are. How long are you going to wait? Have I got everything? Was there another event associated with this original activity? What happens if some events are optional? It really only works when you're absolutely certain which set of correlated events you're going to get.
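A minimal resequencer sketch, with hypothetical names throughout, showing where those difficulties come from: the expected set of event kinds must be known up front, and a group that never completes can only leave by way of an arbitrary timeout.

```python
class Resequencer:
    """Toy resequencer: buffers events per correlation id until an expected set arrives.

    `expected_kinds` is the weakness: the resequencer has to be told in advance
    which kinds of event belong together.
    """

    def __init__(self, expected_kinds, timeout):
        self.expected = set(expected_kinds)
        self.timeout = timeout
        self.pending = {}  # correlation id -> (first arrival time, {kind: (seq, payload)})

    def receive(self, corr_id, kind, seq, payload, now):
        _, parts = self.pending.setdefault(corr_id, (now, {}))
        parts[kind] = (seq, payload)
        if set(parts) >= self.expected:
            del self.pending[corr_id]
            return sorted(parts.values())  # reassemble in producer sequence order
        return None  # still waiting

    def expire(self, now):
        """Give up on groups older than the timeout; the caller must decide what then."""
        stale = [c for c, (t, _) in self.pending.items() if now - t > self.timeout]
        return {c: self.pending.pop(c)[1] for c in stale}


r = Resequencer({"preference", "purchase"}, timeout=5.0)
first = r.receive("trade-1", "purchase", 2, "LHR-JFK", now=0.0)      # None: waiting
second = r.receive("trade-1", "preference", 1, "meal=veg", now=1.0)  # both parts, in order
# A customer who never sets preferences: this group can only leave via the timeout.
r.receive("trade-2", "purchase", 3, "LHR-CDG", now=2.0)
expired = r.expire(now=10.0)
```

The reassembly also relies on producer-side sequence numbers, which the source-oriented approach gets for free from publish order.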

Solution 3

The solution we're talking about here goes back to the starting point: the topic, whether customer preferences, flight purchases and so on, should not have been regarded as the communication stream, the channel. The communication stream should be associated with the source, not with the subject matter. To illustrate: now we've got, in RabbitMQ terms, a single exchange corresponding to the producer, the portal. That exchange routes the messages to the queues that correspond to the other applications. It fits quite neatly into the RabbitMQ model as a way of working.

Any suggestions as to how you might arrange the routing here? What sort of exchange are we talking about? Direct, right? So, you probably want a direct exchange so that you can use a routing key. We've sometimes seen headers exchanges used where you want to tag a message with multiple tags, but most of the time a single routing key is enough.
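The two exchange types mentioned differ in how a binding matches a message. Below is a simplified model of the matching rules as we understand them. For headers exchanges it covers only the default `x-match` values `all` and `any`, and it skips binding arguments whose names start with `x-`. It is not RabbitMQ's implementation, and the header names are hypothetical.

```python
def direct_matches(binding_key, routing_key):
    """Direct exchange: exact string equality between binding key and routing key."""
    return binding_key == routing_key

def headers_matches(binding_args, message_headers):
    """Headers exchange (simplified): compare binding arguments with message headers.

    'x-match' selects 'all' (every argument must match) or 'any' (at least one);
    arguments whose names start with 'x-' are not compared.
    """
    mode = binding_args.get("x-match", "all")
    pairs = [(k, v) for k, v in binding_args.items() if not k.startswith("x-")]
    hits = [message_headers.get(k) == v for k, v in pairs]
    return all(hits) if mode == "all" else any(hits)

tags = {"x-match": "any", "topic": "preferences", "channel": "web"}
print(headers_matches(tags, {"topic": "preferences"}))                       # True
print(headers_matches({**tags, "x-match": "all"}, {"topic": "preferences"}))  # False
```

A single routing key compared for equality is simpler to reason about, which fits the advice that most of the time one routing key is enough.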

So, what's the secret here? What's the RabbitMQ ingredient that allows, say, the CRM system to tune in to both the preference messages and the flight purchase messages? Let's say those messages have particular distinguishing routing keys.

Content-oriented / Source-oriented

So, you can do a binding. And what's the nifty thing? You can have multiple bindings for one queue. So RabbitMQ provides quite a neat way of allowing something like the CRM system to tune in to whatever set of events, or topics, it wants.
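Putting the pieces together, this in-memory sketch models one direct exchange per source (here a hypothetical exchange named `portal`) with a CRM queue bound twice, once per routing key. The routing keys `customer.preferences` and `flight.purchase` are made up for illustration. With a real client such as pika, the equivalent topology would be declared with `exchange_declare` and one `queue_bind` call per routing key. This sketch does not talk to a broker.

```python
from collections import defaultdict, deque

class DirectExchange:
    """In-memory model of one source-oriented direct exchange (not the RabbitMQ client API)."""

    def __init__(self, name):
        self.name = name
        self.bindings = defaultdict(set)  # routing key -> names of bound queues
        self.queues = {}                  # queue name -> FIFO of (routing key, body)

    def bind(self, queue, routing_key):
        # A queue may be bound several times with different routing keys.
        self.queues.setdefault(queue, deque())
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key, body):
        # Every queue bound with this key gets one copy, appended in publish order.
        for queue in self.bindings.get(routing_key, ()):
            self.queues[queue].append((routing_key, body))

portal = DirectExchange("portal")           # one exchange per source
portal.bind("crm", "customer.preferences")  # CRM tunes in to two "topics"
portal.bind("crm", "flight.purchase")
portal.bind("booking", "flight.purchase")   # booking only cares about purchases

portal.publish("customer.preferences", "meal=vegetarian")
portal.publish("flight.purchase", "LHR-JFK")
portal.publish("customer.preferences", "seat=aisle")

print(list(portal.queues["crm"]))      # all three messages, in publish order
print(list(portal.queues["booking"]))  # only the purchase
```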

So, there's the pattern on the left, which the general publish-subscribe literature encourages you to follow: figure out what topics you've got, divide them all up, make the publishers publish to the right topic, and then let the subscribers choose which topics they want to receive.

And here's the more RabbitMQ-oriented way of doing it, which also gives you the messages in their original order.

Sequence Alignment

So, to formalize it slightly, we're saying that the messages in each queue are a subset of a total order: the order seen on the producer side when the portal put the messages into the exchange. The exchange gives you this guarantee: "Whatever I'm doing, although that queue is only interested in a subset of the messages I'm sending, I'm always going to retain the original order for that subset." That's a really clever and useful feature of RabbitMQ for us.
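The ordering property can be stated as a check: each queue's contents must be a subsequence of what the producer published. As we understand RabbitMQ's documented guarantee, this holds for messages published on one channel, through one exchange and one queue, and consumed over one channel. Requeued or redelivered messages, and several competing consumers on one queue, can change the observed order. The message labels below are hypothetical.

```python
def is_subsequence(sub, total):
    """True if `sub` appears in `total` in the same relative order (gaps allowed).

    `item in it` advances the shared iterator, so each match must come
    strictly after the previous one.
    """
    it = iter(total)
    return all(item in it for item in sub)

published = ["pref:meal", "buy:LHR-JFK", "pref:seat", "buy:LHR-CDG"]
crm = ["pref:meal", "buy:LHR-JFK", "pref:seat", "buy:LHR-CDG"]  # bound to both keys
booking = ["buy:LHR-JFK", "buy:LHR-CDG"]                        # purchases only

print(is_subsequence(crm, published), is_subsequence(booking, published))  # True True
# Merging two per-topic queues in the wrong order violates the property:
print(is_subsequence(["buy:LHR-JFK", "pref:meal"], published))            # False
```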

Summary

So, there's the summary.

Portal: I think you see this pattern a lot. You've got a customer-facing system that is the main source of upstream activity for a lot of downstream systems. Producers and consumers aren't symmetrical: in a microservice-type setup you generally have many more consumers than producers.

So that is all I was going to run through. But I did notice that the WeWork talk, which was about preserving order while also allowing concurrency, discussed the sharding approach. I don't know, were most of you in that talk? That approach works within the individual streams, if you like.

If you combine sharding, according to the number of consumers you want to run in parallel, with this approach of one exchange per source, you can have concurrency that still preserves the original order as far as it matters. That's because you're sharding on some magic key, such as an account or a customer, for which you can be sure that the other activity going on concurrently won't interfere. So the obvious thing to do would be to combine these two approaches.
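A sketch of combining the two ideas, under stated assumptions. The source's ordered stream is split across n parallel consumers by a hypothetical customer key, using CRC32 modulo n as the shard function. Python's built-in `hash()` of a string changes between processes, so it is not used. RabbitMQ's consistent-hash exchange plugin (`rabbitmq_consistent_hash_exchange`) does this kind of routing on the broker side, but the code below is not that plugin. The check confirms that each customer's events stay together and in order, which is the "as far as it matters" ordering.

```python
import zlib
from collections import defaultdict

def shard_for(key, n_shards):
    """Stable shard choice for a partition key (same answer in every process)."""
    return zlib.crc32(key.encode("utf-8")) % n_shards

def shard_events(events, n_shards):
    """Split one source's ordered stream into n ordered sub-streams by customer."""
    shards = defaultdict(list)
    for seq, customer, payload in events:
        shards[shard_for(customer, n_shards)].append((seq, customer, payload))
    return shards

def per_key_order_preserved(events, shards):
    """Each customer's events live in exactly one shard, in their original order."""
    for customer in {c for _, c, _ in events}:
        original = [s for s, c, _ in events if c == customer]
        homes = [sh for sh in shards.values() if any(c == customer for _, c, _ in sh)]
        if len(homes) != 1 or [s for s, c, _ in homes[0] if c == customer] != original:
            return False
    return True

events = [(1, "alice", "prefs"), (2, "bob", "prefs"), (3, "alice", "buy"),
          (4, "carol", "buy"), (5, "bob", "buy")]
shards = shard_events(events, n_shards=2)
print(per_key_order_preserved(events, shards))  # True
```

Order between different customers is deliberately given up, since their events don't interfere. With plain modulo, changing the number of shards moves keys between shards, so a reshard would need to be coordinated with in-flight messages for per-key order to survive it.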

And that's us. There's our blog, which includes some quite relevant RabbitMQ posts on monitoring and other applications.

On metrics: a couple of years ago we made a small change to RabbitMQ to add some dwell-time tracking. I think that would have been useful for the Zalando team in the last talk, so do check it out.

Applause