What’s so hard about Event-Driven Programming?

I was lucky enough to attend the Software Architecture Workshop in Cortina recently. It was a three-day workshop based around the idea of Open Spaces, which involves handing the asylum keys to the inmates and seeing what happens.

I convened a session called “What’s so hard about Event-Driven Programming?” to explore the experiences of the other delegates in designing, implementing and testing asynchronous, event- or message-driven systems. I took the position that actually it was all very straightforward as long as you followed a few basic principles. Another delegate, Mats Helander, took the opposing view that asynchronous, event-based systems could develop scary emergent behaviour that would lead to a world of hurt.

About eight of us batted this around for a while, meandering into emergent behaviour of dynamic systems, before bringing it back to a realistic enterprise example. Say you are writing a component to process a sales order. You need to calculate the order price, persist the order data and send a notification email. So in Java you might have a method that looks like this:

public void processOrder(Order order) {
  pricer.price(order);              // calculate the order price
  persister.persist(order);         // persist the order data
  notifier.sendNotification(order); // send the notification email
}

You have services to price, persist and notify about the order, and you call them synchronously one after the other. Let’s say that calculating the price of the order is time-consuming and happens remotely, persisting the order is quite quick, and sending the mail (via a remote mail gateway) takes somewhere in the middle. That means that for a lot of the time in the processOrder method, the thread is hanging around waiting for stuff to happen.

As the system handles more and more concurrent orders, this thread-per-order model would create a lot of mostly-idle threads which would eventually cause the VM to implode.
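For illustration, the thread-per-order model amounts to something like this (a sketch; perOrderPool and submitOrder are my names, not part of the original example):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One thread per order: each submission ties up a thread for the whole
// synchronous journey through pricing, persisting and notifying.
ExecutorService perOrderPool = Executors.newCachedThreadPool(); // grows without bound

public void submitOrder(Order order) {
  perOrderPool.submit(() -> processOrder(order)); // mostly idle, waiting on I/O
}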

Thinking concurrently

Instead you could have three queues: a PricerQueue, a PersisterQueue and a NotifierQueue. You could represent the order processing as a ProcessSalesOrder message, which would know that it had to start on the pricer queue, then get itself passed on to the persister queue and finally on to the notifier queue.

Each queue (which could just be a linked list in memory – it doesn’t have to be very clever) would have a number of consumers, each in their own thread. The processOrder method puts your order onto the pricer queue. When the order gets to the front of the queue, it gets priced by the next available pricing consumer, and then handed on to the persister queue. Likewise, once it has been persisted it gets passed to the notifier queue.
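To make that concrete, here’s a rough sketch of such a stage in Java, using a BlockingQueue and a pool of consumer threads (the Stage class and all its names are illustrative; a real implementation would also need shutdown and error handling):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// One processing stage: a queue plus some consumer threads that drain it.
class Stage<T> {
  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
  private final String name;
  private final Consumer<T> work;
  private int consumerCount = 0;

  Stage(String name, int consumers, Consumer<T> work) {
    this.name = name;
    this.work = work;
    for (int i = 0; i < consumers; i++) addConsumer();
  }

  // Each consumer blocks on take() until a message arrives, processes it, repeats.
  synchronized void addConsumer() {
    Thread consumer = new Thread(() -> {
      try {
        while (true) work.accept(queue.take());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // let the thread shut down
      }
    }, name + "-consumer-" + consumerCount++);
    consumer.setDaemon(true);
    consumer.start();
  }

  void submit(T message) { queue.add(message); }

  int depth() { return queue.size(); } // handy for monitoring, see below
}

Wired together, each stage hands the order on to the next, and processOrder just drops the order onto the first queue and returns:

// Consumer counts are arbitrary; pricer, persister and notifier are the
// services from the original example.
Stage<Order> notifierStage = new Stage<>("notifier", 3, notifier::sendNotification);
Stage<Order> persisterStage = new Stage<>("persister", 2, order -> {
  persister.persist(order);
  notifierStage.submit(order); // hand on to the next stage
});
Stage<Order> pricerStage = new Stage<>("pricer", 5, order -> {
  pricer.price(order);
  persisterStage.submit(order);
});

public void processOrder(Order order) {
  pricerStage.submit(order); // returns immediately; the stages do the rest
}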

So that’s our basic asynchronous model: a sequence of synchronous calls to services is replaced by passing an event or message through a series of queues or stages. (Of course multi-stage, event-driven systems are nothing new).

But what does that give you? You just replaced a nice, simple three-line method with a bunch of queues, events, consumers and goodness knows what else. So what? Well the thing is, now you can get clever. You can monitor how big each queue gets, and change the resource allocation of consumers on the fly. The pricer queue is getting a bit full? Let’s add some more pricers. We can take a few threads away from the quiet persister queue – no-one will notice. As long as you are reasonably careful in defining what each queue does, you shouldn’t run into any issues with locking or race conditions. Most importantly, the application becomes massively scalable, and its behaviour will degrade gracefully under load. As this wake-up call makes abundantly clear [1], we already need to be thinking about designing parallelism and concurrency into our applications, rather than hoping that the next wave of hardware will make everything fast enough.
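A sketch of that kind of monitoring, building on the hypothetical Stage class above (the threshold and interval are arbitrary, and for brevity it only grows pools rather than shrinking them):

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Once a second, give an extra consumer to any stage with a deep backlog.
ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
List<Stage<Order>> stages = List.of(pricerStage, persisterStage, notifierStage);

monitor.scheduleAtFixedRate(() -> {
  for (Stage<Order> stage : stages) {
    if (stage.depth() > 100) { // arbitrary threshold, purely for illustration
      stage.addConsumer();
    }
  }
}, 1, 1, TimeUnit.SECONDS);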

Pay no attention to the man behind the curtain

The most interesting part of the discussion for me was in trying to address Mats’s concerns about emergent behaviour and all the other weirdness that can happen when you just let a bunch of queues asynchronously get on with business. We were saved by another delegate, Lennart Ohlsson, who pointed out that 10 years ago we were having exactly the same conversations about object-oriented programming.

“This polymorphism is madness!”, “How can I know which version of play() will be invoked on my Musician variable [2], when it could be pointing to a TrumpetPlayer or a Pianist?”. It turns out that when we ignored it and trusted the late-binding pixies, everything just fell into place. If you stop to think about what’s actually happening when you dispatch a method call in an object-oriented system, you can give yourself a funny turn.
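In Java terms the footnoted example looks something like this (a sketch; only the TrumpetPlayer and Pianist names come from the book):

interface Musician { void play(); }
class TrumpetPlayer implements Musician { public void play() { /* trumpet */ } }
class Pianist implements Musician { public void play() { /* piano */ } }

// The compiler cannot know which play() this will invoke; it is resolved
// at run time by dynamic dispatch. That is the late-binding pixies at work.
Musician musician = Math.random() < 0.5 ? new TrumpetPlayer() : new Pianist();
musician.play();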

Well it’s exactly the same with parallelism. If you choose to ignore how it works and just leave it to the threading pixies, it all just works. This seemed to be exactly what Mats needed to hear, and we all left the session happy and enlightened.

As a postscript to the session, the following day I pointed out to Mats that as a C# programmer, he was already used to ignoring asynchronous, event-driven behaviour in his everyday programming. He looked appropriately sceptical, until I pointed out that he used a garbage collector — or more precisely that he had one lurking around that just got on with business, asynchronously, in an event-driven way, reading discarded objects off a finalize queue and laying them quietly to rest.

[1] Thanks to my colleague Ian Cartwright for the link.

[2] It’s the example I always remember from the Core C++ book.

5 comments

  1. And what about rollback conditions? If persisting a message fails, I don’t want to send a notification – but what if the notification has already been sent? I assume you’d end up needing to front this stuff with some kind of workflow or something…

  2. “And what about rollback conditions? If persisting a message fails, I don’t want to send a notification – but what if the notification has already been sent?”

    The impression I get from this article (I may be wrong) is that the message isn’t added to the notification queue until after the persisting queue has finished processing the message. So the three stages of order processing are still sequential; we’ve just added the capability for more concurrent processing of other orders.

    However, I’m not sure I understand the benefits here. I’ll attempt to translate Dan’s example into a more concrete breakdown.

    We have three stages of order processing: Pricing, Persistence and Notification. Pricing is an expensive operation, requiring 5 time units. Persistence is cheap, only requiring 1 time unit. Notification is moderate, requiring 3 time units. So, all in all, processing an order requires 9 time units.

    Now, imagine we have ten workers and fifty orders come in concurrently. Ten can be processed at a time, and the total time to process the fifty orders is 45 (9 time units / round * 5 rounds) time units.

    Alternatively, consider those workers instead handling the three queues mentioned by Dan. Let’s divvy them up as 5 Pricers, 1 Persister and 3 Notifiers. The tenth worker accepts the requests and places the initial message on the Pricer queue.

    When the fifty concurrent requests arrive, they are all placed on the Pricer queue immediately (not quite, but it’s safe to assume the dispatcher worker can fill the queue faster than the pricers can deplete it). Five are taken off the Pricer queue and, 5 time units later, added to the Persister queue. At this point, five more are taken off and processed. In the ensuing five time units, the persister worker handles all five requests (1 time unit per request) and passes them on to the Notifier queue. So we see the one persister keeping up with the five pricers. Five more requests are placed on the persister queue and five more taken off the pricer queue.

    Also, the persister has been placing requests on the notification queue at the rate of one every time unit. The three notifier workers are pulling these off in turn: by the time the persister has put a fourth request on the notification queue, the first notifier has finished the first request and can grab that one. So the notifiers are also keeping up.

    Every five time units, the pricers consume five requests off the pricer queue and the other workers manage the previous requests apace. After 50 time units, all 10 rounds of 5 requests have been processed by the pricers. 5 time units later, the persister queue has finished its processing. 3 time units later, the last notifier finishes its request. Total time: 58 time units.

    Isn’t 58 > 45?

    Jacob Fugal

  3. I read some academic papers on an alternative queuing approach. Instead of a thread per queue it used one thread on one processor. Instead of trying to run StepA, StepB and StepC from multiple requests at the same time, it ran a million StepA and queued the results, then a million StepB, then a million StepC. Giving a CPU a small chunk of code to run many times turns out to be more efficient than having it run a lot of different code and switch contexts all the time. Experimental implementations in web servers ran as quickly as normal threads and degraded more gracefully. There are many ways to carve up a problem!
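    In the terms of Dan’s example, the batch-per-stage idea might look something like this (a sketch of the idea only, not the papers’ actual design):

    // Run each step across the whole backlog before moving to the next, so
    // the CPU keeps executing the same small piece of code instead of
    // context-switching between different work.
    void processBatch(java.util.List<Order> backlog) {
      for (Order order : backlog) pricer.price(order);
      for (Order order : backlog) persister.persist(order);
      for (Order order : backlog) notifier.sendNotification(order);
    }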

  4. I think it would be interesting to be able to write the code in a synchronous fashion and then transform it into an asynchronous version, using annotations to mark the asynchronous operations.

    For instance, suppose I marked Pricer.price(Order) as an asynchronous operation. Then, when loading OrderProcessor, the AsynchronousTransformingClassLoader would notice that processOrder calls an asynchronous method, and would transform processOrder appropriately using futures, addressing the concerns that Sam raised about errors occurring early in the pipeline (a rough sketch follows at the end of this comment).

    Event-driven architectures have quite a simple structure, and therefore the number of types of transformation would be quite small.

    I’d like to end with an observation: when you look at the original processOrder and add the new piece of information that Pricer.price(Order) is slow, you do not add much to the description. I think the amount of change you need to make to a program to accommodate a new piece of information should be comparable in size to the information itself, and if it is not then something is wrong with the model. In this case the model lacks the ability to concisely express asynchrony.
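    A rough sketch of what such a transformation might produce, written by hand with CompletableFuture (the shape is illustrative, and the error handling is deliberately simplistic):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    ExecutorService pool = Executors.newFixedThreadPool(8);

    public CompletableFuture<Void> processOrder(Order order) {
      return CompletableFuture
          .runAsync(() -> pricer.price(order), pool)      // the slow remote step
          .thenRun(() -> persister.persist(order))
          .thenRun(() -> notifier.sendNotification(order))
          .exceptionally(t -> {
            // A failure in an earlier stage skips the later stages, so no
            // notification goes out if pricing or persisting fails.
            System.err.println("order failed: " + t);
            return null;
          });
    }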

  5. Tom Appleton

    Hmmm, interesting. I’m currently working on an event-driven project and the pixies that take care of the queueing are certainly present. This is because we are using a specific language (yep, neither Java nor Ruby :-) ) that manages the queueing for you. What is puzzling at the moment is how to test this system, as it is asynchronous, concurrent and has lots and lots of input streams. The question of emergent behaviour within the business logic that is not covered in any tests is difficult to address at the moment. Maybe that is a separate topic – testing event-driven systems.

