« Coupling in a RESTful Way | Main | How Not to Plan a Career in Software »

Thursday, December 28, 2006

Avoiding Two Phase Commit, Redux

Based on the response I received to my earlier post, it was clear that I mostly accomplished peaking the interest of many readers but not really clearly articulating the concepts. So, I'm going to try to do a better job of elaborating.

Let's start with some axioms:

The only way out is through. I'm relatively sure that Robert Frost was not thinking about database transactions when he wrote that. Regardless, it is the best way to sum up the first axiom. As soon as you bring up the topic of eliminating distributed transactions, compensating transactions find their way to the front of the conversation. And that is definitely one technique for coping with the lack of 2PC. But, they bring their own complexities to the table. The goal should be to design your data model and business logic in such a way that the only direction for a transaction to complete is forward.

A call to order is in order. Ordering operations is critical, with or without distributed transactions. In the 2PC world, consistent ordering is critical to reducing deadlocks. Without 2PC, proper ordering is important to minimizing unrecoverable database corruption. In fact, it leads to a related axiom:

It's better to be an orphan than an empty nester. In any parent/child table relationship, it's better to have orphaned child rows than parent rows without the full compliment of children. Why? Well, orphans consume storage but are otherwise harmless. Empty nest rows are incompletely mapped entities. More logic is required to manage scenarios where you have parents but missing children than the other way around. This approach doesn't always apply. For example, if the child record can be easily recreated from the parent, then it probably makes sense to write the parent first.

Better late than never. Most applications provide an expectation of immediacy to the clients. Even so, it is better to insure the logical operation will complete at some time, if not immediately. Yes, the lack of consistency in SLA can be frustrating, but no where near as much so as losing information completely.

Idempotent operations are your friend. This key concept is often overlooked in system design. Providing mechanisms to detect an attempt to apply the same operations multiple times is more complicated than ignoring it. But if operations are idempotent, then recovering from failed transactions is easier. A simple journal of actions can be tried again and again until you've determined they have succeeded.

Putting the Axioms to Work

Great. I've given you some axioms. I've even tried to make them into cute little memory aids. But how can you actually use them? I'll walk through a contrived example. Contrived is critical because I have established a set of requirements to illustrate the axioms. I've picked a solution to further reinforce them. That isn't to say that you can't apply this to any real world problem, only that you'll have a bit more work in store.

I'll use a relatively standard shopping cart to illustrate the example. There are a couple of feature requirements that potentially make the cart unique. First, the items that are placed in the cart are not reserved for the shopper. This means that they can disappear if another shopper completes the purchase. The shopper can also ask that personal information be saved for later use. I'll skip to the check out process because this is where the most interesting challenges will be. For the sake of demonstration, I'll assume there are three databases, one for items for sale, one for managing the cart and transactions, and one for users.

How would we tackle this with distributed transactions? We'd begin a transaction, decrement the quantity of each item in the cart, record the transaction the transaction table, save the user's preferences, and then commit. If anything fails we'd roll back the transaction. Quite simple from a logic perspective but requires all three databases be available and leaves us exposed to dangling transaction issues if any application server happens to improperly close the transaction.

Without transactions, we have a bit more work. First, we're going to have to devise a scheme for reserving and removing reservations from items. I know I said try to avoid compensating transactions but this is an example of where they are unavoidable. We'll create an item reservation table that has one row per reservation which indicates the item, the quantity, and a time the reservation will expire. The first step is to reserve all of the items. If any fails, then the reservations are removed from all items. But what if the application fails to remove the reservation? A reaper will have to run that breaks reservations that have passed their expiration time. Reserving quantities also involves comparing against both the item table and the reservation table to insure there is quantity on hand.

Now that you have quantity on hand, you want to record the transaction. In this example, a transaction is a record of the items purchased and the payment status. I won't go into the details of how payment is managed. As the transactions are captured on a single database, all of the parent and child records can be written within a single database transaction. When we've completed this, we now need to update the quantities on the item database and remove our reservation entries. Once again, a failure could occur after the transaction has been recorded but before we've cleared the reservation.

This failure can be handled by relying on a message queue associated with the transaction database. Remember that we stored information about each item purchased with the transaction. This allows a message to be generated that indicates a transaction has completed. The consumer can use the transaction information to verify the reservations were correctly processed on the item tables. If not, these can be fixed. Whether this is a separate process or integrated with the process that breaks expired reservations is a design choice that is beyond this simple example.

Finally, we want to update the user's preferences. Now it is possible that the application fails after the transaction but prior to updating the preferences. In this case, you may very well decide that you accept that failure and move on. The user will be frustrated by the failure to store his preferences but less frustrated than by having this block his transaction. But let's say you really want to insure these are captured. This is another opportunity to rely on a messaging solution. A message queue that acts as a journal for preference updates can be employed. Whenever a write cannot be completed to the preferences database, a message is queue. When the preference database is available again, messages are processed.

Have I skipped some failure scenarios and edge conditions? Absolutely. But the solutions in most cases will follow similar patterns to the example above.

Technorati Tags: , , , , , , , , , , ,

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/t/trackback/2010650/7312575

Listed below are links to weblogs that reference Avoiding Two Phase Commit, Redux:

Comments

Dan: If you assume the existance of a reliable and fault-tolerant message queuing infrastructure, why doesn't your design simply inject transactions into the queue and have queue consumers pull requests off, handle the cross-database work and on failure handle rollback and requeueing of a message?

Oh, we've ended up with a flavor of 2PC again (queue message, commit transaction). I think saying "we're avoiding 2PC" on one hand and saying "yes, we'll use a reliable, fault-tolerant message queue" on the other is kind of cheating. Maybe I'm wrong?

What happens when we boil down the problem to only having three basic operations: read, write and move (rename)? These are your typical atomic I/O operations, right? What if we had "unstoppable transactions"--think about your typical journal. The system state is determined at read-time, using either first-in-wins (FIW) or last-in-wins (LIW)--or, some other arbitrary but unambiguous means. This uses more storage (as every transaction is stored) but avoids 2PC, assumes your persistent store IS your message queue (in essence), messages are either lost (!) but never requeued (a message lost is the same as a failure failing to be queued, anyhow), and assuming a sane journaling implementation should be O(1) to determine state at read-time in JIT fashion.

Hmm, I think this won't work because it assumes (relies upon?) having robust synchronous writes across the distributed systems. Oops. :-) But, it sure was fun to think about!

Thanks for the brain food.

The message queue can be made reliable via horizontal scaling on the queuing side. I can avoid 2PC on the dequeue by relying upon idempotent handling of the events. I refer to that as at least once delivery of the message.

You did hit upon something I didn't state but have employed. The message queue IS the persistent store. A table and a queue are not very different!

Dan, if I understand correctly, what you're saying is:

1) Identify atomic operations (primitives) on each of the distributed systems.

2) For bonus points, design the system such that those primitives are idempotent.

3) Design your higher-order transactions to only be a composite of those primitives.

4) Lather, rinse, repeat.

Wow, you summed up two articles in 3 bullets. Well done!

It's not clear to me how the reservations and item counts are reconciled. For example, if we have only one count left in the items database, and two in flight transactions both log 1 item reservations, how does this get resolved? Does it depend on who gets to the item count updating first, and then the loser removes it's previous work? Or is one of the reservations some how prevented when it's insert is attempted?

Sorry I didn't explain that very well. The idea is that first you retrieve the item count and all the current reservations. If count - sum(reservations) is positive, you're free to insert a reservation for your quantity.

Once you've done that, you have to immediately retrieve count and sum(reservation) again. If the result is now negative, the newest reservations have to be removed until the count is again positive. Ideally each instance only removes their reservation.

Thanks, that makes it clear.

So would it be fair to say, you have to design your system as a series of communicating state machines, where the actions undertaken on transitions are idempotent or reversible? That way rather than locking you can use an optimistic approach and deal with interference explicitly rather than requiring blocking to serialize for you (similar to the TimeWarp algorithm).

Yes, is fair to say.

TLA warning: Idempotent Orientated Architecture or IOA for scalable architecture. :)

Dossy: You've described the fundamental of a transaction in three bullet points. The only additional item I'd add is to identify which operations are non-essential in case of failure. In this case, the writing of the user preferences.

The scenario also describes a best effort attempt to accomplish the transaction in memory and if there's a failure, a reaper goes into effect and tries to reconcile. This is definitively a compensating transaction.

Is it possible to accomplish this same task without a reaper or a reservation table?

I think you meant to say "piquing the interest", not "peaking".

http://www.google.com/search?q=peak+pique+interest

Sorry I don't understand the final preferences message queue. Where is this placed? Is this on the same system as the transaction database, or on the reservations system?

As far as I can tell that this must be part of or the same as the message queue for the transaction recording.

So the process that is checking the completed transaction queue must be both checking the reservations have been removed and the preferences have been saved?

Or are there 2 queues?

Post a comment

If you have a TypeKey or TypePad account, please Sign In