Based on the responses to my earlier post, it's clear that I mostly succeeded in piquing the interest of many readers without clearly articulating the concepts. So, I'm going to try to do a better job of elaborating.
Let's start with some axioms:
The only way out is through. I'm relatively sure that Robert Frost was not thinking about database transactions when he wrote that. Regardless, it is the best way to sum up the first axiom. As soon as you bring up the topic of eliminating distributed transactions, compensating transactions find their way to the front of the conversation. They are definitely one technique for coping with the lack of 2PC, but they bring their own complexities to the table. The goal should be to design your data model and business logic in such a way that the only direction in which a transaction can complete is forward.
A call to order is in order. Ordering operations is critical, with or without distributed transactions. In the 2PC world, consistent ordering is critical to reducing deadlocks. Without 2PC, proper ordering is important to minimizing unrecoverable database corruption. In fact, it leads to a related axiom:
It's better to be an orphan than an empty nester. In any parent/child table relationship, it's better to have orphaned child rows than parent rows without the full complement of children. Why? Well, orphans consume storage but are otherwise harmless. Empty nest rows are incompletely mapped entities. More logic is required to manage scenarios where you have parents but missing children than the other way around. In practice, this means writing the child rows first and the parent row last. This approach doesn't always apply. For example, if the child record can be easily recreated from the parent, then it probably makes sense to write the parent first.
Better late than never. Most applications provide an expectation of immediacy to the clients. Even so, it is better to ensure the logical operation will complete at some time, if not immediately. Yes, an inconsistent SLA can be frustrating, but nowhere near as frustrating as losing information completely.
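Here's a minimal sketch of that ordering, assuming an order (parent) and its lines (children) live on two separate databases. Two in-memory SQLite connections stand in for the two servers, and the table names, save_order, and load_order are all hypothetical names I made up for illustration. Children are written first, and readers always start from the parent, so a crash between the two writes leaves only invisible orphans.

```python
import sqlite3

# Two in-memory databases stand in for two independent servers
# (an assumption for illustration; no transaction spans both).
parents_db = sqlite3.connect(":memory:")
children_db = sqlite3.connect(":memory:")
parents_db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY)")
children_db.execute(
    "CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER)"
)


def save_order(order_id, lines, fail_before_parent=False):
    """Write the child rows first, then the parent row that makes them visible."""
    with children_db:  # local transaction on the child database only
        children_db.executemany(
            "INSERT INTO order_lines VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in lines],
        )
    if fail_before_parent:
        # Hypothetical hook used only to simulate a crash between the writes.
        raise RuntimeError("simulated crash between the two writes")
    with parents_db:  # local transaction on the parent database only
        parents_db.execute("INSERT INTO orders VALUES (?)", (order_id,))


def load_order(order_id):
    """Readers start from the parent, so orphaned children are never returned."""
    parent = parents_db.execute(
        "SELECT 1 FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    if parent is None:
        return None
    return children_db.execute(
        "SELECT sku, qty FROM order_lines WHERE order_id = ? ORDER BY sku",
        (order_id,),
    ).fetchall()
```

If save_order fails before the parent insert, load_order returns None for that order, and the leftover order_lines rows are orphans that a cleanup job can delete whenever it likes.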
Idempotent operations are your friend. This key concept is often overlooked in system design. Providing mechanisms to detect an attempt to apply the same operations multiple times is more complicated than ignoring it. But if operations are idempotent, then recovering from failed transactions is easier. A simple journal of actions can be tried again and again until you've determined they have succeeded.
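To make that concrete, here's a small sketch of a journal that is simply replayed until everything has succeeded. I'm assuming one particular way of getting idempotence, which is tagging each operation with an id and remembering which ids have been applied. The Inventory class, replay function, and op ids are hypothetical names for illustration, not part of any real library.

```python
class Inventory:
    """Toy store whose decrement is idempotent per operation id."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.applied = set()  # ids of operations that have already been applied

    def decrement(self, op_id, sku, qty):
        if op_id in self.applied:
            return  # a replay of something already done: nothing to do
        self.stock[sku] -= qty  # raises KeyError for an unknown sku
        self.applied.add(op_id)


def replay(journal, inventory, max_attempts=5):
    """Retry journal entries until they succeed; return the ones still failing."""
    pending = list(journal)
    for _ in range(max_attempts):
        still_pending = []
        for entry in pending:
            try:
                inventory.decrement(*entry)
            except Exception:
                still_pending.append(entry)  # keep it for the next pass
        pending = still_pending
        if not pending:
            break
    return pending
```

Because a second application of the same entry is a no-op, the recovery code never has to work out which entries made it through before the failure. It just replays the whole journal.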
Putting the Axioms to Work
Great. I've given you some axioms. I've even tried to make them into cute little memory aids. But how can you actually use them? I'll walk through a contrived example. Contrived is the key word, because I have chosen the requirements to illustrate the axioms and picked a solution that further reinforces them. That isn't to say that you can't apply this to any real world problem, only that you'll have a bit more work in store.
I'll use a relatively standard shopping cart to illustrate the example. There are a couple of feature requirements that potentially make the cart unique. First, the items that are placed in the cart are not reserved for the shopper. This means that they can disappear if another shopper completes a purchase first. Second, the shopper can ask that personal information be saved for later use. I'll skip to the checkout process because this is where the most interesting challenges will be. For the sake of demonstration, I'll assume there are three databases: one for items for sale, one for managing the cart and transactions, and one for users.
How would we tackle this with distributed transactions? We'd begin a transaction, decrement the quantity of each item in the cart, record the transaction in the transaction table, save the user's preferences, and then commit. If anything fails, we'd roll back the transaction. This is quite simple from a logic perspective, but it requires all three databases to be available and leaves us exposed to dangling transactions if any application server fails to close the transaction properly.
Without transactions, we have a bit more work. First, we're going to have to devise a scheme for reserving and removing reservations from items. I know I said to try to avoid compensating transactions, but this is an example of where they are unavoidable. We'll create an item reservation table that has one row per reservation, which indicates the item, the quantity, and the time the reservation will expire. The first step is to reserve all of the items. If any reservation fails, then the reservations are removed from all items. But what if the application fails to remove the reservation? A reaper will have to run that breaks reservations that have passed their expiration time. Reserving quantities also involves comparing against both the item table and the reservation table to ensure there is quantity on hand.
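The reservation scheme might be sketched roughly like this. It's an in-memory toy, not a real schema: ItemStore, reserve_cart, the cart_id field, and the 300 second default expiry are all assumptions of mine. In a real item database, the availability check and the insert in reserve would need to happen inside one local transaction on that database.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    cart_id: str
    sku: str
    qty: int
    expires_at: float


class ItemStore:
    def __init__(self, stock):
        self.stock = dict(stock)  # item table: sku -> quantity on hand
        self.reservations = []    # reservation table: one row per reservation

    def available(self, sku, now):
        """Quantity on hand minus everything held by unexpired reservations."""
        held = sum(
            r.qty for r in self.reservations
            if r.sku == sku and r.expires_at > now
        )
        return self.stock.get(sku, 0) - held

    def reserve(self, cart_id, sku, qty, now, ttl=300.0):
        if self.available(sku, now) < qty:
            return False
        self.reservations.append(Reservation(cart_id, sku, qty, now + ttl))
        return True

    def release(self, cart_id):
        """Compensating step: drop every reservation held by this cart."""
        self.reservations = [
            r for r in self.reservations if r.cart_id != cart_id
        ]

    def reap(self, now):
        """Break reservations past their expiry; return how many were removed."""
        before = len(self.reservations)
        self.reservations = [r for r in self.reservations if r.expires_at > now]
        return before - len(self.reservations)


def reserve_cart(store, cart_id, cart, now):
    """Reserve every item in the cart, or release them all if any one fails."""
    for sku, qty in cart.items():
        if not store.reserve(cart_id, sku, qty, now):
            store.release(cart_id)
            return False
    return True
```

Note that available already ignores expired rows, so a slow reaper only costs storage. It never causes stock to be withheld from other shoppers.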
Now that you have quantity on hand, you want to record the transaction. In this example, a transaction is a record of the items purchased and the payment status. I won't go into the details of how payment is managed. As the transactions are captured on a single database, all of the parent and child records can be written within a single database transaction. When we've completed this, we now need to update the quantities on the item database and remove our reservation entries. Once again, a failure could occur after the transaction has been recorded but before we've cleared the reservation.
This failure can be handled by relying on a message queue associated with the transaction database. Remember that we stored information about each item purchased with the transaction. This allows a message to be generated that indicates a transaction has completed. The consumer can use the transaction information to verify the reservations were correctly processed on the item tables. If not, these can be fixed. Whether this is a separate process or integrated with the process that breaks expired reservations is a design choice that is beyond this simple example.
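A consumer for those messages could look something like the sketch below. The message shape, the settle and drain names, and the rule that a surviving reservation means the purchase has not yet been applied to the item table are all my assumptions. I'm also assuming the reservation table lives alongside the item table, so that a real implementation could do the decrement and the delete in one local transaction.

```python
from collections import deque


def settle(message, stock, reservations):
    """Apply a completed purchase to the item tables; safe to run repeatedly.

    stock maps sku -> quantity on hand; reservations maps
    (cart_id, sku) -> reserved quantity. Both are in-memory stand-ins.
    """
    cart_id = message["cart_id"]
    for sku, qty in message["items"].items():
        key = (cart_id, sku)
        if key in reservations:
            # Reservation still held, so this line was never applied.
            stock[sku] -= qty
            del reservations[key]
        # No reservation left: treated as already settled, so skip it.


def drain(queue, stock, reservations):
    """Process every pending 'transaction completed' message."""
    while queue:
        settle(queue.popleft(), stock, reservations)
```

Because settle does nothing once the reservation is gone, a message that is delivered twice does no harm. The flip side is that the reaper must not break a reservation whose transaction has already been recorded, which is one of the edge conditions this sketch leaves open.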
Finally, we want to update the user's preferences. Now it is possible that the application fails after the transaction but prior to updating the preferences. In this case, you may very well decide that you accept that failure and move on. The user will be frustrated by the failure to store his preferences but less frustrated than by having this block his transaction. But let's say you really want to ensure these are captured. This is another opportunity to rely on a messaging solution. A message queue that acts as a journal for preference updates can be employed. Whenever a write to the preferences database cannot be completed, a message is queued. When the preferences database is available again, the queued messages are processed.
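As a rough sketch of that journal, assuming an in-memory deque in place of a real message queue and a toy PreferenceStore in place of the users database (both hypothetical, as are the function names):

```python
from collections import deque


class PreferenceStore:
    """Stand-in for the users database; 'available' simulates an outage."""

    def __init__(self):
        self.available = True
        self.rows = {}

    def write(self, user_id, prefs):
        if not self.available:
            raise ConnectionError("preferences database unavailable")
        self.rows[user_id] = dict(prefs)  # overwriting is naturally idempotent


def save_preferences(store, journal, user_id, prefs):
    """Write directly when possible; otherwise queue the update for later."""
    if journal:
        # Older updates are still waiting, so queue behind them to keep order.
        journal.append((user_id, prefs))
        return False
    try:
        store.write(user_id, prefs)
        return True
    except ConnectionError:
        journal.append((user_id, prefs))
        return False


def replay_journal(store, journal):
    """Apply queued updates in order; stop at the first failure."""
    while journal:
        user_id, prefs = journal[0]
        try:
            store.write(user_id, prefs)
        except ConnectionError:
            return False  # still down: leave the entry queued
        journal.popleft()  # remove only after the write has succeeded
    return True
```

An entry is removed from the journal only after its write succeeds, so a crash in the middle of replay_journal just means the same overwrite is applied again on the next run.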
Have I skipped some failure scenarios and edge conditions? Absolutely. But the solutions in most cases will follow similar patterns to the example above.