
Sunday, December 31, 2006

How Not to Plan a Career in Software

The year is 1995. Fresh from helping design CDE Mail and contributing to the SGI desktop, I find myself in an interview. The interviewer asks one of the textbook questions: "Where do you see yourself in 5 and 10 years?" Of course, my answer is exactly what any senior engineer in 1995 would have offered.

"In 5 years I will be running a consulting company helping organizations ranging in size from small startups to the Fortune 500 discover ways to more effectively leverage the Internet to reduce costs and expand their market. In 10 years, I'll be a fellow at one of the largest success stories of the Internet."

Of course I didn't give that answer. If I had that kind of foresight, I wouldn't even bother working. I'd pick the 10-100X stocks every year and watch them grow, spending half the year at my beachfront home and half at my mountain retreat. But I don't have any such clairvoyance, so, like most of us, I work. Still, I'm blessed with a career doing something I truly enjoy that also pays well.

My career got off to a strange start. At one of my school's career fairs, the organizer incorrectly listed my degree as chemical engineering. I spent the day with interviewers who had been sent to tell me they weren't hiring chemical engineers. One of them, from a medium-sized company in Sunnyvale, decided that they were interested in me as a computer scientist. An interview intended to politely dismiss me turned into my first job, in Silicon Valley no less. So perhaps serendipity was meant to be my career plan from the start.

Career planning in software, I believe, is unlike career planning in many other disciplines. It has a lot more to do with project opportunities than with any goals you might set for yourself. Each of my career advancements came because I asked for opportunities that were realistically just a bit beyond my current skills. I've had the good fortune to work with people who had the willingness and authority to take a risk on me. Part of that willingness was due to demonstrated ability, but part of it was also due to my initiative in asking for the opportunity.

Serendipity played a role in most of the opportunities as well. Sometimes I looked around as one project was finishing up and selected something I thought would be a challenge. At other times, opportunities were handed to me to tackle. A few times I sought challenges that would help round out my resume but those decisions were certainly infrequent.

There was one career decision I made several years ago, though, that is definitely in line with more traditional career planning. Every engineer reaches the point where they have to choose between a management track and a technical track. I thought I would go the management route because the rate of ascent and the overall pinnacle are better. I tried my hand at management and discovered I really wasn't very good at it. So that left me with one option: drive up the technical path.

So am I advising you to ignore career planning? No, but hey, it's worked for me! Seriously though, with the pace at which technology turns over, it is impossible to know which technologies you'll need in 5 years, let alone 10. If your intention is to stay on a technical path, then relying on serendipity to broaden your knowledge and skills isn't a terrible idea. It certainly leads to an interesting career, if nothing else!


Thursday, December 28, 2006

Avoiding Two Phase Commit, Redux

Based on the response I received to my earlier post, it was clear that I mostly succeeded in piquing the interest of many readers without clearly articulating the concepts. So, I'm going to try to do a better job of elaborating.

Let's start with some axioms:

The only way out is through. I'm relatively sure that Robert Frost was not thinking about database transactions when he wrote that. Regardless, it is the best way to sum up the first axiom. As soon as you bring up the topic of eliminating distributed transactions, compensating transactions find their way to the front of the conversation. And that is definitely one technique for coping with the lack of 2PC. But they bring their own complexities to the table. The goal should be to design your data model and business logic in such a way that the only direction for a transaction to complete is forward.

A call to order is in order. Ordering operations is critical, with or without distributed transactions. In the 2PC world, consistent ordering is critical to reducing deadlocks. Without 2PC, proper ordering is important to minimizing unrecoverable database corruption. In fact, it leads to a related axiom:

It's better to be an orphan than an empty nester. In any parent/child table relationship, it's better to have orphaned child rows than parent rows without the full complement of children. Why? Well, orphans consume storage but are otherwise harmless. Empty-nest rows are incompletely mapped entities. More logic is required to manage scenarios where you have parents but missing children than the other way around. In practice, that means writing child rows before their parent. This approach doesn't always apply. For example, if the child records can be easily recreated from the parent, then it probably makes sense to write the parent first.
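
A minimal sketch of that ordering in Python, assuming two independent stores where the parent write can fail after its children have been written. The Store class and the order/line naming are hypothetical stand-ins for real databases.

```python
class Unavailable(Exception):
    """Raised by a stand-in store when its database is down."""


class Store:
    """Hypothetical stand-in for one database: a dict plus an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            raise Unavailable(self.name)
        self.rows[key] = value


def save_order(parents, children, order_id, lines):
    """Write child rows first and the parent last.

    If the parent write fails we are left with orphaned children,
    which are harmless and reapable, rather than an empty-nest parent."""
    for i, line in enumerate(lines):
        children.put((order_id, i), line)
    parents.put(order_id, {"line_count": len(lines)})


def reap_orphans(parents, children):
    """Delete child rows whose parent was never written; return their keys."""
    orphans = [key for key in children.rows if key[0] not in parents.rows]
    for key in orphans:
        del children.rows[key]
    return orphans
```

A real reaper would also skip child rows younger than some grace period, so it doesn't delete the children of an order whose parent write is still in flight.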

Better late than never. Most applications give clients an expectation of immediacy. Even so, it is better to ensure the logical operation will complete at some point, if not immediately. Yes, an inconsistent SLA can be frustrating, but nowhere near as frustrating as losing information completely.

Idempotent operations are your friend. This key concept is often overlooked in system design. Providing a mechanism to detect attempts to apply the same operation multiple times is more complicated than ignoring the problem. But if operations are idempotent, recovering from failed transactions is much easier: a simple journal of actions can be retried again and again until you've determined they have succeeded.
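
Here's one way idempotency can be arranged, as a sketch: each journaled operation carries a unique ID, and the target remembers which IDs it has already applied. Target, replay, and op_id are hypothetical names. In a real database, the update and the record of its op_id would have to be written in one local transaction.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    balance: int = 0
    applied: set = field(default_factory=set)  # op_ids already applied
    fail_next: int = 0                         # simulates transient outages

    def apply(self, op_id, amount):
        if self.fail_next > 0:
            self.fail_next -= 1
            raise ConnectionError("database unavailable")
        if op_id in self.applied:  # duplicate: acknowledge, don't reapply
            return
        self.balance += amount
        self.applied.add(op_id)


def replay(journal, target, max_attempts=5):
    """Retry journaled (op_id, amount) pairs until they all succeed.

    Because apply() is idempotent, replaying an operation that already
    succeeded (say, its acknowledgement was lost) is harmless.
    Returns whatever is still pending after max_attempts passes."""
    pending = list(journal)
    for _ in range(max_attempts):
        still_pending = []
        for op_id, amount in pending:
            try:
                target.apply(op_id, amount)
            except ConnectionError:
                still_pending.append((op_id, amount))
        pending = still_pending
        if not pending:
            break
    return pending
```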

Putting the Axioms to Work

Great. I've given you some axioms. I've even tried to make them into cute little memory aids. But how can you actually use them? I'll walk through a contrived example. Contrived is critical because I have established a set of requirements to illustrate the axioms. I've picked a solution to further reinforce them. That isn't to say that you can't apply this to any real world problem, only that you'll have a bit more work in store.

I'll use a relatively standard shopping cart to illustrate the example. There are a couple of feature requirements that potentially make the cart unique. First, the items that are placed in the cart are not reserved for the shopper. This means that they can disappear if another shopper completes the purchase. The shopper can also ask that personal information be saved for later use. I'll skip to the checkout process because this is where the most interesting challenges will be. For the sake of demonstration, I'll assume there are three databases: one for items for sale, one for managing the cart and transactions, and one for users.

How would we tackle this with distributed transactions? We'd begin a transaction, decrement the quantity of each item in the cart, record the purchase in the transaction table, save the user's preferences, and then commit. If anything failed, we'd roll back the transaction. This is quite simple from a logic perspective, but it requires all three databases to be available and leaves us exposed to dangling transactions if any application server fails to close its transaction properly.

Without distributed transactions, we have a bit more work. First, we're going to have to devise a scheme for reserving items and removing those reservations. I know I said to try to avoid compensating transactions, but this is an example of where they are unavoidable. We'll create an item reservation table with one row per reservation, indicating the item, the quantity, and the time the reservation will expire. The first step is to reserve all of the items. If any reservation fails, then the reservations are removed from all items. But what if the application fails to remove the reservations? A reaper will have to run that breaks reservations that have passed their expiration time. Reserving quantities also involves checking against both the item table and the reservation table to ensure there is quantity on hand.

Now that you've secured quantity on hand, you want to record the transaction. In this example, a transaction is a record of the items purchased and the payment status. I won't go into the details of how payment is managed. Because transactions are captured in a single database, all of the parent and child records can be written within a single database transaction. Once this is complete, we need to update the quantities in the item database and remove our reservation entries. Once again, a failure could occur after the transaction has been recorded but before we've cleared the reservations.

This failure can be handled by relying on a message queue associated with the transaction database. Remember that we stored information about each item purchased with the transaction. This allows a message to be generated that indicates a transaction has completed. The consumer can use the transaction information to verify the reservations were correctly processed on the item tables. If not, these can be fixed. Whether this is a separate process or integrated with the process that breaks expired reservations is a design choice that is beyond this simple example.
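
The second half of checkout could look something like this, as a sketch. It assumes the "message queue associated with the transaction database" is an outbox table written in the same local transaction as the purchase; that's one common way to build it, not necessarily the only one. All table and function names are hypothetical. The item side records which purchases it has applied, so the normal path and the consumer can safely run the same step more than once.

```python
import json
import sqlite3


def setup_txn_db(conn):
    conn.executescript("""
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, payment_status TEXT);
        CREATE TABLE purchase_line (purchase_id INTEGER, item_id TEXT,
                                    qty INTEGER, reservation_id INTEGER);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             body TEXT, done INTEGER DEFAULT 0);
    """)


def setup_item_db(conn):
    conn.executescript("""
        CREATE TABLE item (id TEXT PRIMARY KEY, on_hand INTEGER NOT NULL);
        CREATE TABLE reservation (id INTEGER PRIMARY KEY, item_id TEXT,
                                  qty INTEGER, expires_at REAL);
        CREATE TABLE applied_purchase (purchase_id INTEGER PRIMARY KEY);
    """)


def record_purchase(txns, purchase_id, lines):
    """Parent, children and the completion message commit together.

    lines: (item_id, qty, reservation_id) tuples."""
    with txns:
        txns.execute("INSERT INTO purchase VALUES (?, 'PENDING')", (purchase_id,))
        txns.executemany("INSERT INTO purchase_line VALUES (?, ?, ?, ?)",
                         [(purchase_id, *line) for line in lines])
        txns.execute("INSERT INTO outbox (body) VALUES (?)",
                     (json.dumps({"purchase_id": purchase_id, "lines": lines}),))


def apply_to_items(items, purchase_id, lines):
    """Decrement stock and clear reservations once per purchase.

    The PRIMARY KEY on applied_purchase makes a racing duplicate fail
    and roll back the whole local transaction."""
    with items:
        if items.execute("SELECT 1 FROM applied_purchase WHERE purchase_id = ?",
                         (purchase_id,)).fetchone():
            return False  # already applied: replay is a no-op
        for item_id, qty, reservation_id in lines:
            items.execute("UPDATE item SET on_hand = on_hand - ? WHERE id = ?",
                          (qty, item_id))
            items.execute("DELETE FROM reservation WHERE id = ?", (reservation_id,))
        items.execute("INSERT INTO applied_purchase VALUES (?)", (purchase_id,))
        return True


def drain_outbox(txns, items):
    """Consumer: verify and fix the item side for every recorded purchase."""
    rows = txns.execute("SELECT id, body FROM outbox WHERE done = 0").fetchall()
    for msg_id, body in rows:
        msg = json.loads(body)
        apply_to_items(items, msg["purchase_id"], msg["lines"])
        with txns:
            txns.execute("UPDATE outbox SET done = 1 WHERE id = ?", (msg_id,))
    return len(rows)
```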

Finally, we want to update the user's preferences. It is possible that the application fails after the transaction but before the preferences are updated. In this case, you may very well decide to accept that failure and move on. The user will be frustrated by the failure to store his preferences, but less frustrated than if it had blocked his transaction. But let's say you really want to ensure these are captured. This is another opportunity to rely on a messaging solution. A message queue that acts as a journal for preference updates can be employed. Whenever a write to the preferences database cannot be completed, a message is queued. When the preferences database is available again, the queued messages are processed.
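
A sketch of the preference journal, using an in-memory deque as a stand-in for a durable message queue (an assumption; a real queue has to survive restarts). PreferenceStore and PreferenceWriter are hypothetical names.

```python
from collections import deque


class PreferenceStore:
    """Stand-in for the preferences database."""

    def __init__(self):
        self.prefs = {}
        self.up = True

    def save(self, user, key, value):
        if not self.up:
            raise ConnectionError("preference database unavailable")
        self.prefs.setdefault(user, {})[key] = value


class PreferenceWriter:
    def __init__(self, store):
        self.store = store
        self.journal = deque()  # stand-in for a durable queue

    def save(self, user, key, value):
        """Never blocks the purchase: on failure, queue the write."""
        if self.journal:  # preserve order: queue behind older pending writes
            self.journal.append((user, key, value))
            return False
        try:
            self.store.save(user, key, value)
            return True
        except ConnectionError:
            self.journal.append((user, key, value))
            return False

    def drain(self):
        """Replay queued writes in order; return how many remain."""
        while self.journal:
            user, key, value = self.journal[0]
            try:
                self.store.save(user, key, value)
            except ConnectionError:
                break  # still down; try again later
            self.journal.popleft()  # remove only after a successful write
        return len(self.journal)
```

New writes queue behind any pending ones, so a replay can't overwrite a newer value with an older one. And because setting a key is idempotent, replaying a write that succeeded just before a crash is harmless.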

Have I skipped some failure scenarios and edge conditions? Absolutely. But the solutions in most cases will follow similar patterns to the example above.


Tuesday, December 19, 2006

Coupling in a RESTful Way

I knew Pete would respond. Actually we agree on far more than we disagree upon but I would like to elaborate on some points in his response. Pete writes:


As is well known, REST applications also constrain the operations that are available, whereas each and every web service ever written publishes a unique set of operations that the client must know and know how to orchestrate. In a perfect world (and I know it’s far from perfect), RESTful applications don’t even need to agree ahead of time on the resource representation, instead they negotiate which well-known media type will be exchanged via HTTP’s accept and Content-Type headers. Properly RESTful applications push all state out to the client, therefore reducing what the server needs to know about past interactions. REST has a cache constraint, as well, allowing the client or server to cache a response without the client application even being aware that a response was retrieved from cache. And, so long as transport level security is sufficient, a RESTful client does not need to know the security semantics of the service ahead of time either. Nor does it need to worry about breakage if GET, PUT, and DELETE are called over and over again; it knows these operations are idempotent. REST applications exchanging XML documents should not have to worry about the data types of the message’s elements and attributes. They shouldn’t care if elements they weren’t expecting suddenly appear in these messages either.


I'll concede that I didn't correctly interpret Pete's original statement around SOAP coupling. You're absolutely right that SOAP-based services fail to offer any kind of uniformity for common operations. That said, I do believe there is a class of problems where CRUD is insufficient, but that debate has probably been addressed sufficiently.

There are some assertions in the above paragraph that I struggle with, though. First, RESTful applications must agree on at least ONE well-known media type for their interaction to succeed. You can absolutely use content negotiation to state preferences from the client and allow the server to select its favorite. (For the record, I've designed SOAP interfaces that do the same, but I digress.) But if there is no overlap between the client and server, the request is going to fail. Being realistic here, in most cases the client will be interested in a specific type of resource (message, video, product). And yes, if I'm looking for a message or a video, I can be pretty relaxed about the content type (although I doubt I'd accept a message for a video request). But as you move further into business resources (e.g. product), the set of representations will either narrow or become so diverse that supporting a broad set of them is impractical. My point is that as you move away from common web resources, the degree of coupling between what the client wants and what the server can offer will increase.
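
To make the "no overlap means failure" point concrete, here's a simplified sketch of server-side negotiation. It honors q-values and wildcards but, unlike the full HTTP rules, takes the highest q among matching ranges instead of letting the most specific range win. The media types are only examples; a None result is where a server would answer 406 Not Acceptable.

```python
def parse_accept(header):
    """Return [(media_range, q)] from an Accept header value (simplified)."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media, q))
    return ranges


def matches(media_range, offered):
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    otype, _, osub = offered.partition("/")
    return rtype == otype and rsub in ("*", osub)


def negotiate(accept_header, offered):
    """Pick the offered type the client likes best, or None (no overlap)."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for candidate in offered:  # offered is in server preference order
        q = max((q for r, q in ranges if matches(r, candidate)), default=0.0)
        if q > best_q:
            best, best_q = candidate, q
    return best
```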

I've never written a stateful web service and I'm disappointed to hear that anyone would suggest such lunacy! SOAP definitely kills caching, but SOAP's insistence on not supporting HTTP well kills all kinds of transport level optimizations. I think I stated my position on that pretty clearly in another article. I'll leave the security discussion to Gunnar who is far better equipped to talk to it than I.

I like the concept that REST operations need to be idempotent. This is a great property, but it must create some interesting challenges for REST developers. Idempotent updates can be particularly tricky when accurate tracking of state transitions is one of the business requirements. Providing idempotent operations is a concrete example of the point I was trying to make about system qualities. Stating whether your operation is or isn't idempotent is necessary, and if you were to change that in either direction, you could very well break clients that depended upon the other behavior.
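
One way to keep a status update idempotent while still tracking transitions accurately, sketched in Python: replaying an already-applied request is a no-op, and an expected prior status plays the role of an HTTP If-Match precondition, so a stale retry is rejected instead of being recorded as a new transition. Order and PreconditionFailed are hypothetical names, and a real service would more likely compare ETags than status values.

```python
class PreconditionFailed(Exception):
    """Analogous to HTTP 412: the resource changed since the client read it."""


class Order:
    def __init__(self, status="NEW"):
        self.status = status
        self.history = []  # (from_status, to_status) pairs

    def put_status(self, new_status, expected=None):
        """PUT-style update: same request twice leaves the same state."""
        if self.status == new_status:
            return self.status  # replay of an applied request: no-op
        if expected is not None and self.status != expected:
            raise PreconditionFailed(self.status)
        self.history.append((self.status, new_status))
        self.status = new_status
        return self.status
```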

Chris Stiles had an interesting comment implying that I'm guilty of my own form of magic:

When people start talking on this vein several things are unclear:

- It’s unclear what the client will be actually expected to do with such an SLA.

- It’s unclear what form such specifications and how they will be flexible enough to express every conceivable form of contract/SLA .. in particular ..

- It seems that this problem calls for a mixture of type inference plus formal reasoning applied over large scale systems and minus a revolution in computing it’s unclear how any of this will be achieved.

I was definitely less than clear on my point. And I wholeheartedly agree that expecting to find a formalized language for specifying these kinds of things, one that would then allow the client to adjust its behavior, is highly unlikely. But you do need to state these up front, most likely in your documentation (you do document your public interactions, right?). My main point, previously made poorly, is that public Internet interfaces will be used in ways that nobody can predict. If you aren't clear about what performance you're agreeing to, what availability you support, or how many transactions you'll accept, somebody will assume that whatever they empirically measure is the contract. And they'll build a client that depends upon it. State it. They may still do it, but then they are relying upon an unsupported aspect of your contract.

I'd like to restate my position in the whole SOA vs REST debate. I'm neither for nor against either one. They're both tools. They both have strengths and weaknesses. I respond to content that I feel offers opportunities for further discussion and elaboration. This isn't about winning a debate for either technology; it's about trying to illuminate what I feel are issues so they can get addressed. The goal is to make both better.


You Call it Coupling, I Call It Reality

I was struck by a statement that Pete Lacey made in his interview on InfoQ. The point was on SOA and coupling. Specifically:

PL: The second reason is that web services are too tightly coupled. Each participant needs to know a great deal about each of the others: the service and operation names, the messages they exchange, the datatypes of the message elements, the security context, the messaging semantics, etc. Thusly are non-scalable (and again I don’t mean transactional scalability) systems designed. One could argue that WSDL and the WS-Policy family of specifications addresses all this, but that’s not what I mean. Whether the tool knows or the programmer knows, the point is that one side of the conversation possesses intimate knowledge of the other, and should either side change, everything breaks.

What Pete is calling coupling, I call interfaces. Short of some magic that nobody has yet explained, software components that interact need to have a contract, and both sides have to know the contract. The client may not know about every detail of the contract, but it does know about the parts it needs. This is true whether the contract is exposed in SOAP or REST. And yes, Peter, REST also exposes contracts that the other side needs to know. Move your resource to a new URI and see what happens to the clients.

If a contract is not coupling between applications, then what is? Coupling is about tying implementations together. Coupling occurs when it is not possible to change the implementation of a component without impacting clients. I will concede that there is very little in WSDL/SOAP that helps prevent coupling. It's also unfortunate that, for all of the complexity that comes with WSDL, the designers didn't include things like explicit constraint specification, which would have led to better interfaces in WSDL.

The simple fact, though, is that REST doesn't make this any better. Resources have formats (schemas, if XML). They are located by URIs, which are a contract for where they can be found. Change the resource format in an incompatible way and clients break. Move the resource to a new location and the clients break. Contracts are contracts; break them and you break the software.

The challenge of coupling is much larger than the explicit interfaces exposed. An interface should clearly abstract the implementation. Everyone gets that. But what aspects of the interface are you explicitly stating and what aspects are being implicitly assumed? Is the response time part of the contract? Is the maximum throughput? Perhaps you say no, but then how can you be sure that no application, at Internet scale, has placed a concrete dependency on these aspects of your interface? And if either aspect were to change in a negative way, you'd break those applications. SOAP doesn't address this, but neither does REST.

If you're designing a good component with good contracts, then you are specifying aspects beyond simply the call interaction. You are specifying performance, availability, transactional assurances, data retention, manageability, and extensibility to name a few. You have to clearly state what promises you're making in each of these areas. You also have to be clear that those things that aren't specified are to be considered not part of any contract. In some cases, you may even specify aspects of the component that may not be depended upon.

Does this sound complex and heavyweight? Well, yes, it is definitely more involved than specifying a simple format at a URI. But if the goal is to achieve Internet scale, then failing to explicitly define your contracts will lead to real coupling. Non-obvious interactions that make both sides of the contract dependent upon each other and almost impossible to change in any manner at all.

Pete and I agree on one point. Coupling is bad for the Internet.


Wednesday, December 13, 2006

It's a Tool, Not a Religion!

One of the great things about sitting in traffic is you get to ponder various things. Recently I was contemplating the ongoing debate about REST vs SOA. Last week I was involved in discussions on C++ vs Java. Last night I stumbled across some debates on Django vs Rails. It occurred to me that software engineers are unique within the technical world. Rather than rejoicing in a rich toolbox, we argue over its contents, hoping to discard all but the smallest possible set.

One of my hobbies is British cars. We have two in our garage which means that either I learn to work on them or go broke paying mechanics. So, I participate on discussion lists related to them. Occasionally somebody will suggest a unique application of a tool to address a difficult issue. If others have found alternate approaches, they'll share those ideas, usually involving a different tool.  I have never once, in more than a decade of participating in these discussions, seen an argument erupt over whether one tool or the other was the singular correct approach. Sure, if the tool was completely inappropriate and might actually result in damage to a component, there is a discussion. But as a whole, mechanics don't argue whether the open end or boxed end of a wrench is the single correct end and should be used to the exclusion of all other wrenches or sockets in existence.

I also have several friends running contracting companies. I have spent time around job sites. I can promise you that you'll find circular saws, chop saws, and table saws. Is there overlap in functionality and potential application of these saws? Absolutely. Why have all three then? Well, because each is particularly good for certain types of cutting chores and less optimal for others. Contractors, like mechanics, want a broad set of tools available so they can have the optimal tool for each task.

Why then do we, as software engineers, have to work so hard to reduce our toolbox to the ultimate tool? There is this tendency to look for the Swiss Army language with supporting framework that will be the optimal solution for every problem in the world. The simple reality is that such a language and framework can never be created. Why? Well, engineering is always about compromise and the compromises are intended to optimize the solution along certain paths.  If your problems lie along that path, then the tool will be perfect. If not, well, you have a sub-optimal tool.

I'm ecstatic to have a rich set of languages and frameworks available in my architectural toolbox. The sheer breadth of the problems we face every day makes it important to have options available. Understanding these tools and their applicability to any given problem should be the job of architects. There are plenty of hard problems that need attention. Spending energy trying to denigrate or eliminate viable tools is pointless.


Sunday, December 10, 2006

Call to Action, The Case for HTTP Headers with SOAP

The ill-fated SOAPAction header has been killed, laid to rest with very little fanfare and, unfortunately, with very little appreciation for what it might have represented. Of course, it was doomed from the start for many reasons. It was poorly defined in the specification. It was optional. And it was outside of the SOAP message, which was its biggest sin in the eyes of the SOAP community. SOAP isn't about HTTP, the claim goes. SOAP should be kept completely independent of any transport properties, regardless of the implications. After all, your SOAP request may be forwarded across a myriad of transport mechanisms before it is finally processed, and headers will get lost.

It's time for a dose of pragmatic reality. Yes, SOAP may be carried over transports other than HTTP. But I will assert that very few instances of messages will be carried over multiple transports during the life cycle of the message. Constraining potential optimizations to protect against this edge case is unnecessary. Designers should have the flexibility to decide between optimal transport usage or message portability. Arbitrarily taking away design choices in the name of architectural idealism is always a poor choice.

Why would you want to support SOAPAction or potentially other HTTP headers in a SOAP request? Well, routing is the first use that comes to mind. SOAP proponents will tell you everything you need to route a SOAP message lies in the envelope. Absolutely, but XML payloads should not be parsed in routing tiers. Claims of high-performance XML parsing remain largely that: claims. Even if performance were acceptable, you'd have to parse the envelope twice, once to route it and once to process it, which consumes more resources than you'd want to if you can avoid it.
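
Here's what routing on the header alone might look like, as a sketch. The action URIs and pool names are made up; the point is that the routing decision never touches the XML body.

```python
# Hypothetical routing table: SOAPAction value -> backend pool.
ROUTES = {
    "urn:example:orders#PlaceOrder": "order-pool",
    "urn:example:catalog#Search": "catalog-pool",
}


def route(headers, default="general-pool"):
    """Choose a backend pool from HTTP headers only; the body is never read.

    headers: mapping of HTTP header names to values."""
    # HTTP header names are case-insensitive.
    lookup = {name.lower(): value for name, value in headers.items()}
    # SOAP 1.1 sends the SOAPAction value as a quoted string.
    action = lookup.get("soapaction", "").strip().strip('"')
    return ROUTES.get(action, default)
```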

The biggest challenge I see with the current SOAP avoidance of HTTP headers is the way it has been reflected in toolkits for building SOAP clients. Most of the SOAP toolkits provide no mechanism at all for setting HTTP headers on the request. This hinders the possibility of designing SOAP/HTTP infrastructures that take advantage of the HTTP protocol to route or perform other functions efficiently in front of the service implementations.
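
For contrast, a general-purpose HTTP library makes setting the header trivial. This sketch uses Python's urllib.request, not a SOAP toolkit; the endpoint URL, action URI, and envelope contents are placeholders, and the request is only built, not sent.

```python
import urllib.request

# Placeholder SOAP 1.1 envelope.
ENVELOPE = b"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body><PlaceOrder xmlns="urn:example:orders"/></soap:Body>
</soap:Envelope>"""


def build_request(url, action, envelope=ENVELOPE):
    """Build a SOAP-over-HTTP POST that carries its action in a header."""
    return urllib.request.Request(
        url,
        data=envelope,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{action}"',  # routable without parsing the body
        },
        method="POST",
    )

# urllib.request.urlopen(build_request(...)) would actually send it.
```

Note that urllib normalizes stored header names with str.capitalize(), so the header is read back as "Soapaction"; on the wire, header names are case-insensitive anyway.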

SOAP's goal of being agnostic to transports is fine. But being agnostic does not mean it has to be impossible to leverage the underlying transport's efficiencies. It's time to recognize that message instance portability across transports, while an intriguing goal, does not need to constrain the designer. If I need that capability, then I will ensure my design does not leverage any transport features. If I don't, however, I would prefer to be allowed to leverage the transport. The current situation is a poor and unnecessary compromise in the specification and most implementations.


Wednesday, December 06, 2006

2PC or not 2PC, Wherefore Art Thou XA?

Friends, architects, engineers, lend me your ears; I come here not to bury distributed transactions, not to praise them. Now that I've thoroughly mixed and paraphrased the Bard, maybe I should get to the point of this post. It comes as a shock to many, but distributed transactions are the bane of high performance and high availability.

But I need to ensure data is committed across multiple databases! How can I possibly do that? Well, there are really two answers to that problem. The first is that most of the time you don't need the data committed atomically. The second is that if you've determined you absolutely do, there's a better way to ensure it happens.

And now you're thinking, "What do you mean I don't need it to occur within a transaction? How do you know that?"

Most transactional use cases that perform updates to multiple data sources will be manipulating data of differing importance. There is the primary data that is the point of the use case. For example, in a blog application, capturing the article is the primary data. But there is also secondary data. Perhaps the user has changed a setting and selected "Make Default", indicating they would like this new setting to be reflected in their preferences. I will assert that you should make every effort to ensure that the article makes it to the database and inform the user if it doesn't. But what if the preference can't be committed because the data source is down? Do you tell the user you can't accept their article? Or would you rather commit the article but let them know the preference couldn't be saved at this time? If it was your article that you just spent 30 minutes crafting, how would you want the application to behave?
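
A sketch of the behavior I'd want, assuming a list stands in for the article store and a hypothetical preference store whose save() raises when its database is down. The article write is allowed to fail the request; the preference write only produces a warning.

```python
class Down(Exception):
    """Raised when the preference database is unavailable."""


class FlakyPrefs:
    """Hypothetical preference store that can be switched off."""

    def __init__(self, up=True):
        self.up, self.saved = up, {}

    def save(self, user, setting):
        if not self.up:
            raise Down("preference database unavailable")
        self.saved[user] = setting


def publish(article_db, pref_db, user, article, make_default=None):
    """Save the article (primary); try the preference (secondary).

    Returns a list of warnings to show the user."""
    article_db.append((user, article))  # primary: any failure propagates
    warnings = []
    if make_default is not None:
        try:
            pref_db.save(user, make_default)
        except Down:
            warnings.append("Your article was saved, but your default "
                            "setting could not be saved right now.")
    return warnings
```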

But what if you really do need to update multiple databases? You have information that must be captured. Surely distributed transactions are the only safe way to do this. And why wouldn't you want to leverage them anyway? Well, one reason is that while distributed transactions are an easy way to ensure data integrity, they are a terrible way to ensure high performance and availability.

How does 2PC impact scalability? Well, as the name suggests, transactions are committed in two phases. This involves communicating with every database involved to determine if the transaction will commit in the first phase. During the second phase each database is asked to complete the commit. While all of this coordination is going on, locks in all of the data sources are being held. The longer duration locks create the risk of higher contention. Additionally, the two phases require more database processing time than a single phase commit. The result is lower overall TPS in the system.
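
A toy coordinator makes the cost visible. It's a sketch only: there's no durable coordinator log, no timeouts, and no recovery, and Participant is a hypothetical stand-in for a database. Locks are taken in phase 1 and not released until phase 2 completes; if the coordinator dies between the phases, prepared participants are left holding those locks until it comes back.

```python
class Participant:
    """Hypothetical stand-in for one database in the transaction."""

    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "idle"  # idle -> prepared -> committed / aborted
        self.locks_held = False

    def prepare(self):
        self.locks_held = True  # locks are taken here and held across phases
        if self.vote_yes:
            self.state = "prepared"
        return self.vote_yes

    def commit(self):
        self.state, self.locks_held = "committed", False

    def abort(self):
        self.state, self.locks_held = "aborted", False


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and vote.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if all voted yes, else abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```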

More insidious, though, is the impact on availability. First, a simple lesson on availability. A system's availability will be equal to the product of each component's availability. Using 2PC will reduce the availability of an application to the product of the availability of the databases involved. For example, if an application depends upon two databases and each database has an availability of 99.5%, the application will have no better than 99% availability. Add a third dependency, and the availability drops to 98.5%. These numbers may be okay for certain types of applications, but commerce typically requires substantially better than 99%. The 2PC coordinator also represents a SPOF (Single Point of Failure), which shouldn't be present in any highly available design.
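
Written out, assuming component failures are independent and the application itself never fails (the last line converts the two-database case into hours of downtime over an 8,760-hour year):

```latex
\begin{aligned}
A_{\text{app}} &= \prod_{i=1}^{n} A_i \\
0.995^2 &= 0.990025 \approx 99.0\% \\
0.995^3 &= 0.985074875 \approx 98.5\% \\
(1 - 0.990025) \times 8760\ \text{h} &\approx 87.4\ \text{h of downtime per year}
\end{aligned}
```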

So what can you do? Design for failure and let asynchronous recovery save the day. An approach that works very well is to use single database commits on each of the databases. These will succeed, independently, 99.5% of the time. For the 0.5% of the time they fail, capture the operation in a journal. There are a couple of options for what this journal might be:

  • In many cases, the contents of one database can be derived from another. In this case, do nothing. The asynchronous process described below will take care of everything as long as you order writes to complete the parent data first.
  • If there is no option for deriving the data, then it has to be queued in a separate failover database. This works very well for insert operations but less so for updates. Designing for failure might cause you to rethink critical operations to make them more friendly to this style of journaling.
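
A sketch of the asynchronous process for the "derivable" case, assuming a secondary store that holds per-author article counts computed from the primary article data. Plain dicts stand in for the two databases and the names are hypothetical. Because it only ever writes the value the primary data implies, it is idempotent and can run as often as you like.

```python
def derive_article_counts(articles):
    """Compute what the secondary store should contain from the primary."""
    counts = {}
    for article in articles.values():
        counts[article["author"]] = counts.get(article["author"], 0) + 1
    return counts


def reconcile(articles, author_stats):
    """Rebuild missing or stale derived rows; return the authors fixed."""
    fixed = []
    for author, count in derive_article_counts(articles).items():
        if author_stats.get(author) != count:
            author_stats[author] = count
            fixed.append(author)
    return fixed
```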

Asynchronous recovery is easiest to build if all operations can be made idempotent. This simplifies the recovery process and is usually not too difficult to accommodate in the database design. The recovery agent would run as soon as a database is brought back online.

There is little argument that this approach is more complex than relying on XA. The gains are significant, however. Each database operation is a single-phase commit, which is not only more scalable but also easier to debug, even in production. Decoupling the databases involved increases application availability. Designing for failures by introducing concepts like failover journals will further increase the effective availability of your application to your customers.

Is it worth it? That's a business question, but in many cases the availability is needed and the approach I've described is the most reliable way to achieve it.


Sunday, December 03, 2006

The Perils of Good Abstractions

Good software architecture creates good abstractions. Defining good interfaces at the boundaries and hiding the implementations is the goal of every architecture team, and it absolutely should be. Otherwise software becomes brittle and difficult to maintain and enhance. But, as in all aspects of engineering, it doesn't come without its own share of pitfalls. This is especially true where performance and transaction scalability are concerned.

Not too long ago I was looking at dependencies between pools and databases during coupling analysis. I stumbled onto a dependency that made little sense. One of our transactional pools depended upon a batch configuration database. Not only did it depend upon it, it was generating considerable traffic to it. After some investigation, I discovered that a specific entity type had been augmented to support one specific batch application. This new subtype of the entity was specific to batch so the decision was made to persist it on a batch host. So far, so good. The problem came when the finder for the base type of the entity was modified to include this new subtype in the result set.

Everybody involved in this process had made reasonable engineering decisions. They just hadn't been aware that the public interface, which was not violated, would be impacted in a systemic way with this implementation change. This is the peril of good abstractions.

Of course, system qualities should be part of a good abstraction. The challenge is how to express them in a way that neither constrains nor exposes the implementation. In the above example, this would be quite difficult. The performance of the transactional application wasn't really impacted. It was the coupling between the transactional and batch subsystems that was at issue. So how does one express which couplings are permitted without effectively specifying implementation details (i.e., which resources may be used behind the interface)?

You could easily argue that the new subtype should have been treated as a separate entity with a corresponding interface. And from a coupling perspective, you are absolutely right. The challenge with this approach though is you expose your partitioning challenges through the interface. The interface will now appear to have an arbitrary bifurcation because the motivator is to avoid coupling in the implementation. This is not a particularly exciting proposition either.

How did we solve this particular problem? We added a hint to the finder to indicate whether the application was transactional or batch. Is that perfect? Absolutely not, but it was a pragmatic solution that maintained a reasonable quality abstraction.
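
A sketch of that compromise, with hypothetical names: the finder keeps a single public interface, and an optional caller hint decides whether the batch-hosted subtype is included. Defaulting to transactional is my assumption here; it keeps existing transactional callers from silently picking up the batch dependency.

```python
from enum import Enum


class Caller(Enum):
    TRANSACTIONAL = "transactional"
    BATCH = "batch"


class EntityFinder:
    """Hypothetical finder over the base type and its batch-only subtype."""

    def __init__(self, oltp_store, batch_store):
        self.oltp_store = oltp_store    # base-type entities
        self.batch_store = batch_store  # batch-only subtype, on a batch host

    def find_by_owner(self, owner, caller=Caller.TRANSACTIONAL):
        results = [e for e in self.oltp_store if e["owner"] == owner]
        if caller is Caller.BATCH:
            # Only batch callers touch (and couple to) the batch database.
            results += [e for e in self.batch_store if e["owner"] == owner]
        return results
```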

Does SOA make this problem better or worse? I'd argue there is nothing magic about SOA. Service providers are going to need to be very explicit about all aspects of their contract: not just the call semantics but systemic qualities as well. Changes to the implementation that substantively degrade performance or availability would have to be vetted with all consumers of the contract. Of course, that is completely obvious to SOA designers.

Is it equally obvious to all service implementors? That remains to be seen. Especially when certain qualities are problematic to express with quantifiable metrics.
