
Tuesday, February 27, 2007

Build vs Buy - One Perspective

I'm an engineer. It's in my blood. As an engineer, I want to build things. So anytime the build vs buy discussion arises, I have to fight the urge to say build and get right to it. I am definitely happier building my own stuff than configuring and deploying something built by somebody else. But I'm also a pragmatic engineer who accepts that I should leverage existing products wherever possible. So why should we ever build in problem areas where products exist?

A build vs buy decision is primarily about determining if a vendor product can be sufficiently customized to solve the problem that your organization faces. Products are designed to solve a wide range of problems. They have to be or the product vendor limits their potential customer base. The need to solve a broad set of problems does limit the optimizations a vendor can make. This follows the adage that anything that does several things, does none of them well.

How is it that custom built solutions can be better than commercial products though? You will have fewer engineering resources and less testing than a commercial product. Shouldn't it be obvious that commercial solutions will have an edge in capabilities and quality? Commercial products lack one crucial element though. A specific problem.

Knowing your problem domain allows you to carefully choose your compromises. Every problem has unique constraints that can be leveraged to reduce complexity or improve performance. This is a reality that is inescapable and becomes the crux of the build vs buy decision process. What you are balancing is the optimizations you can achieve against the resources the vendor can offer in terms of support and ongoing R&D.

The largest challenge that I've seen is adequately defining the requirements for your current problem and being realistic about your future needs. Comparing feature sets from a vendor with your internally developed solutions is a pointless exercise. The goal is to find a good fit, not to have the broadest set of features. The requirements can be difficult to clearly derive though. There are often second order requirements that are less than obvious but in fact lead to the largest opportunities.

As an example, when looking at messaging systems, we knew we needed reliable delivery. The major revelation, however, was that we did not need exactly once delivery or ordered delivery. The majority of the information we were propagating possessed inherent keys for managing idempotent delivery and there were no inter-event dependencies. There is a great deal of complexity that can be eliminated and tremendous performance gains that can be achieved when you eliminate ordered, exactly once delivery from the messaging infrastructure. Of course, this is not a good idea in general, but this example illustrates how a careful analysis and understanding of the problem can lead to more clarity on the requirements.
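To make the messaging example concrete, here is a minimal sketch of a consumer that tolerates duplicate and out-of-order delivery. It is not the system described above: the `Event` fields, the `version` counter, and the `IdempotentStore` class are all hypothetical names, and the sketch assumes each message carries a business key plus a version that increases with every change to that record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str       # inherent business key carried by the message (assumed)
    version: int   # increases with each change to the record (assumed)
    payload: str


class IdempotentStore:
    """Keeps the newest payload per key, whatever order events arrive in."""

    def __init__(self):
        self._rows = {}  # key -> (version, payload)

    def apply(self, event: Event) -> bool:
        """Apply the event; return False if it is a duplicate or stale."""
        current = self._rows.get(event.key)
        if current is not None and current[0] >= event.version:
            return False  # already seen this version or a newer one
        self._rows[event.key] = (event.version, event.payload)
        return True

    def get(self, key: str):
        row = self._rows.get(key)
        return None if row is None else row[1]


store = IdempotentStore()
store.apply(Event("account-7", 2, "new address"))
store.apply(Event("account-7", 1, "old address"))   # late arrival, ignored
store.apply(Event("account-7", 2, "new address"))   # redelivery, ignored
```

Because the receiver can discard repeats and stale updates on its own, the transport only has to guarantee that each message arrives at least once, which is a much cheaper promise than ordered, exactly once delivery.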

Okay, so I'm obviously advocating build over buy. No, not really. There are several factors to consider before building a custom solution that has clear overlap with a vendor product. Some of the key factors are:

  • Business Impact - How much does the problem you are trying to solve impact your bottom line? The less impact, the less critical a highly optimized solution becomes. I know this is obvious, but it is worth stating. Optimizing marginal problems is largely pointless.
  • Incremental Benefits - How big are the gains you are likely to achieve by building your own solution vs using a commercial product? In the example above, we discovered we could reduce the number of SQL statements per message by 9X, which made it very compelling. Had the improvements been 30% or less, we probably would not have embarked on building a solution.
  • Holistic Costs - It's very easy to focus on the cost associated with the primary solution and ignore the overall life cycle costs. Not only does the NRE for the component need to be considered, but also all ongoing support costs as well as the second order infrastructure components that are required to support the solution.

If you consider these (and others which I would appreciate hearing about) and still find that the benefits outweigh the costs for a high impact problem, then build makes sense. One thing that I find organizations also fall victim to is assuming that product companies have smarter engineers. Ultimately, you should understand your problem better than anybody else. Therefore, you should be able to deliver the best solution to your problems. There are lots of reasons to rely upon vendors to deliver your solution, but presumed superior engineering talent should not be one of them.

As always, I welcome your feedback.


Friday, February 09, 2007

Latency Exists, Cope!

I put this line on a slide recently for a presentation at work. Looking around the room, I could tell that some of the people understood, some were perplexed, and some annoyed. What does latency have to do with architecture anyway? We're concerned with proper component factoring, interfaces, and a collection of "ilities". How could latency be relevant to any of these?

In any large system, there are a few inescapable facts:

  1. A broad customer base will demand reasonably consistent performance across the globe.
  2. Business continuity will demand geographic diversity in your deployments.
  3. The speed of light isn't going to change.

Given these facts, latency is a critical part of every system architecture. Yet making latency a first order constraint in the architecture is not that common. The result is systems that become heavily influenced by the distance between deployments and that limit the business's ability to serve its customers effectively and protect itself against localized disasters.
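A quick back-of-the-envelope calculation shows why the third fact matters. The sketch below estimates a lower bound on round-trip time between two sites. The function name, the 6,000 km distance, and the velocity factor of 0.67 (light in optical fiber travels at roughly two thirds of its vacuum speed) are assumptions chosen for illustration; real network paths are longer than the straight-line distance and add switching and queuing delay on top.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in vacuum


def min_round_trip_ms(distance_km: float, velocity_factor: float = 0.67) -> float:
    """Lower bound on round-trip time over fiber, in milliseconds.

    velocity_factor is the assumed fraction of vacuum light speed
    achieved in the medium (about 0.67 for optical fiber).
    """
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * velocity_factor)
    return 2 * one_way_s * 1000


# Two hypothetical data centers 6,000 km apart: roughly 60 ms per round trip
# before any processing happens at all.
rtt = min_round_trip_ms(6000)

# A request that makes 20 sequential cross-site calls pays that cost 20 times.
chatty_request_ms = 20 * rtt
```

No amount of hardware removes that floor, which is why the number of sequential cross-site interactions has to be treated as an architectural constraint.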

So how do you design for latency? There are a few strategies that can be applied to your architecture that will allow you to deploy your components across diverse geographic locations. Here are the ones that I find particularly important.

Good Decomposition - Highly coupled, monolithic applications are the bane of any distributed architecture. Allowing components with little functional overlap to be coupled either in code or during deployment will pretty much kill any hope of distributing your architecture across a collection of global data centers. Do it badly enough and you will kill any hope of distributing your architecture across two cities in the same state. This sounds obvious, but there are plenty of enterprise level applications in use today that have forced themselves into data centers on the far edges of the same city as their only business contingency plan.

Asynchronous Interactions - This is more than just using messaging between components. It starts by setting the appropriate expectations on your external interfaces, be that SOA or a web page. Companies get tripped up here by exposing an early version of an interface that sets the client's expectation of synchronous, low latency interactions. As the interface becomes more heavily used, it becomes more and more difficult to change that semantic. If the client has an expectation of a synchronous response, the likelihood of leveraging a collection of components with asynchronous interactions becomes low. Start with an expectation of asynchronous behavior and you can more readily add latency as needed to meet your deployment demands.
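One way to set that expectation from day one is an interface that hands back a ticket immediately and lets the client ask for the result later. The sketch below is only an illustration: `AsyncService`, `submit`, `status`, and the trivial `_process` step are hypothetical names, and an in-process thread pool stands in for whatever messaging or remote components would do the real work.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncService:
    """Accepts work now, delivers the result whenever it is ready."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}  # ticket -> Future

    def submit(self, request: str) -> str:
        """Queue the request and return a ticket without waiting."""
        ticket = str(uuid.uuid4())
        self._jobs[ticket] = self._pool.submit(self._process, request)
        return ticket

    def status(self, ticket: str):
        """Return ("pending", None) or ("done", result)."""
        future = self._jobs[ticket]
        if not future.done():
            return ("pending", None)
        return ("done", future.result())

    def _process(self, request: str) -> str:
        # Placeholder for work that may cross data centers.
        return request.upper()

    def close(self):
        self._pool.shutdown(wait=True)


service = AsyncService()
ticket = service.submit("place order")
# The client is free to do other work here, then check back.
service.close()                      # wait for outstanding work in this demo
state, result = service.status(ticket)
```

Since the client never assumed an immediate answer, the processing behind `submit` can later move to another continent without changing the contract.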

Monolithic Data - You can decompose your applications into a collection of loosely coupled components, expose your services using asynchronous interfaces, and yet still leave yourself parked in one data center with little hope of escape. You have to tackle your persistence model early in your architecture and require that data can be split along both functional and scale vectors or you will not be able to distribute your architecture across geographies. I recently read an article where the recommendation was to delay horizontal data spreading until you reach vertical scaling limits. I can think of few pieces of worse advice for an architect. Splitting data is more complex than splitting applications. But if you don't do it at the beginning, applications will ultimately take shortcuts that rely on a monolithic schema. These dependencies will be extremely difficult to break in the future.
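As a rough sketch of what splitting along both vectors can look like, the routine below first picks a functional domain and then hashes the record key to choose a shard within it. The domain names, shard counts, and the `locate` function are hypothetical, and simple modulo hashing is used only for brevity; it makes adding shards later expensive, so a real design would likely need a remapping strategy.

```python
import hashlib

# Functional split: each domain owns its own data stores (assumed names).
# Scale split: each domain is spread over some number of shards.
SHARDS_PER_DOMAIN = {"accounts": 4, "orders": 8}


def locate(domain: str, key: str) -> str:
    """Return the name of the store that owns this record."""
    shard_count = SHARDS_PER_DOMAIN[domain]
    # A stable hash, so every process maps the same key to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shard_count
    return f"{domain}-{shard}"


store_name = locate("orders", "order-1001")
```

If every data access goes through a routine like this from the start, applications never get the chance to assume that two records live in the same schema.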

Design for Active/Active - If you do a good job with the preceding recommendations, then you've most likely created an architecture that can service your customers from all of your locations simultaneously. This is a more efficient and responsive approach than an active/passive pattern where only one location is serving traffic at a time. Utilization of your resources will be higher and by placing services nearer your customers, you are better meeting their needs as well. Additionally, active/active designs handle localized geographic events better as traffic can simply be rebalanced from the impacted data center to your remaining data centers. Business continuity is improved.
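The rebalancing step can be pictured with a small sketch. Here each data center carries a share of the traffic, and when one is impacted its share is spread over the survivors in proportion to what they already handle. The `rebalance` function and the site names are hypothetical, and the sketch ignores capacity limits that a real traffic manager would have to respect.

```python
def rebalance(weights: dict, failed: set) -> dict:
    """Redistribute traffic shares across the data centers still serving."""
    survivors = {dc: w for dc, w in weights.items() if dc not in failed}
    total = sum(survivors.values())
    if total == 0:
        raise RuntimeError("no healthy data centers remain")
    # Scale the surviving shares so they sum to 1 again.
    return {dc: w / total for dc, w in survivors.items()}


normal = {"east": 0.5, "west": 0.25, "europe": 0.25}
after_outage = rebalance(normal, failed={"east"})
# west and europe now each take half of the traffic
```

In an active/passive design the equivalent step is a cold failover to a site that has not been serving production traffic, which is a far riskier moment to discover problems.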

Latency is another example of how what you don't take into consideration in your architecture will ultimately undo your design. It is one of the more difficult constraints to design for correctly. As such, it should be given more attention early in your architectural process. Are there other aspects of this that you think are important? I'd love to hear them.
