Sunday, October 12, 2008

Television for Software Engineers

No, this isn't going to be an article about concrete architectural practices. But it may in fact be as useful as any of those. I watch some television. Okay, sometimes I watch a lot more than I should. But mostly I listen to it while doing other things, like now as I write this article. And somewhat surprisingly, I find some television to be incredibly applicable to the field of software engineering. More interesting is that the programs I find applicable have absolutely nothing to do with software or in most cases even computers. Here's the programming I find very insightful.

Detective Programs

Some of these are much better than others. All software engineers are inherently intrigued by a good puzzle. Unfortunately most of the remaining population isn't, so too many detective programs turn into juicy dramas with lame intrigue instead of good intellectual challenges. My two favorites by far are Columbo and Monk. What these two programs share is the reliance on incongruities in details to lead our hero to the solution. How does this apply to software? Well, debugging hard problems is almost always about paying attention to seemingly inconsequential details. Inconsistency in behavior in an unrelated flow is often the key indicator of the root cause.

Mythbusters

I enjoy Mythbusters immensely. Yes, it is truly geeky fun. And they get to blow real things up, not just simulated explosions in mathematical models and 3D renderings. But what is a true joy to watch is how thoroughly methodical they are in solving problems. They research, design, and prototype. They fanatically analyze their results and make adjustments based on the data. Few programs have been bold enough to expose the analytical and development process so transparently. In fact, the boldest thing Adam and Jamie do is solve a problem in front of millions of people, knowing that they will receive hundreds of comments on the work they do. Think about it. Would you work on a problem in front of millions?

Modern Marvels - Engineering Disasters

This series fascinates me. I'm addicted to it. I wish they would make more because I've seen all the episodes at least 3 times each. Two things stand out in this series and are common themes across the more than 60 disasters they have presented:

  1. Catastrophic failures are always the result of compounding problems. They come about as the result of a "perfect storm". Nobody believed that the combination of events could occur within a critical time window so nobody planned for it.
  2. Engineers are an egotistical lot. We are sure we got it right and only when our creations collapse in front of us do we realize we missed something. It's not surprising though as creating what we create from nothing more than thought and will does require a good deal of egotism.

Every engineer in every discipline should watch this series. It gives you insight into the thought process required to make your creations more failure resistant. And you can see what happens when you fail to account for not just a collection of single failures, but for the simultaneous occurrence of these failures.

So that's my collection. Care to add your own?


Saturday, September 20, 2008

Focus on the Cloud, not the Clouds

There are a lot of very good conversations going on about the challenges with cloud computing. Storage is just beginning to mature in the cloud and there are many interesting issues around privacy, SOX, and PCI compliance. Nobody has clear answers yet on the security and compliance related issues. But I think for many enterprises, that may be a pointless dialog because it isn't federating their capacity into the clouds that is important but rather how to leverage the concepts and technologies from the cloud to improve their operational efficiency.

Before getting into complex discussions about the security challenges of using the cloud, I believe most organizations need to take a good look at how well their applications can operate in any utility compute model. Does your application currently support all of the following concepts easily?

Loose Coupling to Platform

Are your applications platform independent? Have you made them completely agnostic to the processors they require? Do they scale well from a single core to several cores? Have you optimized your memory footprint, making it fit well into a modest memory model? What assumptions do your applications make about operating systems? Directory paths? Available local disk space?

Ultimately, can you run your applications in a collection of virtual containers in your environment right now? The answer for many organizations is no. And until that is fully addressed, their ability to enter the cloud is largely non-existent. Forget about your PCI issues, your JVM footprint is keeping you stuck on your own hardware.

Crisp Packaging

Do you have a repeatable build and release process? Are your build targets well defined and self contained? Do they externalize their dependencies for tools to detect deployment requirements? Are the artifacts independent of their deployment environment (development, testing, production)? Are all environmental dependencies externalized and injected into your application at run time?
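
As one way to picture the last of those questions, here is a minimal sketch of environment dependencies being injected at run time, so that the same artifact can move between development, testing, and production. The variable names (APP_DB_URL, APP_LOG_LEVEL, APP_SCRATCH_DIR) and the Settings class are hypothetical and exist only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Environment-specific values supplied at run time, not baked into the build."""
    db_url: str
    log_level: str
    scratch_dir: str


def load_settings(env=os.environ):
    # APP_DB_URL, APP_LOG_LEVEL and APP_SCRATCH_DIR are hypothetical names.
    # A missing required value fails at startup rather than at first use.
    try:
        db_url = env["APP_DB_URL"]
    except KeyError:
        raise RuntimeError("APP_DB_URL must be supplied by the deployment environment") from None
    return Settings(
        db_url=db_url,
        log_level=env.get("APP_LOG_LEVEL", "INFO"),
        scratch_dir=env.get("APP_SCRATCH_DIR", "/tmp"),
    )
```

The artifact itself carries no knowledge of which environment it is in; the deployment tooling decides that by what it injects.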

To leverage a utility compute model, you have to have components that are compatible. Everyone likes to draw parallels to the electricity model when discussing clouds. But imagine if every electronic component you purchased had different voltage needs and failed to clearly identify its current needs, to say nothing of the obvious problem of each coming with its own style of plug. We wouldn't have the ubiquitous electric utility model we have now. Yet few organizations are far off this analogy when it comes to the state of their applications.

Well Specified Management Interfaces

Do you have standard interfaces for expressing state of your application? How about for changing runtime behavior? Or detecting exceptional conditions? Or tracking activity for both debugging and analysis?

This is something I've covered in past articles but is worth emphasizing again. Operational control of the application requires a set of interfaces that follow the same rigor as your other APIs. They need to be designed and evolved using the same principles and processes. And without such interfaces you will have a difficult time operating components in your own data center, much less in a compute cloud.

In Conclusion

The interesting part of this situation to me is these are all things you should be focused on accomplishing whether you are looking to leverage a vendor's cloud or not. These allow you to achieve lower cost of operation and better utilize your internal resources. And they do that without having to bring into question the challenges of privacy and security.


Sunday, September 14, 2008

Are Data Warehouses Dinosaurs?

As anybody that follows my blog knows, I am not a fan of vertical scaling. I don't like solutions that can only be implemented in a single address and storage space. Unfortunately, there are analytical problems that need a holistic view of data. This is very typical of data warehousing applications. As a result, data warehouses are expensive, often out of the reach of smaller organizations. But there may be an alternative that is less expensive and horizontally scalable. What is this great revelation? Processing streams of events using an Event Stream Processor (ESP) solution.

An ESP analyzes streams of events using a language similar to SQL. In the same manner that databases and data warehouses use SQL to perform analysis of data tables, an ESP uses its query language to analyze streams of events. The simplest way to understand ESP is to think of events as rows in a table and the attributes of an event as the columns. Each event type is the equivalent of a table. From this perspective, it becomes straightforward to see how ESP works. But how does this relate to replacing data warehouses?

Data warehouse analysis involves aggregating information along a variety of axes as well as inverting relationships in the data. The goal is to provide the business with different perspectives on what the customers are doing. In order to do this, data is loaded into the warehouse periodically. Typically daily ETL processes are performed on the production databases to keep the warehouse fresh. This process though has a couple of issues beyond the cost of the warehouse infrastructure. First, the ETL places a significant load on your production databases. If your business has nice offline windows for the ETL, that's great, but if not, managing the scale becomes a challenge. Second, the freshness of the warehouse is typically 24 hours behind or more. As your business grows this lag will grow as well.

ESP addresses this by analyzing the changes to your data as they occur. Rather than doing batch ETLs, you stream business events as the state of your data changes. This creates a more manageable scaling model for your production system. The business analytics extracts are spread throughout the transaction day. ESP can also be horizontally scaled, providing a more cost effective solution for your business. And since ESP is performing the analysis in real time, the business metrics can be current and remain that way as the business grows.
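
To make the "events as rows" idea concrete, here is a minimal sketch in Python rather than in any real ESP query language. It keeps the equivalent of a running SUM grouped by one attribute over a sliding time window, updated as each event arrives. The class name, the event fields, and the assumption that events arrive in timestamp order are all illustrative and not taken from any particular ESP product.

```python
from collections import defaultdict, deque


class SlidingWindowSum:
    """Running SUM(value) GROUP BY key over the last `window` seconds of events."""

    def __init__(self, key, value, window):
        self.key, self.value, self.window = key, value, window
        self.events = deque()              # (timestamp, key, value) in arrival order
        self.totals = defaultdict(float)   # current aggregate per key

    def on_event(self, event):
        # Assumes events arrive in non-decreasing timestamp order.
        ts = event["ts"]
        # Expire events that have fallen out of the window.
        while self.events and self.events[0][0] <= ts - self.window:
            _, old_key, old_value = self.events.popleft()
            self.totals[old_key] -= old_value
        k, v = event[self.key], event[self.value]
        self.events.append((ts, k, v))
        self.totals[k] += v
        return dict(self.totals)
```

Each call to on_event does a small amount of work at the moment the business event happens, which is the contrast with a nightly batch extract that the paragraph above describes.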

Does this spell the end of data warehouses? Well, maybe, but there is one challenge with the ESP approach. While it is able to provide analytics cost effectively, it does not provide the ability to perform historical analysis. If you know what you want, then ESP will deliver the results from the current point in time forward. But what if you want a different perspective on your business activity and you want it over the past 3 months? One solution is to create a framework for capturing and replaying transactions but this can be expensive. This becomes a matter of deciding the business value of performing the historical analysis.

Whether you choose to use a data warehouse or not, ESP is definitely worth investigating as a way of delivering business analytics more cost effectively.


Monday, September 01, 2008

Free Energy

The first law of thermodynamics tells us that energy in a system is constant. Energy is neither created nor destroyed. And thus the fantasy of so many that you can get more energy out than you put in is dashed. Of course there is the Adam Savage comment that there is really only free energy to you. Meaning of course, it's free if you don't have to put your energy into extracting it. Of course there are ways to get free energy without siphoning your neighbor's gas tank at night or covertly tapping their power lines. I have a few favorites that illustrate the concept.

Regenerative brakes use electric generators to provide resistance and slow a car. Or in some cases, they compress hydraulic fluid but whatever the technique, the idea is the same. Rather than converting the vehicle's momentum into heat and annoying brake dust that cakes on your wheels, capture it and store it for reuse. Is it free? No, but it was otherwise wasted. In this particular application, the regeneration process doesn't even have to be especially efficient. Even if it only captures 10% of the vehicle's kinetic energy, it has reclaimed something, not to mention reducing brake dust. Have I mentioned I'm not fond of brake dust?

There are two intriguing proposals that I have also seen. One involves a floor mat that contains coil inductors. Stepping on the mat depresses it, causing a small amount of electrical energy to be generated. The volume is relatively small until you consider an application such as a train station or airport terminal where thousands of travelers pass by every day. The efficiency isn't great but the cost of the generation source is largely free. It's not clear if it can be manufactured cost effectively enough but the concept is spot on. Find a source of energy that is otherwise being wasted and capture it. Along a similar vein is a system that generates electricity from the heat put off by people in office buildings. Again, find a source of energy that is otherwise discarded and convert it into usable energy.

By now you're probably wondering what any of this has to do with architecture. Well, power efficiency is definitely one of the challenges we currently face in our architectures. As I have said before, power consumption is a software architecture problem. One of the interesting challenges I have been pondering of late is how to find largely idle resources in the data center and put them to use. Gain some more business benefit from the capital expenditures and electricity bill companies are already paying.

One of the idle resources I've observed is disk capacity. The traditional web/services architecture follows a multi-tier design with application servers and database servers. The storage is of course on the database tier. But what about the drives in the application servers? Disk capacities have grown to almost ludicrous sizes with most 1U servers arriving with 500GB or more of storage. Even with virtualization, the operating system and applications rarely occupy more than 100GB, but let's be generous and make it 200GB. So you have your application servers spinning drives with 300GB of unused capacity. Even a smallish site of 100 application servers has 30TB of idle capacity sitting in their application servers.

So let's think about that capacity for a minute. The capacity isn't reliable as application servers are typically stateless. So any use of the storage must take that into consideration. Not only can application servers go down, but they can die and be completely replaced, the contents of the drive lost forever. Additionally, if your application servers are located in a single data center, not just one server's storage but the entire storage farm could be lost. Let's assume though that we're only going to worry about losing a server and not the entire storage farm.

There are a lot of uses where losing storage is completely acceptable. Log file storage, temporary space for back office analytical processes, and test data sets are just a few examples. But what if you ran Hadoop across these storage nodes with 2 or even 3 replicas? You would still have 10TB of storage available and you can now tolerate the loss of a single node without any impact to the availability of the data. Most companies have storage needs that could easily be met by such a storage configuration and it is almost free. At the extreme end of this concept is Wuala which allows users to not only access a distributed storage network but trade idle capacity on their computers for additional capacity in the network.
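
The arithmetic behind the 30TB and 10TB figures can be checked with a few lines. This is only a back-of-the-envelope sketch using the numbers from the text (100 servers, 500GB drives, 200GB reserved for the operating system and applications); the function name is made up, and it ignores any filesystem or replication overhead beyond the replica count itself.

```python
def usable_capacity_tb(servers, disk_gb, reserved_gb, replicas):
    """Idle disk across the application tier, divided by the replica count."""
    idle_gb = servers * (disk_gb - reserved_gb)
    return idle_gb / replicas / 1000  # decimal terabytes, as in the text


raw = usable_capacity_tb(100, 500, 200, replicas=1)         # 30.0 TB of raw idle disk
two_copies = usable_capacity_tb(100, 500, 200, replicas=2)  # 15.0 TB
three_copies = usable_capacity_tb(100, 500, 200, replicas=3)  # 10.0 TB
```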

Another idle resource I've observed is off peak processing power. A typical web service has a peak that is approximately 2.5 times the average traffic. Ignoring disaster recovery capacity and peak to off-peak, that means that there are windows throughout the day when 50% of the company's available compute resources are lying idle. This is capacity that has already been purchased and is occupying data center space and consuming power.

Again, there is an opportunity to harness this capacity for other business functions that can live in the off-peak processing windows. What if business reporting activities could be moved into another virtual container? Or how about business analytics functions being performed as a map/reduce operation during off-peak cycles? Every organization has dozens if not hundreds of such processing tasks that are traditionally assigned to dedicated hardware that could easily be completed on otherwise idle resources.

And so bringing this full circle, just like regenerative braking exploits energy you've already paid for to reclaim benefits that would otherwise require additional new energy, harnessing idle capacity from your application tier can let you reclaim business benefits for resources you already fund.


Sunday, August 31, 2008

Extending the Architectural Life

In my last article I introduced the concept of Architecture Shelf Life. I also put forth the postulate that for most patterns and technologies it is approximately 5 years. If you buy that as well as the premise that architectures can only survive 2-3 shelf life refreshes before they become dated and impractical, then you must be asking, how can I hedge against that? I want to extend the life of my architecture in some way.

Most critical for any architecture that is to survive is the ability to be refreshed with new technologies in an incremental fashion. The way this can be most readily achieved is by following standard architectural principles of good component design with the loosest possible coupling between components. This obviously allows components to be implemented independent of each other and also lets them follow their own technology curve for being replaced. Unfortunately though architectures rarely have good components that are truly loosely coupled. The reasons for this are the results of organic growth of the system. I have noticed some specific patterns emerge though.

Poor Protocols

This one is probably one of the most common. The components are reasonable and the interfaces have decent semantics. But then the implementation choice couples the components to a specific technology or implementation strategy. In most cases the decision is made based on the desire to optimize the interface speed. And in the majority of these cases it was probably an unnecessary optimization that bought a little speed but at the expense of long term architectural flexibility.

Interfaces between components should be text based (e.g. XML, JSON) as they offer the maximum flexibility for migrating components to alternate implementation technologies. The state of the art for standard text format parsers and generators has reached sufficient efficiency that the overhead introduced is nominal compared to the processing time of most components. We have now reached a point in text formats that we can move past the efficiency debate and focus on the architectural benefits we gain from decoupling our interface protocols from an implementation strategy.
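
As a small illustration of what a text-based interface buys you, here is a sketch of one component encoding a request as JSON and another decoding it. The message shape and field names are hypothetical. The decoder reads only the fields it knows about, which is one simple way to let the two sides evolve, or be reimplemented in another language, independently.

```python
import json


def encode_order_request(order_id, amount_cents):
    # Field names are hypothetical; the point is that the wire format is plain text.
    return json.dumps({"type": "order", "order_id": order_id, "amount_cents": amount_cents})


def decode_order_request(text):
    message = json.loads(text)
    # Read only the known fields and ignore anything a newer sender may have added.
    return {
        "type": message["type"],
        "order_id": message["order_id"],
        "amount_cents": message["amount_cents"],
    }
```

Nothing in the encoded text reveals whether the sender was written in Python, Java, or anything else, which is exactly the decoupling being argued for.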

Seeds of Destruction

The business needs a simple new feature, say a trivial fraud check. So it's added to an existing component. It works great but now they need a few more capabilities. So the component is enhanced. This cycle continues for 18 months and suddenly you have what really amounts to two components, but they are implemented as one, badly coupled and intertwined. Unfortunately the problem is usually worse than that though as the customers have come to expect a behavior that will be hard to maintain if you decouple. Either because of latency or availability or both, you find yourself with an architectural conundrum that could have been avoided if you had maintained a separation of concerns, regardless of the relative size of the two concerns involved.

Unintentional Vendor Lock

How can it be unintentional? You know, that's a question I often have asked as well. But the reality is that it can be done very easily. I have nothing against Oracle per se, but it will serve to illustrate my point. Oracle provides a few unique features that are in some cases tempting and in other cases unavoidable. Anonymous PL/SQL blocks are stored procedures that are stored in your code, delivered at run time. This is great from an application maintenance perspective because you get the benefits of a stored procedure without the revision management issues. But this feature is unique to Oracle and, as far as I know, one competitor (Enterprise DB), which means that if you choose to leverage it, you will have to devise alternatives if you want to migrate to another database.

But that feature is more overt and you can choose to not use it. A more covert feature is how Oracle manages concurrency. It relies on rollback segments that provide read committed concurrency while avoiding row level locks for the duration of the transaction. This provides excellent performance, especially in high concurrency situations. If your update patterns are not particularly contentious then the difference in performance between normal row level locks and rollback segments may be nominal but if you do have records that receive high update loads, you will experience a drop in performance.

The point of this though is it's important to understand the behavior of your vendor's products and understand the implications of how they will impact your architecture, short and long term. It's not just the overt features you can avoid, but the more subtle implementation details that you can't avoid but may come to rely upon.

Persistence Binding

This one is a bit harder to avoid but is worth the investment to minimize. Until recently most architectures considered persistence to mean database. Many architects would apply best practices to avoid vendor lock but little to no effort to avoid database lock. And why would you be worried about database lock? Well because for many classes of persistence, a database is not the most cost effective storage. It may be the easiest to initially implement but as other forms of persistence mature, you may want to be able to take advantage. Yet if you've assumed you have a database with SQL available you may find this difficult to do.

One of the best ways to minimize persistence lock is to separate your access paths in your resource tier. The primary access path to any entity should be via primary key. All other access paths should be added with care and carefully delineated within the implementation of the resource tier. This allows the alternate paths to be managed in other forms of persistence in the future.
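
A minimal sketch of that separation might look like the following. The class names and the in-memory dictionaries are stand-ins chosen for illustration; the point is only that the primary-key path and the alternate path are distinct pieces of the resource tier, so the alternate path could later be backed by a different form of persistence.

```python
class AccountStore:
    """Primary access path: every read and write goes through the primary key."""

    def __init__(self):
        self._rows = {}  # stand-in for any persistence that can do key lookups

    def get(self, account_id):
        return self._rows.get(account_id)

    def put(self, account_id, record):
        self._rows[account_id] = dict(record)


class AccountsByEmail:
    """Alternate access path, kept apart so it can move to other persistence later."""

    def __init__(self, store):
        self._store = store
        self._index = {}  # email -> account_id

    def put(self, account_id, record):
        old = self._store.get(account_id)
        if old is not None:
            self._index.pop(old["email"], None)  # drop the stale index entry
        self._store.put(account_id, record)
        self._index[record["email"]] = account_id

    def find(self, email):
        account_id = self._index.get(email)
        return None if account_id is None else self._store.get(account_id)
```

Callers that only need the primary path never learn that an email lookup exists, and nothing here assumes SQL is available underneath.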

I am sure there are other suggestions from my readers. The general theme, as you can see, is to minimize coupling. This keeps your architecture flexible and leaves the door open for integrating newer technologies and patterns as they emerge.


Architectural Shelf Life

Architectures are often thought of as having a useful life, but that life is usually hard to predict. Most of the time it is determined by how much pain an organization is experiencing in trying to extend it to the current business needs. After giving this some thought, I think I can state that an architecture typically has a 10 to at most 15 year useful life. To explain this idea though, I need to introduce another concept:

Architectural Shelf Life - The duration that a collection of patterns and technology are applicable when starting a new system design.

So to elaborate, if you have the chance to start over, complete green field, what architectural patterns and technology do you use? I argue that these change about every 5 years. And from that I derive that any architecture should be replaced in part or in whole about every 2-3 generations of this shelf life. Not convinced of the shelf life argument?

Roll the clock back to 1990 and look at how enterprises would deliver services to their customers. Most offered no access from the customer's location. Some had basic communications in place via email. The more progressive may offer forms via Compuserve, Prodigy, Delphi, or AOL. Applications were implemented monolithically, scaled vertically, and most likely in languages that are waning in popularity or possibly already dead. Databases were also monolithic, using mid tier to mainframe servers with direct attached storage.

By 1995, your customers could begin to find you on the web. The form screens were literally translated to web forms. The application stack and data storage had not changed much. The web was primarily about user interface. Upstart competitors were challenging that with applications written in C++ or some fledgling scripting languages. Their databases were still largely the same platform as 1990.

If you weren't too busy fixing Y2K bugs coming in to 2000, you would see a significant acceleration in change. Pure web architectures are now common, though the web fronting legacy enterprise applications is still prevalent. Applications are now multi-tiered although still largely monolithic deployments. Scale out architectures are emerging as the preferred way to scale out business logic. Mid tier servers are taking over the job of databases from mainframes and SANs become the preferred way to manage storage. Applications were being written in Java, VB, and the new language C#.

Over the past 8 years, databases have evolved to horizontal scaling through sharding. Services have emerged as the preferred design pattern for integrating tiers and components. Java and C# frameworks have matured to provide dramatically better productivity and have been joined by Ruby on Rails and Django to name a few. Distributed storage on low cost unreliable devices is gaining popularity.

Of course some of you will immediately jump to correct my memory or facts. That's not the point but rather the dramatic shift in architectural patterns and technology is what you should take from this. Additionally, looking at the business players that have emerged and those that have faded due to the shifts in technology gives a chilling picture into how much technology disrupts the world of business. The improvements in developer efficiency and the lowering costs of deployment make it possible to build and operate products cheaper every 5 years. Of course there is a lot more to a successful business than the operating cost but facing competition that has a better cost structure is not a desirable situation.

The challenge you face then is how to continually evolve your platform to adopt new patterns and technologies as they emerge. This can be daunting when you have a well established customer base and an overflow of business features in the pipeline. Operations will be concerned about maintaining availability through any transition and the business wants enhanced functionality in the product. As long as those two goals can be met with your current architecture, it takes considerable momentum to cause a shift. And in fact, even when it becomes questionable as to the effectiveness of the current architecture, it is not uncommon to still meet resistance to making a major architectural shift.

The fallacy in this situation is that a new architecture will be too disruptive and costly. What is missed is that the new patterns and platforms can be used to disrupt your business anyway. If you have a successful business, many others want a share. And technology becomes the tool they can use to go after your business. And the disruption they can cause by offering your services at lower cost or with more compelling features is far greater than any disruption that will be incurred for internal architectural shifts. In simple terms, if you don't use technology to disrupt your current way of doing business, somebody else will. Therefore, incorporating architectural shelf life and therefore architectural life span into your business is a necessary investment to remain current.


Sunday, August 24, 2008

Shard Lessons

No, not SHARED lessons, I mean SHARD lessons. I have to admit that until about a year ago I didn't really know the term shards in relation to databases. Now don't confuse that with not understanding how databases can be horizontally scaled. I was introduced to that concept and helped to define the various ways it can be done but we just called it splits. Regardless of what you call it, there are some interesting challenges that are introduced. The well known challenges of consistency are discussed ad nauseam, even by me, so I'm not going there with this article. But besides that, there are some other lessons to learn when applying the pattern to your data.

Lesson 1: Right Size Your Shards

Sounds like a fast food commercial when I put it in those terms but the idea is actually the same. Determining the initial number of shards can be tricky. You don't have infinite resources and no matter how good your tools are, more shards are more problematic to manage than fewer. Yet you also want to select a number that is going to last you for a while. I would recommend picking a number that will give you 18-24 months of growth with a margin for safety.

Simple enough, right? Well, not quite yet. I also wouldn't start with a single digit number of shards even if that will last you for two years. Why start with 4 when the incremental overhead of 12 is manageable? You allow a longer growth path and by hosting multiple databases on the same physical machine, you limit your hardware expenses. Of course then why not 24 or more? Well, at some point you add unnecessary overheads and get diminishing returns.

Lesson 2: Use Math on Shard Counts

If you notice the numbers I selected in the previous lesson, they are multiples of 12. Why that and not multiples of 10 or 7 or just whatever number happens to be your lucky number? Well, let's say you are going to run multiple shards per physical machine. You're a small organization. You pick 10 as your number of shards and spread it on two machines. Five each, things are perfect. Now you find your growth means you need more capacity. Would be nice to add one machine, but that leaves you with a 3, 3, 4 split and one machine has 33% more load than the other two. In fact, your next physical growth is 5, which may not be terrible if you are running your databases on commodity hardware, as you should!

But had you started with 12, you find your growth options to have fewer cost step functions. You start with 2 boxes, 6 shards each. Move to 3 with 4 shards, then 4 with 3 shards, and finally 6 with 2 shards. If you had projected 12 shards would allow you to scale for 2 years, you can see how you are able to grow your hardware requirements in a smoother fashion than with 10 shards.
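
The comparison between 10 and 12 comes down to divisors, and it is easy to check for any shard count you are considering. The helper names below are made up for this sketch; it assumes every shard carries roughly the same load and that shards are spread as evenly as possible across hosts.

```python
def balanced_host_counts(shards):
    """Host counts that give every host exactly the same number of shards."""
    return [hosts for hosts in range(1, shards + 1) if shards % hosts == 0]


def worst_case_imbalance(shards, hosts):
    """Extra load on the busiest host relative to the least busy one, as a fraction."""
    if not 1 <= hosts <= shards:
        raise ValueError("need at least one host and no more hosts than shards")
    low, extra = divmod(shards, hosts)   # `extra` hosts carry one additional shard
    high = low + (1 if extra else 0)
    return (high - low) / low


ten = balanced_host_counts(10)      # [1, 2, 5, 10]
twelve = balanced_host_counts(12)   # [1, 2, 3, 4, 6, 12]
```

With 10 shards on 3 hosts, worst_case_imbalance returns one third, which is the 3, 3, 4 split described above. With 12 shards, every step from 2 to 3 to 4 to 6 hosts stays perfectly balanced.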

Lesson 3: Carefully Consider the Spread

The way you spread the data across the shards needs to be determined based on what you are hoping to optimize. Certainly the simplest strategy is a uniform distribution which can be achieved using simple modulo math on the primary access key. This approach requires very little to implement and works well when the primary goal is to evenly distribute the load and you have very few access paths.

There may be times however where you want different locality policies. For example, you may want to keep all members of a group on the same shard. This can improve performance for applications that tend to operate upon a group. It also introduces challenges in balancing loads as some groups may be larger or more active than others. The benefits of clustering by a group must be weighed against the challenges of balancing work loads.

Lesson 4: Plan for Exceeding Your Shards

No matter what number you pick, the possibility of reaching capacity of your databases is very real. You selected a shard pattern to allow for a scale out strategy. So as you reach the limits of your current hardware you really don't want to shift to a scale up on those shards. That's why it is important to bake into your shard strategy your scale out plan.

From Lesson 3, if you are clustering by group, adding more hosts to support new groups is probably straightforward. This helps when your scaling challenge is a growth in the number of groups and not just a few groups growing larger. Shards that cluster based on a group will probably need to support a scheme to migrate groups to allow capacity to be more readily managed.

Uniform spreads can be expanded through another trick of math. Each shard can be treated as a root node in a shard tree. When it is time to grow the capacity, another layer of shards can be added to grow capacity geometrically. For simplicity, let's say you originally started with 4 shards. You calculate the shard number with the following formula:

shard = key % 4

Now you need to scale. Let's say we will expand from 4 to 16 hosts. The technique for computing the root shard and final shard, using integer division, is:

root = key % 4
leaf = (key / 4) % 4

You now have 4 * 4 or 16 possible shard combinations using root.leaf as the shard identifier. Furthermore, migration can be done online simply by following the scheme of checking root.leaf first and, if you do not find your record, checking root. Appropriate locking mechanisms will obviously be required by migration scripts to ensure that no data is lost or corrupted. And this technique does require a temporary increase in the number of database instances to (old shards + new shards) hosts (in this example, 20 hosts total).
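The root.leaf scheme and the online lookup order can be sketched as follows. This is only an illustration under stated assumptions: plain dictionaries stand in for the 4 old and 16 new database hosts, all function names are hypothetical, and the locking that a real migration script needs is omitted.

```python
ROOTS = 4   # original shard count
LEAVES = 4  # new shards added under each root


def root_shard(key: int) -> int:
    return key % ROOTS


def leaf_shard(key: int) -> int:
    # Integer division, so the leaf is chosen from the bits the root ignores.
    return (key // ROOTS) % LEAVES


def shard_id(key: int) -> str:
    return f"{root_shard(key)}.{leaf_shard(key)}"


# In-memory stand-ins for the old (root) and new (root.leaf) hosts.
old_shards = {r: {} for r in range(ROOTS)}
new_shards = {(r, l): {} for r in range(ROOTS) for l in range(LEAVES)}


def lookup(key: int):
    """Check root.leaf first; fall back to root if the record has not moved."""
    record = new_shards[(root_shard(key), leaf_shard(key))].get(key)
    if record is not None:
        return record
    return old_shards[root_shard(key)].get(key)


def migrate(key: int) -> None:
    """Copy a record to its leaf shard, then remove it from the root.

    Copy-before-delete keeps the record findable throughout; a real
    script would also need locking, which is not shown here.
    """
    root = root_shard(key)
    record = old_shards[root].get(key)
    if record is not None:
        new_shards[(root, leaf_shard(key))][key] = record
        del old_shards[root][key]


# A record is readable both before and after it is migrated.
old_shards[root_shard(13)][13] = "record-13"
before = lookup(13)
migrate(13)
after = lookup(13)
```

Because the root is still `key % 4`, every record moves only to a leaf beneath the shard it already lived on, which is what makes the two-step lookup sufficient during the transition.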

Lesson 5: Shard Early and Often

No matter how disciplined you think or hope your team is, if you give them a monolithic schema and tight deadlines, they will eventually create a mission critical query that assumes two rows are on the same physical host. And, in my experience without fail, those two rows can't possibly live together under any sane shard spread strategy. So if you have any hint at all that you will need to scale out a schema, do it early. The longer you wait, the more your application will depend upon a single schema instance and the harder it will be to migrate to a sharded schema.

Care to share lessons you have learned from shard patterns? I am sure there are more that are worth knowing!


Monday, November 20, 2006

The REST Dialogues, A Real eBay Architect

In Getting Data | The REST Dialogues, Duncan Cragg conducts an interview with an imaginary eBay architect. While I don't play one on TV, I am a real eBay architect and would love to participate in this dialogue. As the first two parts are complete, I felt I should post my follow ups here and hopefully invite Mr. Cragg to conduct the remaining 7 parts with me. I must make the standard disclaimers though. I am not speaking for eBay. None of my comments reflect on current or future products. This is purely a technical discourse on the merits of REST vs SOAP styles of interaction, whether eBay ever chooses to offer such an interaction or not.

Duncan Cragg: So - let's get straight to my argument: I claim that your SOAP APIs, as instances of the SOA style, won't scale or interoperate as well as they would if they were implemented in the REST style. Which, in the form of the Web, has largely proven scalability and interoperability.

Dan Pritchett: The scaling argument is an interesting position. Most of the data that would be returned by eBay interfaces will involve structure that is best captured in XML. From a scaling perspective, XML is XML. Parsing is definitely more expensive than generation, and there is little argument that REST can reduce the parse load placed on our resources, but this is only a portion of the overall processing load.

Interoperability would depend largely on the relative similarity between the eBay entities and other Web 2.0 entities. To the extent there is overlap, I would concur that a standardized format improves interoperability. I would also assert that the most interesting entities at eBay are unique to eBay.

DC: That's true now, but if the Web 2.0 vision comes together, you may care: your API traffic could increase dramatically. It would be better to be the one prepared for the scale of the API-Web!

Can you really argue in your company that you don't need to be scalable? What if your port 80 traffic needs to be routed to your APIs for some reason?

DP: While my imaginary co-worker may state that scalability is not a concern for eBay, I would never make such a claim. Scalability is always an architectural consideration. Rather than expecting that port 80 traffic would be routed to the API though, I would expect that future traffic growth might come from applications that leverage the API.

DC: As for interoperability, you could be excluded from Web 2.0 industry-boosting consortia, or excluded from perhaps hugely popular Web 2.0 applications in the future...

Interoperability raises the level of the market as a whole. Market players shouldn't differentiate on what's common to them, they should differentiate on the level above.

It also depends on the value you place on having happy customers who don't have to do the same thing multiple ways or multiple times.

DP: This comes back to defining what operations and entities are common between eBay and other market players. There are probably subsets of entities that are common (e.g. messages or users) but even in that context, there are structured components that require extensions from a common base entity to prove useful. This becomes the substance of the conversation and I also believe the largest challenge that the semantic web currently faces.

DC: OK, let's look at your SOAP API. There are 72 function calls in there that begin with 'Get'. Each one specifies a particular piece of data that you can fetch.

DP: Go on

DC: Sure, but you don't need a new function call for everything you can get from your system: you can just use HTTP GET!

DP: Sure, I just need to parameterize the GET operation to differentiate what data you're requesting. But aren't we largely debating syntax and mechanism, not semantics?

DC: It's not just any 'data going in': the URI can be passed around for anyone to re-use. This URI is more interoperable because so much deployed software understands it. No-one understands 'GetSearchResults()'!

DP: Okay, fair enough. URI oriented requests can be more easily saved and shared.

DC: Another example of how the URI can glue things together is that the data returned from your GETs can have more URIs in them, ready to go! You won't get data from your Web Service with 'GetItem()' in it..

DP: Another fair point.

DC: REST also talks about the formats of the data behind a URI. In a GET, the response data is given a Content-Type, and there's an expectation that clients will understand the types of data being returned: interoperability comes from broad standardisation of return data.

DP: But now we're back to the point I've made earlier. Standardization assumes common entities. We certainly have entities that can be declared common at a high level (e.g. users, products, messages). These entities become somewhat less common as you dive into the details. We also have several entities that are not common (e.g. items, bids).

DC: The explicit statement of Content-Type reflects a culture of agreement forced by the sharability of URIs: your URIs are more sharable when more clients understand the data they dereference to.

On the other hand, the culture of SOA is to declare custom WSDL and custom XML schemas.

Like I said, one day you may care about interoperability, and having an architecture that puts a high value on content type and schema standardisation, as REST does, puts you one step ahead.

DP: So the suggestion is that all vendors are going to agree on a common set of entities and their detailed schemas? I suppose that might happen for a subset of the entities but I think even that will prove challenging. I was at Sun in 1992 when we proposed an industry standard format for calendar appointments. Fourteen years later there are still competing standards and trying to give a Mac iCal appointment to an Outlook user is harder than it should be.

If there are REST standards around the entities that we publish, then it would make sense for us to consider them. To the extent that our entities are unique, then isn't the format we publish the standard by definition?

DC: You can also gain scalability by partitioning on those URIs.

DP: We partition along many dimensions, URIs just being one of them.

DC: Yes, but URI partitioning cuts right through the system in a very simple way: your partitioning is an application-specific optimisation which has to be hand-coded behind the SOAP interface.

DP: Our partitioning doesn't follow the model you've imagined but I can't really share all of our partitioning magic with you.

DC: Another benefit of using HTTP over using SOAP is that you get cacheing built in to the architecture, which you can start using as soon as you ask for it in the headers. This boosts scalability.

DP: Caching dynamically generated content is considerably more difficult than you think. There are portions of our results that can be cached but rarely the entire result set from a single request. We already do caching where it can be done and still provide correct results to the interface. Bear in mind that you are talking about a system with more than 5,000 state changes per second.

DC: Which is where you're potentially inefficient.

DP: Correctness must always override efficiency, especially where money is concerned.

DC: Again - it's application-specific.

So - even in the simple cases of fetching data, REST has given you much greater scalability and interoperability than your SOAP interface - as well as a simpler, more generic approach.

DP: In many cases our caches have to be application specific. The correctness of the data can only be ensured by understanding the logic used to generate it. We've studied caching opportunities extensively and apply caching where it can be done safely, with no risk of producing inconsistent or incorrect results. REST isn't going to change the business rules or our customers' expectation of accuracy.

DC: And we're only one-ninth of the way through our conversation!

DP: Great, this has been fun!


Saturday, November 18, 2006

What Is This About?

The name of this blog and the quote by Antoine de Saint-Exupery reflect a philosophy that I have followed most of my career. It carries over from work to the rest of my life as well. Albert Einstein said it a different way: "Everything should be as simple as possible but no simpler." The phrase "Add Simplicity" is paraphrased from Colin Chapman, father of Lotus cars, who stated that they work to "add lightness".

When engineering anything, your job is to add simplicity. While it sounds like an oxymoron, the reality is that the easiest designs to produce are the most complex. Finding simple solutions is actually hard work. This is one of the hardest lessons for fresh software engineers to learn. Everything has a simple solution; finding it is hard.

I hope to post on a variety of engineering topics. They will vary from design philosophies to opinions on emerging trends. Throughout though, I will stick to the idea of "add simplicity".
