You Scaled Your What?
"We're embarking on a project to give us a 4X improvement in scalability," said the enthusiastic architect.
"Really, in what way?" I replied, knowing that the trap had already been laid.
"What do you mean? It's obvious. We will have 4X the scalability of our current architecture when we're done!" my compadre stated with a confused brow.
"Is that transactional headroom, storage improvements, reduction in operations staff, or time to market for new features?" I offered, struggling to hold back a wry smile.
"Uh, well, I guess it's transactional headroom, or at least that is what I call scalability," he offered, realizing this was probably not going where he wanted.
"Well, can you achieve your 4X scalability if you experience a 4X increase in data volume?" I asked, realizing that I now felt sorry for him.
Yes, this conversation happens far too often. Scaling a system actually requires managing the scaling along all of the vectors. Ignore one and it will become your next bottleneck. But what are the vectors? Where is our friend, full of enthusiasm, misguided? Let's look at the minimal set of scalability vectors that should be considered in every architecture.
Transactional
This is the most obvious scalability vector. Everyone understands the concept of transactions per second (TPS), and it is fairly straightforward to manage. However, there are some interesting challenges in defining what improving scalability means. Is the improvement in TPS based on higher TPS per system, or does it come from supporting more systems? Either is possible, but each influences other vectors (that we'll get to in a bit). How can that be? Well, let's look at these two options a little deeper.
If the approach is to improve TPS per system, that implies improving software efficiency. This would be accomplished by performing in-depth analysis of the bottlenecks and improving the implementations. That work requires development and test resources that could otherwise be applied to business features, so feature time-to-market (TTM) will suffer during the scalability work. Beyond that, some performance improvements will require algorithms and implementations that may be more difficult to understand and manage. Interfaces may need to be compromised to achieve the optimizations. All of this can have a negative impact on developer productivity.
If scalability will be achieved through improved horizontal scaling, that implies more systems that need to be deployed and managed. Those systems will require operational support. The operational model will need appropriate management tools, or the incremental cost of the additional systems will quickly strain the organization's ability to manage the site.
The point is that even the obvious challenge of scaling transactions affects other aspects of the system. Let's move on to other scalability vectors.
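To make the two options concrete, here is a minimal sketch of the arithmetic behind them. The baseline figures (10 systems at 500 TPS each) are hypothetical, and the calculation assumes throughput scales linearly with the number of systems, which real architectures rarely achieve.

```python
import math


def systems_needed(target_tps: float, tps_per_system: float) -> int:
    """Smallest number of systems whose combined throughput meets target_tps.

    Assumes perfectly linear horizontal scaling (an idealization).
    """
    if tps_per_system <= 0:
        raise ValueError("tps_per_system must be positive")
    return math.ceil(target_tps / tps_per_system)


# Hypothetical baseline: 10 systems at 500 TPS each.
baseline_systems = 10
baseline_tps_per_system = 500.0
target_tps = 4 * baseline_systems * baseline_tps_per_system  # the "4X" goal

# Option 1: keep per-system efficiency the same and add systems.
scale_out = systems_needed(target_tps, baseline_tps_per_system)

# Option 2: double per-system efficiency through optimization work.
optimized = systems_needed(target_tps, 2 * baseline_tps_per_system)

print(f"Scale out only:       {scale_out} systems to operate")
print(f"Optimize, then scale: {optimized} systems to operate")
```

Either path reaches the same TPS target. The first quadruples the number of systems operations has to manage, while the second halves that burden at the price of development time and possibly harder-to-maintain code.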
Data
Data scalability, like transactional scalability, is something that most organizations feel they adequately understand. All too often, though, all data is treated the same, which leads to less than optimal data scaling. Most organizations have different classifications of data, and determining the most appropriate storage and data management approach for each can often lead to dramatically improved cost efficiency.
The data that is critical to the mission of the business must be protected with the most robust technology. But non-critical data, for example derived data, can be maintained by less costly technology. Creating tiers of data categories is an important aspect of providing cost effective data scaling.
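As a rough illustration of why tiering matters, the sketch below compares storing everything on the most robust tier against splitting critical and derived data. The tier names, per-gigabyte costs, and volumes are all invented for the example and do not describe any real storage product.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Tier:
    """A storage tier with a hypothetical cost per gigabyte."""
    name: str
    cost_per_gb: float


# Hypothetical tiers: robust storage costs ten times as much per gigabyte.
CRITICAL = Tier("critical", 10.0)
DERIVED = Tier("derived", 1.0)


def storage_cost(volumes_gb: Mapping[Tier, float]) -> float:
    """Total cost of storing the given volume (in GB) on each tier."""
    return sum(tier.cost_per_gb * gb for tier, gb in volumes_gb.items())


# The same 1,000 GB, stored two ways.
everything_critical = storage_cost({CRITICAL: 1000.0})
tiered = storage_cost({CRITICAL: 200.0, DERIVED: 800.0})

print(f"All data on the critical tier: {everything_critical:.0f}")
print(f"Tiered by data category:       {tiered:.0f}")
```

The gap between the two totals grows with data volume, so the classification exercise pays off more the faster the data grows.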
Operational
Operational scalability addresses the ability to manage the software once it's developed. Operationally the software has to be provisioned, monitored, and controlled. These aspects must be included in any complete system architecture. But more importantly, the incremental overhead of adding new software and new features must also be considered as a scaling vector. Minimizing the incremental operational cost for each new increment in transaction and feature growth should be a part of every scalability project.
Deployability
Deployability refers to the ability to deploy the system in a variety of different locations and conditions. Most large-scale systems will require geographic diversity for a couple of reasons. First, geographic diversity protects the business against localized disasters. Second, since this protection is required anyway, most organizations will want to leverage the diversity to provide better responsiveness to their customers. As long as you have data centers on both coasts, you might as well use those resources to reduce latency.
Software architectures should take the challenges of spanning geographies into consideration. The best architected solutions will scale to global deployments. Unfortunately, this aspect of architectural scaling is often ignored and when the need arises to geographically distribute the applications, the latency between the locations becomes an issue, often restricting the geographic span of the system.
Another deployment resource that is becoming critical to most organizations is power and cooling. Architects have not considered power consumption as one of the primary scalability vectors until recently. But if you are not looking at power now, you will be shortly.
Productivity
Productivity refers to developer productivity. Arguably this is one of the more difficult metrics to measure accurately, but there are subjective ways to determine how an organization is doing on this front. There have been sufficient treatises on the topic of developer productivity and how to achieve it, and I won't attempt to summarize them here. What I will assert, though, is that ignoring the potential impact on developer productivity while attempting to improve scalability along any of the other vectors discussed here will almost surely result in a decrease in productivity. It should be considered whenever architectural changes are being made in the name of scalability.
Feature TTM
Feature time-to-market as a scalability vector? What kind of heresy is this? An architect grousing about how quickly those annoying features reach our customers! That's not a scalability problem! Okay, sit down, keep reading, I'll explain myself.
Business features pay the bills. The primary job of an architect is to provide the business with the flexibility it needs to respond to competitive situations. The good news is that if you do a good job on the other scalability vectors, this one will largely take care of itself. But like all vectors, you can't ignore it.
The challenge here is ensuring the architecture has the ability to move into new business models cost-effectively. These may be incremental changes to the current model or completely new models. The best way to do this is to pay attention to the overall architecture and not let any of the other scalability vectors become too badly ignored.
What Does All This Mean?
The point is that you have to consider these scalability vectors wholly when scaling an architecture. That doesn't mean that you will be able to scale in all directions simultaneously. Quite the contrary, you always have to give up ground on one axis to gain ground on another. But the axis you ignore is the one that suffers the most, often to the long term detriment of the architecture.
I like to use spider charts to illustrate architectural scalability. They allow you to easily visualize which aspects of your architecture are scaling and which are suffering. They are also a useful tool for setting the ideal scaling model and comparing how your current model is tracking against that ideal. The red line on the spider graph to the right represents the ideal scaling model for a hypothetical organization. Every organization will have its own ideal scaling graph. The blue line represents the organization's current investment in scaling the architecture. It becomes immediately obvious that transactional scalability has taken priority over other scaling opportunities.
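The comparison a spider chart makes visible can also be computed directly. The sketch below scores each vector for an ideal and a current model and reports the gap. The 0 to 5 scale and every score are hypothetical; they are not read off the chart in this post.

```python
from typing import Dict

# Hypothetical scores on an arbitrary 0-5 scale, one per scalability vector.
IDEAL: Dict[str, int] = {
    "transactional": 4,
    "data": 4,
    "operational": 3,
    "deployability": 3,
    "productivity": 3,
    "feature_ttm": 4,
}
CURRENT: Dict[str, int] = {
    "transactional": 5,
    "data": 3,
    "operational": 2,
    "deployability": 1,
    "productivity": 2,
    "feature_ttm": 3,
}


def scaling_gaps(ideal: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Current minus ideal for each vector; negative means under-invested."""
    return {vector: current[vector] - ideal[vector] for vector in ideal}


gaps = scaling_gaps(IDEAL, CURRENT)
most_neglected = min(gaps, key=gaps.get)
most_favored = max(gaps, key=gaps.get)

# List the vectors from most neglected to most favored.
for vector, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{vector:15s} {gap:+d}")
```

With these made-up numbers the transactional vector is ahead of its ideal while deployability lags furthest behind, which is the same kind of imbalance the chart is meant to expose.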
Have some scaling vectors you think I've missed? Post a comment. I enjoy discussing the various aspects of scaling architectures.
Scaling eBay
Want to know more about how we scale eBay? Randy Shoup and I will be presenting that topic at SDForum's SAM SIG on Wednesday, November 29th. If you are in the San Francisco Bay Area and would like to see how eBay approaches scalability, I encourage you to come to the talk.
Sun's terms (for architectural vectors):
-scalability
-maintainability
-manageability
-extensibility
-security
-performance
-reliability
-stability
They seem to define scalability in terms of "transaction headroom".
The other vectors you mentioned seem to me to fit better under the other headings:
e.g.
-time to market (extensibility),
-operational scalability (manageability),
-productivity (maintainability).
However, whatever terms you use, your basic point that they interact is well made.
Posted by:Graham | Monday, November 27, 2006 at 02:44 AM
The biggest challenge I have with Sun's terms is establishing metrics to evaluate your success. We used these when we initially began the architecture but how do you measure your progress on stability? What about security?
These are excellent systemic qualities to consider but ultimately you want to measure the success of your architecture and several of these are hard to measure. So you have to adjust them to something that can be directly measured.
Posted by:Dan Pritchett | Monday, November 27, 2006 at 07:07 AM
Dan, I attended the talk at SDForum, it was fascinating! Thank you to you & Randy and eBay for allowing you to make those slides public. Does Randy have a blog or is there any way to get into contact with him?
Posted by:Peter | Friday, December 01, 2006 at 08:48 AM
Peter, send me email and I'll connect you with Randy.
Posted by:Dan Pritchett | Friday, December 01, 2006 at 09:44 PM
Unreadable font on the graphic :(
Posted by:Kirill | Thursday, December 07, 2006 at 02:20 PM
"Unreadable font on the graphic :("
It didn't scale very well, did it? :)
Posted by:anon | Wednesday, December 20, 2006 at 05:49 PM