In a recent article, Mahope addresses the issue of how software efficiency is impacting data center planning. This article rang particularly true for me as I've recently been pushing the concept of optimizing TPS/watt. As the article states, data centers are becoming constrained by power. (Technically it's power and cooling, but if you reduce your power consumption, you reduce your cooling problem, so I focus on power.)
How big is the power problem? Well, it's big enough that vendors are starting to market their energy efficiency as a feature. It's also big enough that a new law has been passed directing the EPA to study power consumption in data centers (government and private) in the United States. I'll skip my comments on how much that will help, but needless to say, if it made the Congressional docket, there must be something to it.
What should we be doing in our software architecture to improve this? There are some concrete steps that I have started taking and hopefully other architects will join me.
Measure transactions/watt. This is a new metric for me. But if you don't measure it, you can't improve it. SPEC has formed a power and performance committee, which will be useful for an initial calibration of vendor equipment, but ultimately the transactions you need to optimize are yours. It may be hard to set targets for transactions/watt at first, but measuring and monitoring the metric will certainly build awareness of how the application is doing over time.
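As a rough illustration of the metric, here is a minimal Python sketch, assuming you can already collect a transaction count and the average power draw of the servers involved for each measurement window. The `Sample` class, its fields, and the numbers in the example are hypothetical, not taken from any real monitoring tool. It also shows why transactions per second per watt is the same quantity as transactions per joule: a watt is one joule per second.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement window (hypothetical fields; how you collect them is up to you)."""
    transactions: int   # business transactions completed in the window
    seconds: float      # length of the window
    avg_watts: float    # average power drawn by the servers during the window

    @property
    def tps(self) -> float:
        return self.transactions / self.seconds

    @property
    def tps_per_watt(self) -> float:
        # (tx/s) / W is the same as tx/J, because 1 W = 1 J/s
        return self.tps / self.avg_watts

    @property
    def transactions_per_joule(self) -> float:
        joules = self.avg_watts * self.seconds
        return self.transactions / joules


def trend(samples):
    """TPS/watt per window, rounded, so the metric can be tracked over time."""
    return [round(s.tps_per_watt, 4) for s in samples]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    hours = [
        Sample(transactions=1_800_000, seconds=3600, avg_watts=2400.0),
        Sample(transactions=2_160_000, seconds=3600, avg_watts=2500.0),
    ]
    print(trend(hours))
```

The second window draws more power but does more work per watt, which is exactly the kind of change a raw power graph would hide.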
Drive up server utilization. Large deployments tend towards specialization of servers to ease management and to segment availability. This leads to situations where server utilization is incredibly low. It's not unusual to see server pools created to isolate a new service. The initial traffic volume will use 50% of one server. Availability design requires three servers to meet the SLA. So you now have 3 servers running at 17% utilization, essentially wasting 83% of their watts. There is an opportunity to leverage virtualization to provide logical isolation while driving utilization higher. Alternatively, M+N failover solutions, where the failover nodes are shared across many primaries, can also help.
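The arithmetic above generalizes once there are many services. The Python sketch below compares one dedicated three-server pool per service against consolidating the same load onto shared primaries with a small set of shared failover nodes (M+N). The inputs are assumptions for illustration: loads are expressed as fractions of one server, shared primaries are capped at a hypothetical 70% utilization, and two shared spares are assumed to meet the same SLA, which you would need to verify for your own failure model.

```python
import math


def dedicated_pools(loads, min_pool=3):
    """One isolated pool per service, sized for its load but never below
    min_pool servers (the three-server availability floor from the text)."""
    servers = sum(max(math.ceil(load), min_pool) for load in loads)
    return servers, sum(loads) / servers


def shared_m_plus_n(loads, spares, target=0.7):
    """Consolidate all services onto M shared primaries run at no more than
    `target` utilization (assumes they can co-reside, e.g. via virtualization),
    plus N = `spares` failover nodes shared across every primary."""
    primaries = math.ceil(sum(loads) / target)
    servers = primaries + spares
    return servers, sum(loads) / servers


if __name__ == "__main__":
    loads = [0.5] * 10  # ten services, each needing half a server (illustrative)
    for label, (n, util) in [
        ("dedicated", dedicated_pools(loads)),
        ("shared M+N", shared_m_plus_n(loads, spares=2)),
    ]:
        print(f"{label}: {n} servers at {util:.0%} average utilization")
```

With a single service at 50% load, `dedicated_pools([0.5])` reproduces the three servers at roughly 17% from the paragraph above; with ten such services the shared layout needs a third as many machines.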
Use deployment patterns. Standardizing the patterns you use for software deployment improves the possibility of sharing hardware amongst multiple services. Design patterns so services can safely share one server, and require components to conform to the deployment pattern.
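One payoff of a standard deployment pattern is that every service declares its resource needs the same way, which makes placement mechanical. The Python sketch below assumes a hypothetical pattern in which each service declares a CPU share and memory, and packs services onto shared servers with first-fit decreasing, a standard bin-packing heuristic swapped in here for illustration rather than anything the pattern itself prescribes. The per-server caps are made-up values, and the sketch ignores availability placement (keeping replicas of one service on different hosts), which a real scheduler would also have to enforce.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    cpu_pct: int   # share of one server's CPU, declared per the (hypothetical) pattern
    mem_gb: int


@dataclass
class Server:
    cpu_cap_pct: int = 70   # hypothetical headroom target per server
    mem_cap_gb: int = 16    # hypothetical memory per server
    services: list = field(default_factory=list)

    def fits(self, svc: Service) -> bool:
        used_cpu = sum(s.cpu_pct for s in self.services)
        used_mem = sum(s.mem_gb for s in self.services)
        return (used_cpu + svc.cpu_pct <= self.cpu_cap_pct
                and used_mem + svc.mem_gb <= self.mem_cap_gb)


def first_fit_decreasing(services, **server_kwargs):
    """Place the largest services first, each on the first server with room.
    A service larger than one server's caps still gets a server of its own."""
    servers = []
    for svc in sorted(services, key=lambda s: s.cpu_pct, reverse=True):
        target = next((srv for srv in servers if srv.fits(svc)), None)
        if target is None:
            target = Server(**server_kwargs)
            servers.append(target)
        target.services.append(svc)
    return servers


if __name__ == "__main__":
    demand = [50, 30, 20, 20, 10, 10, 10]  # illustrative CPU shares
    svcs = [Service(name, pct, 2) for name, pct in zip("abcdefg", demand)]
    for i, srv in enumerate(first_fit_decreasing(svcs), start=1):
        print(f"server {i}: {[s.name for s in srv.services]}")
```

Integer percentages are used deliberately so that a check like 40 + 30 <= 70 is exact rather than subject to floating-point rounding.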
Optimize, optimize, optimize. Optimization always has diminishing returns, but I believe that over the last several years we have stopped well short of that inflection point. I won't be so bold as to put a number on the savings that can be realized, but given the millions of dollars companies face to solve their power crisis, I'm confident that a year of runway exists just through improving software efficiency.
For software architects, power consumption is now squarely in our camp to manage. There is plenty we can do to reduce the amount of power our data centers consume. But this has to become a clear focus from 2007 forward. This is not just a hardware problem any longer.
Good article! This is a topic that a lot of people do not think about. It will be interesting to see the products that come out based on this.
Posted by: pexer83 | Tuesday, January 16, 2007 at 09:49 AM
A meaningful article after a long time. For me, the takeaways would be "virtualization" and "process management". We happily developed applications on a 486 in the past, and now we want an Intel Core 2 to develop faster? As the price per transaction goes down, companies are leaning towards ignoring txns/watt... but they want business value.
Posted by: Ashish Jain | Thursday, January 18, 2007 at 07:31 AM
Wow, Dan, I've been laughing at folks who say "yeah, we're horizontally scalable" when I ask them "how much does your scalability cost per square foot?"
Man, you make me wish that eBay had an engineering office in New Jersey. I'd totally work with someone like you. :-)
Posted by: Dossy Shiobara | Thursday, January 18, 2007 at 07:56 AM
I just wanted to point out that transactions/second/Watt is the same as Transactions/Joule.
Posted by: Anonymous | Thursday, January 18, 2007 at 08:08 AM
"So you now have 3 servers running at 17% utilization, essentially wasting 83% of their watts."
Umm.. No..
A computer does not consume the same amount of electricity when idle, or under full load. Similarly, when it is only partially loaded it does not draw full wattage.
Go. Try and measure it. ;-)
Posted by: joshW | Thursday, January 18, 2007 at 08:29 AM
I knew somebody would call me on the power utilization at load vs. idle. Fair enough, but there is a fixed overhead even at idle, and the power doesn't reduce to 17% of max at 17% load. So there is still wasted power when running servers at partial utilization.
To Anonymous: yes, your unit conversion is accurate. But benchmarks tend to measure TPS and spec sheets are in watts. That makes those units far more usable.
Posted by: Dan Pritchett | Thursday, January 18, 2007 at 08:38 AM
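The fixed-overhead point in the exchange above can be made concrete with a common first-order approximation: a server draws a fixed idle power plus a part that grows linearly with load. The Python sketch below uses that model with hypothetical wattages (150 W idle, 250 W at full load), not measured figures; real curves vary by hardware and are not perfectly linear, which is why joshW's advice to go measure still stands.

```python
def linear_power(util, idle_w, max_w):
    """Approximate draw in watts: fixed idle power plus a share proportional to load."""
    if not 0.0 <= util <= 1.0:
        raise ValueError("util must be in [0, 1]")
    return idle_w + (max_w - idle_w) * util


if __name__ == "__main__":
    IDLE_W, MAX_W = 150.0, 250.0  # hypothetical server, not measured figures
    util = 0.5 / 3                # half a server's load spread over three servers
    per_server = linear_power(util, IDLE_W, MAX_W)
    print(f"per server: {per_server:.1f} W "
          f"({per_server / MAX_W:.0%} of max at {util:.0%} load)")
    print(f"pool of three: {3 * per_server:.1f} W")
    print(f"one server at 50%: {linear_power(0.5, IDLE_W, MAX_W):.1f} W")
```

Under these assumptions each server at about 17% load still draws about two thirds of its maximum, so the three-server pool uses far more than the single server carrying the same load.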
To further clarify, yes, I made an error by stating transactions/watt. Transactions are not a measure of business work rate, which was my intention.
To state it more abstractly, measure the ratio of business benefit to power consumption. The units are irrelevant as long as you can measure them.
Posted by: Dan Pritchett | Thursday, January 18, 2007 at 01:20 PM
I agree that although premature optimization may lead to diminishing returns, we as programmers could do much more to improve the efficiency of our software and, ultimately, the power it consumes.
There also has to be a commitment from organizational leadership to allocate the time and resources necessary to achieve this goal, i.e., project schedules and budgets must be adjusted. Which raises the question: who's going to foot the bill? Customers and end-users?
Ugh, this is starting to sound like every environmental debate. Perhaps this is best presented as a cost-savings issue (and it is).
It would be great to see data centers throwing solar panels up on the roof to supplement their power supply.
Posted by: Beanbrain | Friday, January 19, 2007 at 05:04 AM
This is quite an interesting discussion of something that often gets overlooked. When you say something scales horizontally, you always have to ask, as another commenter mentions above: at the expense of what?
I am personally leaning towards the grid concept on top of high-powered clusters. I think companies like 3tera have the right idea, although I don't think the optimal pricing model is there. Ergo, while you scale horizontally via hardware, your entire software infrastructure can utilize the underlying resources more effectively and efficiently. But this again comes at the cost of ensuring the network backbone is there to support such feats.
Posted by: colson | Sunday, January 21, 2007 at 06:30 PM
Great discussion.
A while back I was reading about Google's determination of the maximum latency that they could allow to achieve user satisfaction (and more importantly, keep the user from cancelling the request).
About the same time I read that Google will typically spend more on the electricity for a server over its life than they spend on the hardware itself.
It got me wondering whether, not just system architects, but user experience architects would at some point have to consider the cost per click of their design decisions. If you are curious you can read the post here: http://limnthis.typepad.com/limn_this/2006/12/juice_per_click.html
On a related note, some of our enterprise customers are beginning to look hard at the cost of powering data centers and are asking us for advice.
I agree with your recommendations though none of them yet address the most fundamental issue in my view, and that is the efficiency of the servers themselves. Virtualization and utilization can only go far when the underlying servers are using 100 or more watts each. Kind of like telling someone with a 5 Liter V8 to make sure they take trips with the car full and accelerate gradually.
Addressing more aggressively the underlying cost to power a chip will also have tremendous aggregated value in improving the efficiency of those impossible-to-virtualize personal computers (even if an individual user could not yet care less whether their PC uses 125 watts or only 75).
The parallels with the automotive industry and CAFE rules become kind of striking in that context.
Posted by: Jim Stogdill | Monday, February 12, 2007 at 02:30 PM
See http://www.ecologee.net, the wiki for environmentally friendly IT and hosting with green power, for more information!
Posted by: nominee | Tuesday, February 27, 2007 at 12:55 PM