I've been looking at how we typically achieve reliability in architectures, in this case the reliability of persistent information. Quite simply, reliability is achieved bottom up. Data is stored on a SAN. The SAN uses RAID to deal with the reliability of the underlying drives, and even the drives themselves have various error management features (parity/ECC). The SAN is connected to the database servers through redundant connections. The database relies upon a complex set of ACID rules to ensure data has been committed to the SAN.
There is no argument that this works. When the database transaction completes, the data is stored in at least one location. You can rest assured that a hardware failure has not compromised the data; it is safely available somewhere, although there may be a delay before it is available again. But this assurance comes at an incredible cost. Starting at the bottom of the stack, RAID reduces usable capacity by anywhere from 20% (RAID 5 across 5 drives) to 50% (RAID 1). The SAN adds more cost by putting the drives on their own network, which is critical for quick migration between database servers. This cost is easy to see in the price per gigabyte: internal drives on workstations cost as little as $1-2/GB, while SAN storage costs $15/GB or more.
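As a rough sketch of what that gap looks like, here is a back-of-the-envelope calculation using the illustrative figures above (the prices and drive counts are just the numbers from this post, not measurements from any particular vendor):

```python
# Back-of-the-envelope comparison of raw vs. usable storage cost.
# The prices and drive counts are the illustrative figures from the text.

def usable_fraction_raid5(num_drives: int) -> float:
    """RAID 5 dedicates one drive's worth of capacity to parity."""
    return (num_drives - 1) / num_drives

def usable_fraction_raid1() -> float:
    """RAID 1 mirrors every byte, so half the raw capacity is usable."""
    return 0.5

def cost_per_usable_gb(raw_cost_per_gb: float, usable_fraction: float) -> float:
    return raw_cost_per_gb / usable_fraction

# Commodity internal drive at ~$1.50/GB raw vs. SAN storage at ~$15/GB raw.
print(cost_per_usable_gb(1.50, usable_fraction_raid5(5)))  # ~1.88 $/usable GB
print(cost_per_usable_gb(15.0, usable_fraction_raid1()))   # 30.0 $/usable GB
```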
With an order of magnitude at play in storage cost, looking at alternatives seems reasonable. Is it possible to move reliability up to the application level, thereby removing the need for ultra-reliable hardware? Certainly, if it can be done, the cost savings could be significant. Replacing specialized hardware with commodity hardware tends to bring not only cost savings but also flexibility in how the hardware is used.
One technology I've looked at is Distributed Hash Tables (DHTs). A DHT spreads the concept of a hash table across many nodes. The key space is divided across nodes, making the table resilient to single-node failures: part of the key space may become unavailable, but the rest survives. The implementations I've looked at (Free Pastry and Bamboo) actually replicate keys across multiple nodes, so a single node failure does not compromise the availability of keys, only the transactional capacity of that particular key partition.
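To make the idea concrete, here is a minimal sketch of DHT-style key placement: hash each key onto a circular ID space, then store it on the owning node plus the next few nodes around the ring. This is only an illustration of the concept, not how Free Pastry or Bamboo actually route or maintain replicas, and the replication factor of 3 is my own assumption:

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32
REPLICAS = 3  # assumed replication factor, purely for illustration

def _ring_id(name: str) -> int:
    """Hash a node name or key onto the circular ID space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** RING_BITS)

def replica_nodes(key: str, nodes: list[str]) -> list[str]:
    """Return the nodes responsible for a key, walking clockwise around the ring."""
    ring = sorted(nodes, key=_ring_id)
    ids = [_ring_id(n) for n in ring]
    start = bisect_left(ids, _ring_id(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(REPLICAS, len(ring)))]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(replica_nodes("customer:1234", nodes))
# Losing any single node still leaves the key on the remaining replicas;
# only the capacity of that key partition is reduced.
```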
DHTs are particularly interesting because the distribution and replication can span data centers as well. From an application level this is useful because it solves disaster recovery and makes the application more resilient, not only to individual hardware failures but even to localized disasters that might take an entire data center offline. Solving both node-level and data-center-level resilience with a single solution is a rare win.
In the case of DHTs, this comes at a cost though. Access times are notoriously slow, ranging from tens to hundreds of milliseconds. This creates challenges for many applications and may in fact make a DHT solution untenable. But that doesn't mean the whole concept of application-level reliability is invalidated. In many cases there are optimizations that can be made at the application level that still let the problem be solved while preserving the use of commodity hardware.
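As one purely hypothetical example of such an optimization (my own illustration, not a technique from Free Pastry or Bamboo), an application could put a small read-through cache in front of the DHT so that hot keys avoid the slow round trip on most reads:

```python
import time

class CachedDHTReader:
    """Read-through cache in front of a DHT client (illustrative sketch only)."""

    def __init__(self, dht_get, ttl_seconds: float = 30.0):
        self._dht_get = dht_get    # assumed: a callable that fetches a key from the DHT
        self._ttl = ttl_seconds
        self._cache = {}           # key -> (value, expiry time)

    def get(self, key):
        hit = self._cache.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]          # served locally, no DHT round trip
        value = self._dht_get(key) # slow path: tens to hundreds of milliseconds
        self._cache[key] = (value, time.monotonic() + self._ttl)
        return value
```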
Comments and ideas along these lines are most welcome!