« Availability Enlightenment | Main | Architectural Shelf Life »

Sunday, August 24, 2008


Sri Shivananda

* Lesson 2 : Distinguish logical sharding and physical servers as separate constructs. Design for the future with logical sharding, use physical server growth when the need arises. In other words this separation allows you to design your code from the get go, but to delay acquisition and deployment of physical servers to when it is needed without having to make changes to your code. The mapping from logical shards to physical servers is a configuration artifact. This concept does not apply just to sharding but to all database connection management. Build multi-tenancy for logical shards on a physical server. This will allow you not just to expand when necessary, but also compress when Moore's law leverage kicks in over time.

* Lesson 5 : I would shard early and shard RIGHT. Prevent any need to re-do it in the future. Data migration in large scale systems that have to be online 24 x 365 is a planning, execution and co-ordination nightmare. That said :-), "unanticipated" growth can always push you down the re-sharding route. Think of it when you are writing your architecture specification and account for it, in terms of TTL ( time-to-live ) and methodologies.

* Sharding sometimes comes with the need to create secondary data structures for alternate access paths, just like tables have indexes for lookups based on alternate keys or range scans based on non-key columns. There are a few patterns in this area. If it is a rare backend use case, you can consider a "shard scan", else, you will need to create tables that contain secondary key to "shard key" mapping.

* If the next step in Moore's law for your database hardware is right around the corner and you hardware replacement cycle matches with that in terms of timelines, consider tuning ( db, query, I/O ... ) and optimization ( query count, caching ... ) rather than re-sharding.

* Mathematical sharding algorithms allow for scale and good balance across shards but not for great flexibility in re-sharding. Lookup based sharding allow for scale and flexibility in re-sharding, but, one has to pay close attention to balancing the shards.


will not key % 4 and (key / 4) % 4 always give the same value (other than key <= 4)?

P Murray

I've blogged about this excellent contribution the subject of database sharding at https://www.codefutures.com/weblog/database-sharding/

Dan Pritchett

The math question is no...

193 % 4 = 1
(193 / 4) % 4 = 0

Of course I may have forgot to mention integer math is at play.

curious #2

What would happen if we needed to scale beyond 16 shard combinations? Would you recommend 32 possible shards - basically a 2^n approach....but wouldn't this require hitting log2n DB nodes.


The comments to this entry are closed.