I find myself looking at nondeterministic systems a lot lately. Many solutions for the challenges of extreme scale involve relaxing constraints and coping with the ensuing chaos. But humans aren't comfortable with chaos. We're wired to bring order to our surroundings. And software engineers may be more tightly wired for order than the average person.
I have been trying to get my head around this problem for a while now. Recently, I had the revelation that the amount of entropy in any software system is directly related to the breadth of the system being considered. If I start at machine instructions, the system is very predictable. I can isolate that component and define the behavior with mathematical precision. The entire component conforms to completely deterministic patterns of behavior.
But as I start looking at the system more broadly it becomes more chaotic. Introduce multiple threads of execution and I have introduced stochastic uncertainty to my component. Behavior now depends on the scheduler, whose decisions are effectively random from the component's point of view. That B follows A is no longer assured. Add another processor and even "B follows A or A follows B" is no longer assured, as A and B may now happen simultaneously.
The multi-threaded/multi-processor example is perfect to illustrate how the resistance to chaos begins. The first reaction to B failing to follow A predictably is to force it. Introduce a semaphore to ensure that A can never occur after or coincident with B. But just like reversing entropy is expensive in thermodynamics, it is expensive in software. Complexity rises. Throughput suffers. No, rather than attempting to impose order on A and B, the goal should be to relax the constraints on A and B to allow them to exist without any temporal relation. Of course this isn't always possible, but the point is to resist the first temptation of imposing order, and instead look for a solution that allows the chaos to exist.
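As a minimal sketch of what relaxing the constraint can look like (Python is used here purely for illustration, and the names `count_words` and `merged_counts` are hypothetical), each worker below touches only its own local state and the partial results are merged with an operation that is commutative and associative. No semaphore is needed because no temporal relation between the workers is assumed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count the words in one chunk using only local state (no shared data, no lock)."""
    return Counter(chunk.split())


def merged_counts(chunks, workers=4):
    """Process chunks on a thread pool and merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The chunks may execute concurrently and in any order. Counter
        # addition is commutative and associative, so the merged total
        # does not depend on which chunk finished first.
        for partial in pool.map(count_words, chunks):
            total += partial
    return total


chunks = ["a b a", "b c", "a c c"]
result = merged_counts(chunks)
reversed_result = merged_counts(list(reversed(chunks)))
```

Feeding the chunks in the opposite order yields the same totals, which is the point: the ordering of A and B has been made irrelevant rather than enforced. This only works when the operation can be expressed as an order-independent merge, which, as noted above, is not always possible.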
Increasing chaos continues as the view expands. A web service can be well understood in the context of threads and processors, but the introduction of a client brings additional randomness. Clients represent events arriving at unpredictable intervals. Loads on the systems will be non-uniform. All possible combinations of requests (by type, by processing time, by memory size) will occur. The combinatorial effects are impossible to test, predict, and in most cases even reproduce. A component whose behavior was supposedly predictable becomes unpredictable as external stimuli are added.
At the highest systemic view, chaos is rampant. Asynchronous integrations, component failures, network latency variances, and a variety of other stimuli lead to a system that is completely unpredictable in behavior. The larger the system, the more chaotic it will be. And just like the earlier thread example, attempting to bring order is expensive and ultimately pointless. As a friend of mine says, reversing entropy is attempting to unscramble an egg. No, chaos is the reality and rather than preventing it, a software architecture has to not only survive but thrive on it.
Which brings me to the crux of my challenges. As software engineers we are trained to solve problems in a very linear fashion. We operate within a framework of deterministic components and well understood patterns and anti-patterns. None of these are well suited to the chaotic reality of large architectures. And that's why you have to be willing to discard them. Step back and look at the architecture from a new perspective.
One of my favorite sacred cows to pick on is ACID (atomicity, consistency, isolation, durability). Along comes BASE (basically available, soft state, eventual consistency), which challenges the conventional wisdom, fails to conform to existing patterns, and arguably embodies several anti-patterns. Yet, to achieve extreme scales, BASE is a necessity. And BASE is a good example of a non-linear revelation that embraces the chaos of scaling large data sets rather than trying to force order into the system.
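One small, concrete flavor of eventual consistency is a grow-only counter that replicas merge by taking the per-replica maximum. The sketch below is an illustration chosen for this post, not a description of any particular BASE datastore, and the `GCounter` class and replica names are assumptions. Replicas accept writes independently, disagree for a while (soft state), and converge once they have exchanged state in any order.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.slots = {}  # replica id -> count contributed by that replica

    def increment(self, amount=1):
        # A replica only ever writes to its own slot, so no coordination is needed.
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Max is commutative, associative, and idempotent, so merges can
        # arrive in any order and be repeated without changing the outcome.
        for rid, count in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), count)

    def value(self):
        return sum(self.slots.values())


a, b, c = GCounter("a"), GCounter("b"), GCounter("c")
a.increment(3)
b.increment(2)
c.increment(5)

stale_view = a.value()  # replica "a" has not yet heard from the others

# Exchange state in an arbitrary order, including one repeated merge.
a.merge(c)
a.merge(b)
a.merge(c)
b.merge(a)
c.merge(b)
```

Before the exchange, replica `a` reports only its own writes. Afterwards all three replicas report the same total, without a lock or a transaction spanning them.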
Those kinds of revelations require abandoning our preconceived notions of how to solve these problems (read: patterns) and embracing some chaotic thinking. Resist the temptation to bring order to your systems and instead seek out ways to make the chaos irrelevant. Try thinking about the system, pushing aside the sacred cows, and envisioning what it means to have pure chaos. What breaks? How can you tolerate it without eliminating it? Is it really more complex than trying to eliminate it?
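To make "tolerate it without eliminating it" a little more tangible, here is a rough sketch under stated assumptions: every update carries a version number, and the receiver keeps only the highest version it has seen for each key. The function names, the message shape, and the sample data are all hypothetical. Delivery is deliberately scrambled, with duplicates and reordering, and the final state comes out the same anyway.

```python
import random


def apply_update(store, update):
    """Apply a (key, version, value) update, keeping only the newest version per key."""
    key, version, value = update
    current = store.get(key)
    # Late or duplicated messages carry a version that is not newer, so they are ignored.
    if current is None or version > current[0]:
        store[key] = (version, value)


def deliver_chaotically(updates, seed):
    """Deliver updates with duplicates and in a shuffled order, returning the final store."""
    rng = random.Random(seed)
    noisy = list(updates) + rng.sample(updates, k=len(updates) // 2)  # add duplicates
    rng.shuffle(noisy)  # scramble arrival order
    store = {}
    for update in noisy:
        apply_update(store, update)
    return store


updates = [
    ("cart", 1, ["book"]),
    ("cart", 2, ["book", "pen"]),
    ("profile", 1, "alice"),
    ("cart", 3, ["pen"]),
]
outcomes = [deliver_chaotically(updates, seed) for seed in range(20)]
```

Every one of the twenty scrambled deliveries ends in the same state. The trade-off is that the sender has to supply a meaningful version, which is its own design problem, but the receiver no longer cares in what order or how many times messages arrive.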
I'd love to hear your ideas about how to cope with chaos. And how you bring the concepts of chaos to your organizations.