Monday, March 24, 2008

Software Is A Craft

I've been writing a lot of code for the past six months. I have to admit, it is a welcome change from Word documents and PowerPoint presentations. Java, unlike text and graphics, has a clearer definition of correct and finished. But, like text and graphics, the definition of done isn't finite. There is usually a bit more polishing you can do. Another way to factor the code to make it a bit more tidy. And that's something I've been doing a lot of in this six-month period. Factoring, rearranging, trying to follow the spirit of my blog, add simplicity. It's done when there's nothing left to remove.

I've felt for a long time now that we don't view software engineering correctly. Even that term, "software engineering", illustrates a root cause of the problem. Engineering, in all disciplines, is viewed as the opposite of artistic. I've had these conversations with my "creative-type" friends and it's a stereotype that is common. Yet I view all engineering disciplines as art with constraints. Just like a photographer is bounded by the limits of the lens and the film (or digital sensor), software engineers are constrained by the boundaries of the programming language and the processing power of the target system. But that doesn't make software engineering any less creative or artistic.

Given that, though, we teach software like we teach math. Artists are taught not only about the mechanics of their craft but also the art. We stop teaching software engineering after the mechanics. Languages are covered in detail. Code factoring is taught with precision, not as the soft skill that it is. And then we enter the workforce. If we're lucky, we run into some mentors that help us learn our craft. If we're not, we bang out code to project deadlines and while that gives us experience, it doesn't give us skill.

Getting a chance to focus on writing software for the past few months gave me a chance to understand this reality. I do not make clay pottery but I have friends that do. And what they relate is a feel that you develop for the clay. You come to know when the clay is the right consistency. How much pressure to apply to achieve the desired thickness. And this can only be learned by doing. This is the way software is for me. There's a feeling when the software is coming together well. It isn't something that can be expressed in precise terms, codified into a set of steps. It's an emotional response to the components falling into place cleanly. In other words, it's art.

I thought about how I learned to write software. I was fortunate enough to be exposed to some very good software engineers. They guided me early in my career. Helped me understand the difference between okay, good, and great software. In other words, I spent time as a journeyman. Given problems to solve, code to write, and usable feedback on how I could make my software better. Of course I also learned a lot the hard way. Ideas that seemed good at the time, only to discover they had a serious shortcoming. And of course, I rarely settle on a solution to a problem until I'm completely satisfied with it. We all see the same problems repeatedly throughout our careers and I take that as an opportunity to continue to adjust and refine the solution until I know it is solved as cleanly as possible.

And of course, that is what an artist does. Express oneself in a way that provides satisfaction. Refine the expression until it is succinct and yet powerful. This is what software should be. Concise solutions to business problems that can elicit a positive emotional response from others in your trade. If you don't have emotional responses to software, perhaps this isn't your craft. For that is the only way you can move past just creating code to crafting code.


Wednesday, October 24, 2007

What Metadata?

I'm positive that is what operations teams often think developers are asking themselves. And what exactly is metadata? Well, the simple answer is any information that is outside the main data flows that is used to control or monitor the behavior of the application. And with an answer that vague, it's little wonder it's often not part of the initial design. I break metadata into two major categories. Configuration data that controls the behavior of the application. And telemetry data that is generated by the application.

Configuration Data

Configuration data is usually given little if any attention during application development. Just throw whatever you need in a properties file or perhaps an XML file. The file is picked up from somewhere in the file system and that's that. Unfortunately, this fails to take into consideration several factors that can impact the manageability of the component. Let's take a look at some of the common pitfalls.

Configuration vs. Code

This is one of the biggest issues I see when designing configuration data. People overlook the fact that properties and XML files may contain what is essentially code. Anything that would only be changed by a developer is code, regardless of what file format is used to express it. Making files that are essentially code available for potential modification as configuration in a production deployment is a formula for disaster. Code must be tested before it is released to production. You can't test code that you allow to be changed in production. Therefore, files that are essentially code should be delivered embedded in your deliverable (e.g. inside the jar file) and not with configuration files you might expect to be modified in production.

Configuration Resource

Where to put the configuration information and how to format it is a subject of continuous debate. Files are the obvious choice for most applications but this brings up several interesting questions. First is the format. Java properties? XML? Something else (probably not)? Fortunately this issue can usually be resolved by looking at the structure of the information. The more structured the information, the more likely XML is the right choice.

Files bring up the interesting problem of where to put the file. If you deploy it inside your WAR, then it becomes a challenge for operations to locate the file. It's buried somewhere under your application server directory. You can put it in some other distinguished location but if you do this, you need to allow operations to override that location at server start or you will constrain their ability to manage the number of instances of the application per server.
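One common way to allow that override is a JVM system property set at server start. The following is only a minimal sketch of the idea: the property name `myapp.config.file` and the default path are hypothetical names made up for illustration, not anything a particular application server defines.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class ConfigLocator {
    // Hypothetical property name and default location, for illustration only.
    public static final String LOCATION_PROPERTY = "myapp.config.file";
    public static final String DEFAULT_LOCATION = "/etc/myapp/myapp.properties";

    /** Returns the config path, preferring the value passed at server start. */
    public static Path resolve() {
        return Paths.get(System.getProperty(LOCATION_PROPERTY, DEFAULT_LOCATION));
    }

    /** Loads the properties file found at the resolved location. */
    public static Properties load() throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(resolve())) {
            props.load(in);
        }
        return props;
    }
}
```

Operations could then start each instance with its own file, for example `-Dmyapp.config.file=/opt/instances/a/myapp.properties`, without the application needing to know how many instances share the server.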

Delivering the configuration with the application has other issues though. Invariably, there will be configuration values that must be set to reflect the production environment. Once operations has modified the file, how do you deliver your subsequent versions? You can't overwrite the version of the file on the production server, yet if you've added necessary configuration values, how do these get merged into the production file?
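One possible answer, offered only as a sketch and not as the definitive approach, is to avoid merging files at all: ship the defaults with each release and let the production file carry only the values operations has overridden. `java.util.Properties` supports this layering directly through its defaults constructor. The class name and the keys in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

public class LayeredConfig {
    /**
     * Builds the effective configuration: values from the site file win,
     * and anything the site file omits falls back to the shipped defaults.
     */
    public static Properties layer(Reader shippedDefaults, Reader siteOverrides)
            throws IOException {
        Properties defaults = new Properties();
        defaults.load(shippedDefaults);

        // Lookups on 'effective' consult 'defaults' when a key is missing.
        Properties effective = new Properties(defaults);
        effective.load(siteOverrides);
        return effective;
    }
}
```

With this arrangement a new release can add a key such as `new.feature=off` to the shipped defaults, and a production file that only contains `timeout.ms=2000` keeps working unchanged. Note that the fallback applies to `getProperty`; calls such as `get` or `entrySet` on the returned object do not see the defaults.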

Another challenge with file-based configuration is that it does not scale well. It works well enough for O(10^2) servers but beyond that it is unwieldy. Attempting to manage files on thousands of servers requires appropriate tools to have any chance at all. Even with the best tools, there is a tendency for files to be missed, and very confusing production errors result.

So what is the alternative? Centralizing configuration into services such as LDAP or a configuration database (CDB) can alleviate the challenges associated with distributing files. This definitely scales better to a large number of servers. But there are some challenges with this approach as well. The primary challenge is providing developers with a usable environment for testing. Developers typically require a local version of their resources with the ability to change the content at any time. This is relatively straightforward with files but much more difficult with LDAP or CDB. Still, the scalability offered to production may well be worth this hassle.

Telemetry

Okay, logging, but I have actually borrowed this term from professional racing (or NASA, take your pick) for a good reason. From Wikipedia:

Telemetry is a technology that allows the remote measurement and reporting of information of interest to the system designer or operator.

Thinking of the information that a component can send to logs in terms of telemetry instead of just logs gives a better perspective on what belongs in the stream for operators. Developers tend to think of logs only as tools for themselves to help in debugging when in fact they are a necessity to properly monitor the health of a running application.

The question of course is what kind of information should be in the telemetry? Considering a common web service, at a minimum, I'd expect to find the following information easily in the stream:

  • The request URI including parameters
  • Basic parametric information about the request
  • External resource interactions performed by the service. These should include status and timings.
  • The result status of the request.
  • The processing time of the request.
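To make the list concrete, here is a minimal sketch of a per-request record that collects most of these items and renders them as one line. The class name, the field names and the `key=value` layout are all assumptions chosen for illustration; any format that scripts or a central collector can parse would do.

```java
import java.util.ArrayList;
import java.util.List;

public class RequestTelemetry {
    private final String requestUri;
    private final List<String> resourceCalls = new ArrayList<>();

    public RequestTelemetry(String requestUri) {
        this.requestUri = requestUri;
    }

    /** Records one external resource interaction with its status and timing. */
    public void recordResourceCall(String resource, String status, long millis) {
        resourceCalls.add(resource + ":" + status + ":" + millis + "ms");
    }

    /** Renders the request as a single line once the result is known. */
    public String toLogLine(int resultStatus, long elapsedMillis) {
        return "uri=" + requestUri
                + " status=" + resultStatus
                + " elapsed_ms=" + elapsedMillis
                + " resources=[" + String.join(",", resourceCalls) + "]";
    }
}
```

A request for `/orders?id=42` that touched a database for 12 ms and missed a cache in 1 ms would render as `uri=/orders?id=42 status=200 elapsed_ms=35 resources=[db:OK:12ms,cache:MISS:1ms]`.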

If this information is made available to operations in real time, there are several types of monitors that can be created. Alarms can be set on thresholds for error status ratios or dramatic drops in request rates. Operational graphs can be made for average response time with potential alerts for response time drifting out of SLA. Dependency graphs can be constructed that will help operations correlate resource failures to client impacts. And this is from the small amount of information proposed above. Additional telemetry can provide even more operational monitoring capabilities.
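As an illustration of the first kind of monitor, the sketch below tracks the ratio of error results to total requests and reports when it crosses a threshold. It assumes that HTTP-style status codes of 500 and above count as errors, which is my assumption and not something every service will share; a real monitor would also window the counts over time rather than accumulate them forever.

```java
public class ErrorRatioAlarm {
    private final double threshold;
    private long total;
    private long errors;

    /** @param threshold error ratio above which the alarm fires, e.g. 0.05 */
    public ErrorRatioAlarm(double threshold) {
        this.threshold = threshold;
    }

    /** Feeds one request result into the counters. */
    public void record(int resultStatus) {
        total++;
        if (resultStatus >= 500) { // assumption: 5xx means a service-side error
            errors++;
        }
    }

    /** True once the observed error ratio exceeds the threshold. */
    public boolean isTriggered() {
        return total > 0 && (double) errors / total > threshold;
    }
}
```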

The Java logging facility can meet the needs of generating telemetry although you may want to separate out telemetry into its own logger name. For small scale deployments, this telemetry can simply be sent to log files. Scripts that regularly scrape the logs can be used to extract the relevant bits of information. For larger scale operations however, a central logging scheme is more relevant. Logging using the socket handler may be sufficient although for very large scale installations, it may be desirable to move to a less reliable but more scalable transport such as multicast.
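A minimal sketch of the separate logger name, using only `java.util.logging`. The logger name `myapp.telemetry` is hypothetical, and the destination is passed in so the same code serves both the small and the large scale case.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TelemetryLogging {
    // Hypothetical logger name; pick one that fits your own naming scheme.
    public static final String TELEMETRY_LOGGER = "myapp.telemetry";

    /** Routes telemetry to its own destination, apart from the debug logs. */
    public static Logger configure(Handler destination) {
        Logger logger = Logger.getLogger(TELEMETRY_LOGGER);
        logger.setUseParentHandlers(false); // keep telemetry out of the root handlers
        logger.setLevel(Level.INFO);
        logger.addHandler(destination);
        return logger;
    }
}
```

For a small deployment the destination could be `new java.util.logging.FileHandler("telemetry.log")`, and for central logging `new java.util.logging.SocketHandler(host, port)`; both constructors throw `IOException`, so the caller decides how to handle a destination that is not available. The standard library has no multicast handler, so that transport would need a custom `Handler` subclass.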

Summary

I know that I've only scratched the surface of metadata issues. The point of this posting wasn't to give you an exhaustive guide to configuration and telemetry, but rather to bring up some issues and initiate a dialog. As always, comments are most welcome.
