Monday, July 14, 2008

Some Guiding Principles for Supporting Production Apps

I just finished a two-week rotation supporting our production web services and wanted to jot down some thoughts. This coincided with being sent a link to a Scott Hanselman blog post about guiding principles of software development, which I really enjoyed reading. I thought I'd extend a few of those now that I've come out of my support rotation an enlightened developer (I hope).

Although Scott's blog is for Windows/.NET development, there are lots of goodies in there which can be applied or ported to any language. I'd like to narrow in on the support-based principles: tracing (logging) and error handling.

Scott mentions that for critical/error/warn logging, your audience is not a developer. Well, in this case I am a developer, but I may not know the application, so I need as much information as I can get. This leads me on to my first extension: transaction IDs.

Now, there are really two kinds of transaction IDs, especially when dealing with a service-based application. The first is a per-invocation ID: every invocation of your service should be given a unique ID, and that ID should be passed around the code (and logged against) until the path returns to the client. Depending on your application, this could get really noisy, but a multi-threaded application could benefit from such information. The second is a resource/task-based ID: if a service does something (let's call it A), but the execution path returns to the client before A is complete, the ID for A should be returned to the client, and this ID should live through the execution of A and be stored or archived accordingly. In terms of logging, whenever you come across this ID, use it. This generally holds for all IDs: if you have an ID, use it when logging.
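To make the per-invocation ID concrete, here's a rough sketch in Python. It assumes the standard `logging` and `contextvars` modules; the names `transaction_id`, `TransactionIdFilter`, `build_logger` and `handle_request` are made up for illustration and don't come from any real service. The idea is that the ID is set once at the entry point and every log line picks it up without being passed explicitly.

```python
import contextvars
import logging
import uuid

# Holds the ID of the invocation currently being handled ("-" when there is none).
transaction_id = contextvars.ContextVar("transaction_id", default="-")


class TransactionIdFilter(logging.Filter):
    """Copy the current transaction ID onto every log record."""

    def filter(self, record):
        record.txid = transaction_id.get()
        return True


def build_logger(stream):
    """Return a logger whose lines all carry txid=<id> (hypothetical helper)."""
    logger = logging.getLogger("service")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s txid=%(txid)s %(message)s"))
    handler.addFilter(TransactionIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, payload):
    """Hypothetical service entry point: one fresh ID per invocation."""
    txid = uuid.uuid4().hex
    token = transaction_id.set(txid)
    try:
        logger.info("received %s", payload)
        logger.info("finished %s", payload)
        return txid  # a task-based ID would be returned to the client like this
    finally:
        # Restore the previous value so IDs never leak between invocations.
        transaction_id.reset(token)
```

A context variable is only one way to carry the ID around; passing it as an explicit argument works too, it just touches more signatures.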

With error handling, it's accepted that catching a generic exception is bad, although permissible at the boundary of your application. If you must catch 'Exception' or 'RuntimeException', log it, and log it at error level at the very least. It's a generic exception for a reason, and that reason is you don't know what happened.
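Here's roughly what I mean, sketched in Python rather than Java. `ResourceNotFound` and `call_at_boundary` are invented names, and `txid` stands in for whatever transaction ID the caller already has. Specific exceptions get specific handling; the generic catch sits only at the boundary and logs at error level with the stack trace.

```python
import logging

logger = logging.getLogger("service.boundary")


class ResourceNotFound(Exception):
    """Hypothetical application-specific error with a known cause."""


def call_at_boundary(operation, txid):
    """Run `operation` at the edge of the application and map failures to a response."""
    try:
        return {"status": "ok", "result": operation()}
    except ResourceNotFound as exc:
        # A specific, understood failure: a warning is enough.
        logger.warning("txid=%s resource not found: %s", txid, exc)
        return {"status": "not_found", "txid": txid}
    except Exception:
        # Unknown failure: we don't know what happened, so log at error
        # level and keep the full stack trace (logger.exception does both).
        logger.exception("txid=%s unexpected failure", txid)
        return {"status": "error", "txid": txid}
```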

AOP is a great technique for logging. Don't be frightened to use it everywhere (and this goes for all logging).
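Python has no AOP framework in its standard library, so the sketch below uses a plain decorator as a lightweight stand-in for an aspect. `traced` and `add` are hypothetical names. The point is the same: entry, exit and failure logging are applied around a function without touching its body.

```python
import functools
import logging

logger = logging.getLogger("service.trace")


def traced(func):
    """Log entry, exit, and failure of `func` without changing its body."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the failure with its stack trace, then let it propagate.
            logger.exception("failed %s", func.__name__)
            raise
        logger.debug("exit %s result=%r", func.__name__, result)
        return result

    return wrapper


@traced
def add(a, b):
    """Hypothetical business function; it contains no logging code itself."""
    return a + b
```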

Moving on to a topic touched upon, but not thoroughly explored: configuration. If you're building a server product, config is your friend. When things go bad and you need to re-wire your application, not having to make a code change is a massive benefit. If the config is read dynamically, that's even better, as it won't require a server restart.
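One simple way to get dynamically read config is to re-check the file's modification time on each lookup. This is only a sketch under some assumptions: the config lives in a JSON file, and the `ReloadingConfig` class and its `get` method are names I've made up. A real server would probably want locking and some handling for a half-written file.

```python
import json
import os


class ReloadingConfig:
    """Re-read a JSON config file whenever its modification time changes."""

    def __init__(self, path):
        self._path = path
        self._mtime = None  # nothing loaded yet
        self._values = {}

    def get(self, key, default=None):
        self._reload_if_changed()
        return self._values.get(key, default)

    def _reload_if_changed(self):
        mtime = os.stat(self._path).st_mtime_ns
        if mtime != self._mtime:
            with open(self._path, encoding="utf-8") as fh:
                self._values = json.load(fh)
            self._mtime = mtime
```

With something like this, pointing the application at a different backend is an edit to the file, not a redeploy or a restart.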

Lastly, think about your archive mechanism and think about what data you'll need to preserve and for how long. Ideally, from that data, execution paths should be interpretable, at least at a high level.

Oh, and please be kind to your support team: make it easy for them to gather information about your application. If an application is one of many, consider implementing a common health/management (and provisioning if you can) interface.
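As a very rough idea of what a common health interface could look like, here is a Python sketch. Everything in it (`HealthCheck`, `QueueDepthCheck`, `health_report`) is hypothetical; the only point is that every application exposes its checks in the same shape, so support can read them all the same way.

```python
from abc import ABC, abstractmethod


class HealthCheck(ABC):
    """Common shape that every application's checks would share."""

    name = "unnamed"

    @abstractmethod
    def check(self):
        """Return a (healthy, detail) pair: a bool and a short string."""


class QueueDepthCheck(HealthCheck):
    """Example check: healthy while a work queue stays under a limit."""

    name = "queue"

    def __init__(self, depth_fn, limit):
        self._depth_fn = depth_fn
        self._limit = limit

    def check(self):
        depth = self._depth_fn()
        return depth <= self._limit, f"depth={depth} limit={self._limit}"


def health_report(checks):
    """Run every check and summarise the results in one dictionary."""
    results = {}
    for item in checks:
        try:
            healthy, detail = item.check()
        except Exception as exc:
            # A check that blows up counts as unhealthy, not as a crash.
            healthy, detail = False, f"check raised {exc!r}"
        results[item.name] = {"healthy": healthy, "detail": detail}
    overall = all(r["healthy"] for r in results.values())
    return {"healthy": overall, "checks": results}
```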

I think putting together a list of principles like this is a great idea. Each engineering group should think about the ways they work and how to drive commonality across products. Teams shouldn't be scared of thinking about support throughout the development process either, although they've been hearing the same thing about security for years ;)

3 comments:

Anonymous said...

Everything you say is common sense, yet not so common. And this is exactly why every dev should spend time on support - you tend to learn some good practices.

As an interesting side note about the transaction ID in the logs, how do you log transaction IDs for real-time applications which have events that come in from the network, not from the user/application? If you have an external dependency which supports logging but knows nothing of your IDs, how do you correlate those logs to the transactions in your logs? What is the approach you would like to see taken?

Robbie said...

Well, I guess if you have something that sits in between your application and the network, say a webserver, some kind of border controller or security gateway, they should be able to inject a unique ID. The first place I'd think about injecting that ID would be the header, which should work for any given HTTP message.

Correlating those IDs may be tricky, but there's usually a longer-lasting ID that can tie those instances together. In fact it plays into a software pattern whose name escapes me.

If the transaction and other IDs are stored in a base class and that is passed around instead of the IDs themselves, it lends itself to cleaner reading and refactoring of code. Stupid memory.

Anonymous said...

As someone or other said: "Always write your code as if the person who will maintain it is a serial killer who knows where you live".