27 April 2008

Logging the map

A good logfile can help to find errors during software development in many ways. It can show which operation happens at which time. It may show important results, data, or the content of variables that determine the control flow. Consequently, the most common parts of log entries are a timestamp, followed by some information about the operation performed at that time. This may be the name of a method or function, the name of the object currently active, or similar things.
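For illustration, here is a minimal sketch of such a conventional, time-centric log, using Python's standard logging module; the logger name and the decode function are made-up examples, not taken from any real project:

    import logging

    # Conventional format: every entry starts with a timestamp, followed by
    # information about which part of the code is currently active.
    logging.basicConfig(
        format="%(asctime)s %(name)s.%(funcName)s: %(message)s",
        level=logging.DEBUG,
    )

    log = logging.getLogger("Decoder")

    def decode(packet):
        # Records *when* something happened and *which* code was active.
        log.debug("decoding packet %r", packet)
        return packet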

Such a logfile can be seen as a recording of things happening or occurring over time. Therefore, I will call it a "software camera". It is one-dimensional, and therefore strictly sequential. A program with its many branches, with its possibilities and options, is serialized into one linear sequence of processing steps, and the "software camera" records them. This is very useful for debugging an application working on streaming data. The data itself is ordered strictly in time, and if some errors only occur at certain parts of the stream, logging with such a "software camera" may be the only way to find them.

Now the question is: can something other than the timestamp and the currently active part of the source code be of interest? To explain this, I have to go into some detail.

In 2006 and 2007, when I worked in Basel in Switzerland, I sometimes met the guys of the Software Composition Group at Bern. I loved this, because these guys are very smart, and discussing things with them is fun. For example, I talked with Adrian Kuhn about his idea to create a software map. This should be like the well-known map of streets and towns, but it was not clear which metric should correspond to the "euclidean distance" used by such a conventional map. As one possible solution, we discussed using the distance of a data packet to the point where its processing would be complete. This idea is a consequence of data-oriented design: all that matters are the data and what happens to them. The position of a data packet would be defined by "measuring" how many processing steps have already been applied and how many are still to go. Time is only used for "speed" measurements, not as a coordinate.
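To make this idea a bit more concrete, here is a small sketch of how such a "position" could be measured; the pipeline and its step names are purely hypothetical assumptions of mine, not something we worked out back then:

    # Hypothetical sketch: a packet's "position" on the software map is not a
    # timestamp, but how far it has travelled through a fixed processing pipeline.
    PIPELINE = ["receive", "validate", "decode", "transform", "store"]

    def position(steps_done):
        # Steps already applied and steps still to go for one data packet.
        done = len(steps_done)
        remaining = len(PIPELINE) - done
        return done, remaining

    # Example: a packet that has passed "receive" and "validate"
    print(position(["receive", "validate"]))   # -> (2, 3)

Time would then only enter the picture to measure how fast a packet moves between two such positions, not as a coordinate of the map itself.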

These thoughts were my starting point when designing a logfile for a project at my company. It is also an application acting on streaming data, and I wanted to write out information which allows me to see which data packets are at which point in the processing landscape. In detail, this was:
  • the identification of the specific datum (sometimes with, sometimes without its values)
  • the action the datum is waiting in front of
  • or the action it is currently in
  • or the action it just came out of.
I did not use any timestamp at all.
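The following sketch shows what such data-centric log entries could look like; the helper function and the field layout are my own illustrative assumptions, not the exact format used in the project:

    def log_datum(datum_id, action, state, values=None):
        # One data-centric entry: which datum is where, relative to which action.
        # state is "before" (waiting in front of the action), "in" (currently
        # being processed by it) or "after" (just came out of it).
        # Note that no timestamp is written at all.
        entry = "%s %s %s" % (datum_id, state, action)
        if values is not None:
            entry += " %r" % (values,)
        print(entry)

    # Example trace of one datum travelling through two actions:
    log_datum("packet#42", "decode", "before")
    log_datum("packet#42", "decode", "in")
    log_datum("packet#42", "decode", "after", values={"length": 512})
    log_datum("packet#42", "transform", "before")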
Going this way helped me to build the architecture such that it is robust against any way the data may come in. It could be shuffled, delayed, or reordered in many ways. Tracking the data through the processing scenery helped me to identify "unused roads", "roads" designed too small, and important "crossings". In the next part, I will enrich the information so that I can generate SVG to create a real map. We will see.

To read more about software visualization, I suggest looking at the website of Moose, a reengineering and modeling tool. In its context there are many tools with nice ideas. It's worth a look - and yes, this academic stuff *is* useful in practice!