27 April 2008

Logging the map

A good logfile can help find errors during software development in many ways. It can show which operation happens at which time. It may show important results, data, or the contents of variables that determine the control flow. Consequently, the most common parts of a logfile are the timestamp, followed by some information about the operation performed at that time. This may be indicated by the name of a method or function, the name of the object that is active, or similar things.

Such a logfile can be seen as a recording of things happening or occurring over time. Therefore, I will call it a "software camera". It is one-dimensional and therefore strictly sequential. A program with its many branches, possibilities, and options is serialized into one linear sequence of processing steps, and the "software camera" records them. This is very useful for debugging an application that works on streaming data. The data themselves are strictly ordered (in time), and if some errors occur only in certain parts of the stream, logging with a "software camera" may be the only way to find them.
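For comparison, a conventional "software camera" log can be produced with Python's standard `logging` module: each line starts with a timestamp, followed by the name of the function that is active. This is a minimal sketch; the logger name `software_camera` and the functions `make_camera_logger` and `decode_frame` are hypothetical illustrations, not part of the original project.

```python
import io
import logging


def make_camera_logger(stream):
    """Return a logger that records a timestamp and the active function for each event."""
    logger = logging.getLogger("software_camera")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    # Timestamp first, then the name of the function that issued the log call.
    handler.setFormatter(logging.Formatter("%(asctime)s %(funcName)s: %(message)s"))
    logger.handlers = [handler]  # replace old handlers so repeated calls do not duplicate lines
    logger.propagate = False     # keep the output out of the root logger
    return logger


def decode_frame(logger, frame_no):
    """Hypothetical processing step of a streaming application."""
    logger.debug("decoding frame %d", frame_no)


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_camera_logger(buf)
    for n in range(3):
        decode_frame(log, n)
    print(buf.getvalue(), end="")
```

Each emitted line looks roughly like `2008-04-27 10:15:02,123 decode_frame: decoding frame 0`: one point in time and one active piece of code, strictly in sequence.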

Now the question is: can anything other than the timestamp and the active part of the source code be of interest? To explain this, I have to go into some detail.

In 2006 and 2007, when I worked in Basel, Switzerland, I sometimes met the guys from the Software Composition Group in Bern. I loved this, because these guys are very smart, and discussing things with them is fun. For example, I talked with Adrian Kuhn about his idea of creating a software map. It should be like the well-known map of streets and towns, but it was not clear which metric should correspond to the "Euclidean distance" used by such a conventional map. As one possible solution, we discussed using the distance of a data packet to the point where its processing would be complete. This idea is a consequence of data-oriented design: all that matters here are the data and what happens to them. The position of a data packet would be defined by "measuring" how many processing steps have already been applied and how many are still to go. Time is used only for "speed" measurements, not as a coordinate.
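The map coordinate discussed above can be sketched in a few lines. This assumes a simple linear pipeline with a fixed list of steps; the step names, the `Packet` class, and the helper functions are hypothetical. The position of a packet is the pair (steps already applied, steps still to go), and time appears only in the speed calculation.

```python
from dataclasses import dataclass

# Hypothetical pipeline: the ordered processing steps every packet must pass.
PIPELINE = ["parse", "validate", "transform", "store"]


@dataclass
class Packet:
    packet_id: str
    steps_done: int = 0  # how many processing steps have already been applied


def position(packet: Packet, pipeline=PIPELINE):
    """Return (steps already applied, steps still to go) for a packet."""
    done = packet.steps_done
    return done, len(pipeline) - done


def distance_to_ready(packet: Packet, pipeline=PIPELINE) -> int:
    """The 'distance' on the map: how many steps remain until processing is complete."""
    return position(packet, pipeline)[1]


def speed(steps_before: int, steps_after: int, seconds: float) -> float:
    """Time enters only as a speed (steps travelled per second), never as a coordinate."""
    return (steps_after - steps_before) / seconds
```

For a packet that has passed `parse` only, `position(Packet("A17", steps_done=1))` gives `(1, 3)`: one step done, three to go.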

These thoughts were my starting point when I designed a logfile for a project at my company. It is also an application acting on streaming data, and I wanted to write out information that would let me see which data packets are at which point in the processing landscape. In detail, this was:
  • the identification of the specific datum (sometimes with values, sometimes without)
  • the action the datum is waiting in front of,
  • or the action it is currently in,
  • or the action it has just come out of.
I did not use any timestamp at all.
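A log line of this kind can be sketched as follows, assuming a simple text format of `<datum id> <phase> <action> [value=...]`. The format, the `Phase` enum, and the `run_action` wrapper are hypothetical; in a queue-based system the `before` line would naturally be written when the datum is enqueued in front of the action, not immediately before processing as in this simplified wrapper.

```python
from enum import Enum


class Phase(Enum):
    BEFORE = "before"  # the datum is waiting in front of the action
    IN = "in"          # the datum is currently inside the action
    AFTER = "after"    # the datum has just come out of the action


def map_log_line(datum_id, action, phase, value=None):
    """Format one 'map' log line: which datum is where. Deliberately no timestamp."""
    line = f"{datum_id} {phase.value} {action}"
    if value is not None:  # values are optional
        line += f" value={value!r}"
    return line


def run_action(action_name, func, datum_id, value, log):
    """Run one processing step and record the datum's position around it in `log` (a list)."""
    log.append(map_log_line(datum_id, action_name, Phase.BEFORE))
    log.append(map_log_line(datum_id, action_name, Phase.IN, value))
    result = func(value)
    log.append(map_log_line(datum_id, action_name, Phase.AFTER, result))
    return result
```

Calling `run_action("scale", lambda v: v * 2, "pkt-1", 21, log)` leaves three lines in `log`: `pkt-1 before scale`, `pkt-1 in scale value=21`, and `pkt-1 after scale value=42`. The order of the lines is the only notion of time.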
Going this way helped me build an architecture that is robust to whatever way the data come in. They could be shuffled, delayed, or reordered in many ways. Tracking the data through the processing landscape helped me identify "unused roads", undersized "roads", and important "crossings". In the next part, I will enrich the information so that I can generate SVG to draw a real map. We will see.
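Such a timestamp-free log can be replayed to find the "roads" and "crossings". The sketch below assumes the hypothetical line format `<datum id> <before|in|after> <action> [...]` and interprets the metaphors in one possible way: an unused road is a known action that no datum came out of, an undersized road is hinted at by a long queue of data waiting in front of an action, and a crossing is an action that receives data from more than one preceding action. The function name and these interpretations are assumptions, not the author's actual tooling.

```python
from collections import Counter, defaultdict


def analyse_map_log(lines, known_actions):
    """Replay a timestamp-free map log and summarise the traffic per action."""
    passed = Counter()            # data that came out of each action: traffic on the "road"
    waiting = Counter()           # data currently queued in front of each action
    max_waiting = Counter()       # largest queue seen: a hint for an undersized "road"
    last_action = {}              # datum id -> action it most recently came out of
    incoming = defaultdict(set)   # action -> set of actions that fed data into it
    for line in lines:
        datum_id, phase, action = line.split()[:3]
        if phase == "before":
            waiting[action] += 1
            max_waiting[action] = max(max_waiting[action], waiting[action])
            if datum_id in last_action:
                incoming[action].add(last_action[datum_id])
        elif phase == "in":
            waiting[action] -= 1
        elif phase == "after":
            passed[action] += 1
            last_action[datum_id] = action
    unused = [a for a in known_actions if passed[a] == 0]
    crossings = sorted(a for a, sources in incoming.items() if len(sources) > 1)
    return {"passed": passed, "max_waiting": max_waiting,
            "unused": unused, "crossings": crossings}
```

Because only the order of the lines matters, the same replay works no matter how the data were shuffled or delayed on their way in.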

To read more about software visualization, I suggest looking at the website of Moose, a reengineering and modeling tool. Around it there are many tools with nice ideas. It's worth a look, and yes, this academic stuff *is* useful in practice!

10 April 2008

Secure Software - Part I

From time to time, I ask myself what secure software really is. If I knew what it is, I would be able to find methods and tools to design and build such software. What should it look like, which properties should it have, and how can I recognize it?

The trivial answer that comes to mind immediately is:

Secure software is one which never crashes.

That's pretty simple, isn't it? But what is a "crash", and what does "never" mean? "Crash" is often associated with the well-known "Blue Screen", or with a segmentation fault. After that, the program is gone; it has disappeared. The same happens if I click "Exit" in a menu, which is nothing special. Well, it is not exactly the same: the former is not intended, whereas the latter is. So, is a "crash" the situation in which software stops doing what I want? That description would also match another kind of "crash": the program doesn't respond, the famous endless loop. In this case, the software still exists, and it is doing something as well, but either it is not doing what I want, or it is doing it far too often.
 
First conclusion: secure software is software that always does what I want, as many times as I want.

The bad thing is that I do not always know what I should want. In many situations the software must tell me about my options, so that I can think about what I want. Some software is very smart: it believes it knows what I want and does it in advance, to save time and to spare the (stupid) user stupid questions. Now, how can software never crash because it does exactly what I want all the time, if I don't know what I want or can't tell that poor little thing what I want? Maybe I don't know what I should want in order to achieve my goal of creating something great; maybe I don't know the full consequences of wanting this or that.

Second conclusion: secure software is software that always does what I want, as many times as I want, and never leaves any doubt about what I should want to achieve my goal.

That sounds great! But to be honest, the crashed or frozen software is not the really bad thing. The really bad, evil catastrophe is that the data are destroyed! That's the real reason I would like to grab a sledgehammer when faced with such a situation... That hurts. A Blue Screen or a segmentation fault often results in bad, corrupted, or even lost data. And frozen software is very good at preventing me from saving many hours of work. From this point of view, secure software is software that never damages my data in the sense that they lose their value or their integrity, or can no longer be processed. Either the software can do what I want for all my data, or it never touches them at all.
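The "all or nothing" property can be illustrated for the simple case of saving a file. A common technique is to write the new content to a temporary file in the same directory and then rename it over the original; `os.replace` performs that rename atomically on POSIX systems when both paths are on the same file system. This is a minimal sketch, and the function name `save_all_or_nothing` is hypothetical. It does not cover everything (for example, it does not fsync the directory, and a hard crash may leave a stray temporary file behind), but the original file is never left half-written.

```python
import os
import tempfile


def save_all_or_nothing(path, text):
    """Write text to path so that readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write into a temporary file in the same directory (hence the same file system) first.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to the disk before switching over
        # Atomic on POSIX when source and target are on the same file system.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure the original file is left untouched; remove the partial copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If anything goes wrong before the rename, the old content is still there in full; after the rename, the new content is there in full. There is no state in between.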

Final conclusion: secure software is software that always does what I want, as many times as I want, never leaves any doubt about what I should want, and either does what I want completely, for exactly all of my data, or does not touch them at all.

Isn't that a great result? But that's not all. More to come later.