PragDave on Software Archeology and The Advantages of Self-Containment


I just finished listening to Software Engineering Radio’s Episode 148: Software Archaeology with Dave Thomas (Pragmatic Dave).

It’s a really interesting interview, and Dave has a lot to say about what software engineering jobs look like today:

  • Most of the time we read code rather than write it
  • Most of the code we have to read is in less than perfect shape
  • Most programmers are never taught how to read code
  • Far too often people change code they don’t really understand
  • Tests are a great way of proving your hypotheses about what existing code does
  • Code rots, and you have to be prepared for it
  • Only documentation tends to rot faster than code
  • Most documentation that gets written is mostly useless, because people tend to document the obvious rather than the reasoning behind their decisions
  • Reading code is not only useful in software maintenance but is also an educational exercise
  • You can write bad code in any language, but also: a well-written program can be easy to understand in any language
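
The point about tests as a way of proving hypotheses about existing code can be sketched roughly like this. Python is used only for illustration, and `legacy_discount` is a hypothetical stand-in for some inherited function whose rules nobody wrote down; it does not come from the interview. Each test pins down one guess about what the code does today, so a failing test tells you the guess was wrong before you change anything.

```python
import unittest


def legacy_discount(total, customer_type):
    """Hypothetical inherited function with undocumented rules."""
    if customer_type == "gold":
        total = total * 0.9
    if total > 100:
        total = total - 5
    return round(total, 2)


class LegacyDiscountHypotheses(unittest.TestCase):
    """Each test records one hypothesis about the current behaviour."""

    def test_gold_customers_get_ten_percent_off(self):
        self.assertEqual(legacy_discount(50, "gold"), 45.0)

    def test_rebate_is_checked_after_the_percentage(self):
        # Hypothesis: the flat rebate is checked against the already
        # discounted total, so 110 becomes 99.0 and gets no rebate.
        self.assertEqual(legacy_discount(110, "gold"), 99.0)

    def test_regular_customers_only_get_the_rebate(self):
        self.assertEqual(legacy_discount(120, "regular"), 115)
```

Saved in a file, this can be run with `python -m unittest`. In Smalltalk the same idea would presumably be expressed as an SUnit test case.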

There’s not much room for disagreement here, since you learn these lessons pretty fast when you maintain software. The idea of having to “get into people’s heads” to understand their style of programming, thinking and problem solving is also absolutely right. For me, it often helps to know who wrote a method, especially in projects where I know the original programmers. Sometimes you understand why something was done a certain way simply because you know who wrote the code.

So it may sound funny, but sometimes you find your way to a bug a lot faster by knowing whose code it’s in. Every programmer leaves some DNA in their code, and since software maintenance has a lot in common with detective stories, it’s always good to have a profile of your “Mister X” in mind 😉

Some of the things Dave talks about sound weird or ridiculously obvious at first:

  • Print out an artefact in 2pt font size to learn about its structure
  • You need to make sure you’re digging in the right version of the code
  • You need to make sure you have all the necessary code
  • Use a local code repository
  • IDEs and code generation can be dangerous
  • Tools like Emacs and grep can be of great help

But once you think about them, you’ll find the advice much more useful than it sounds at first.

When trying to translate these to Smalltalk, however, I guess we can put some of them aside. There’s no point in using grep on Smalltalk code if you work in an image-based environment. There are much better tools, like References, Senders/Implementers and such. And there’s not much use in looking at a body of code in isolation from the Smalltalk class library. In the light of these arguments, the use of a good source code control system like Envy becomes more important. Since I have never tried the tiny font-size thing, I cannot really judge it. But I can imagine that in a purely file-based world, even this kind of thing can help (and if not, it’s been worth a try).

Combined with the fact that software archeology is not necessarily a nondestructive task (it’s not about conserving code, but about understanding it), this makes version control even more important. As I mentioned in my talk at the VA Smalltalk Forum two weeks ago, refactoring is a great way of finding your way into existing code. Sometimes just changing the name of a temporary variable or a parameter, or extracting part of a method and giving it an intention-revealing name, makes understanding code a whole lot easier.
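
As a rough illustration of refactoring as a way into unfamiliar code, here is a minimal sketch in Python (chosen only for illustration; it is not the example from the talk). All names, and the reading of the tuples as unit price and quantity, are invented assumptions. In real archeology such a reading is itself a hypothesis, which is why the last line checks that the renamed version still behaves like the original.

```python
def process_before(d, t):
    # Original shape: terse names, one loop doing two things at once.
    x = 0
    for i in d:
        if i[1] > t:
            x = x + i[0] * i[1]
    return x


def is_bulk_order(line, bulk_threshold):
    """Extracted check with an intention-revealing name (hypothetical)."""
    _unit_price, quantity = line
    return quantity > bulk_threshold


def total_of_bulk_orders(order_lines, bulk_threshold):
    """Same behaviour as process_before, with names that state the intent."""
    total = 0
    for line in order_lines:
        if is_bulk_order(line, bulk_threshold):
            unit_price, quantity = line
            total += unit_price * quantity
    return total


# Assumed sample data: (unit_price, quantity) pairs.
order_lines = [(2.5, 10), (4.0, 3), (1.0, 20)]

# The refactoring must not change behaviour.
assert process_before(order_lines, 5) == total_of_bulk_orders(order_lines, 5)
```

Working on a copy under version control means that a wrong guess about what `d` or `t` meant costs nothing more than a revert.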

Dave also mentions the importance of getting hold of all the artefacts that are needed to make a piece of code run, such as database creation scripts. I am a strong believer in “get all your stuff into your code repository”, and my favourite language, Smalltalk, makes this quite easy by having the compiler available to create classes out of scripts (again, see my talk for an example). So I think we have a very good environment at our disposal for making the lives of later software archeologists easier.

Since software maintenance is one of the most important and least appreciated tasks in software engineering, I guess we owe it to ourselves and to the developers who will have to get along with our code to use techniques and tools that make archeology easier. We need to improve our coding style and documentation habits, and we need to keep in mind that our code is not only there to be executed by machines, but mostly to be understood and reworked by humans.

And I guess there aren’t many environments that make reading (into) code as easy as Smalltalk does. The Smalltalk debugger lets you watch the innards of a program at work and even modify the program in the middle of a task. Together with a rich set of inspection tools, well-integrated testing facilities like SUnit, and a whole set of code browsers built around the idea that “FINDING EXISTING CODE” is the main task in programming, it not only makes writing code easier in the first place, but also helps later generations of developers understand our code.

In the case of Smalltalk, this is not a theoretical assumption: it is a proven fact that the language allows for long product lifecycles. There are many systems around that have been running for more than a decade and a half and continue to be integrated into modern architectures like Web Services or transformed into web-based applications, with new features being added all the time. This would presumably be a lot harder in less maintenance-friendly environments.

As developers, it’s our responsibility to lay the groundwork for later generations, and we have to make decisions that will still have consequences when they’re pushing us through the park of our retirement home in a wheelchair every Wednesday afternoon. So using an environment that helps with reading code and checking hypotheses is what we should do. Smalltalk is an environment that qualifies in this area and should be seriously considered as the platform of choice.

Many people out there think that image-based systems are a bad thing, since they feel they’re losing control over their system’s structure, but it’s quite the opposite: a Smalltalk image is the whole world of the objects in a system and therefore gives you far more ways to explore and search than a list of dead files does. In the end, in an object world, what kind of structure does a file give you? Your running program consists of classes and instances, not of files.

Taking Dave’s analogy a step further, there is a strong advantage in Smalltalk, because learning about existing code is not really archeology, but much more like time travel: you just jump into the bazaar of a running system, talk to the merchants and visit the temple on the hill instead of just digging some old stones out of the sand. So while software archeologists try to imagine how a system might work or might have worked, we Smalltalk developers have the luxury of just starting the system up and living in it for a day or two.