Documenting your code: which documentation is essential?

Documenting your code is something nobody really enjoys, apart from the occasional masochist. The subject always comes up when we audit a piece of computational software. Everyone understands that proper documentation is essential for the extremely complex code that we specialize in. But nobody enjoys writing it, and in the end the documentation is almost always outdated, incorrect and insufficient. Fortunately, the right approach can make things a lot better.

Documenting your code done wrong

First, let’s look at some typical examples of documentation done wrong.

The Documentation Fundamentalist View
You recognize this situation when people come to you with a trolley of large binders, each holding a thousand (or more) pages of documentation. The approach here is that every detail of the code, on every level of abstraction, needs to be documented. The sheer volume of the documentation makes it mostly useless, and keeping it up to date takes a huge effort. Frankly, it is also a waste of money unless you’re working on software for a spaceship, which does require such a rigorous approach.
The Agile Type
There is nothing wrong with working agile when it’s done right. But in some cases, it becomes an excuse for not properly documenting your stuff. The first sprint delivers a general design document that is never updated again, even after new insights have made it obsolete. At the end of the project, you are left with loads of inconsistent documents that are hardly helpful.
The Documentation Hater
Most developers would probably recognize themselves in this term, but here I mean the type that doesn’t shy away from declaring that documentation is just a waste of time and that writing clean code is all there is to it. Again, as with agile, I acknowledge the value of clean code. But any code of reasonable size requires more than that. Anyone working on the code needs a general idea of its architecture and principles, and probably a bit more to avoid tedious searching through the code to understand what’s going on.

What documentation do you need?

The examples given above may seem a bit exaggerated, but in all honesty, reality is often not far from them. So, what would be the ‘right’ kind of documentation? Based on my experience, I would want the following:

Models and Algorithms
The term Models may be a bit confusing. I’m not referring to models of the software (as in model-based software development) but to the mathematical modelling of the subject of the computation. This type of documentation is specific to computational software. It’s typically full of equations, derivations, and references to external, often scientific, literature. It is often more static than the code itself, because the basic modelling and the most important algorithms do not change that much.
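The model documentation and the code become much more useful when they point to each other. The sketch below is purely illustrative: it assumes a hypothetical Models and Algorithms document that derives the explicit finite-difference scheme for the 1D heat equation u_t = alpha * u_xx in a section we simply call "Section 3.2". The function name and the section number are made up. The point is that the docstring states which discrete equation is implemented and where its derivation lives, rather than repeating the derivation.

```python
from typing import List


def heat_step_explicit(u: List[float], alpha: float, dt: float, dx: float) -> List[float]:
    """Advance the 1D heat equation u_t = alpha * u_xx by one explicit time step.

    Implements the forward-Euler / central-difference update
        u_i^{n+1} = u_i^n + r * (u_{i-1}^n - 2 u_i^n + u_{i+1}^n),  r = alpha * dt / dx**2
    as derived in the Models and Algorithms document, Section 3.2
    (hypothetical reference used for illustration).

    Boundary values are kept fixed (Dirichlet conditions). The scheme is only
    stable for r <= 0.5, so larger time steps are rejected.
    """
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError(f"unstable time step: r = {r:.3f} > 0.5")
    new_u = list(u)  # copy; the boundary entries stay unchanged
    for i in range(1, len(u) - 1):
        new_u[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new_u
```

For example, `heat_step_explicit([0.0, 1.0, 0.0], alpha=1.0, dt=0.1, dx=1.0)` spreads the initial peak slightly while the end points stay at zero. When the model changes, the docstring tells the next developer exactly which part of the model document must change with it.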
User’s Guide
Hopefully, someone is going to use your application. If that someone is not you, then proper user documentation is essential. Often, it’s best to have it written by someone who was not involved in the development itself. Otherwise, you may skip over things that are obvious to you but not to a user who is new to the application.
Another aspect of user documentation, which is especially relevant for complex computational code, is the modelling and algorithmic background. This overlaps somewhat with the first document mentioned above (Models and Algorithms), but you don’t want to bother the user with all the intricate details and mathematics. She just needs to understand what she can and cannot do with the application. Writing this part requires someone who understands both the user and the developer, and that is a rare kind of bird.
Architecture
The third essential document is the architecture document. My use of the term architecture here will likely antagonize people with formalist opinions about architecture, but I hope they can forgive me. By architecture, I mean anything that gives a comprehensive overview of the structure of the code. I tend to call it the map of the code because, like a real-life map, it helps you find your way around. Many people would probably call this the model of the code, but I use the term architecture to distinguish it from the models described under the first point above.
It’s best if this documentation is produced in a formal framework, because that forces the documenter to be strict and complete, and because the meaning of all the symbols is well understood. In some cases, the code skeleton can be generated from this documentation, but in my experience that is less useful for the type of compute-intensive code that we specialize in. That said, even a simple PowerPoint presentation with colored boxes indicating the components of the code can be just as helpful.
In some cases, interaction diagrams or state diagrams are also useful at this level. This holds in particular for applications that consist of loosely coupled components, such as microservices architectures.
Again, this type of documentation doesn’t change very often. It only changes during a major overhaul of the structure of the code, which is very rare in most cases.
Developer Guidelines
Valuable applications tend to have a very long life and become legacy code. We often come across applications that were originally developed in the 1980s, some 30 or more years ago. Such applications have been worked on by more than three generations of developers. To maintain code integrity, clear developer guidelines are essential. These include coding guidelines (how to name variables and classes, how to deal with things like errors and environment variables, etc.), the tools to be used, and which tests to write and how to run them. Most companies have standard documents for this, but the developer guidelines are an integral part of the documentation set of an application. If development moves to another company or department, the guidelines should move with it.
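To make the point about errors and environment variables concrete, here is a sketch of what one such rule could look like in a Python code base. The guideline itself ("read every environment variable through a documented helper and report problems with one dedicated exception type") is a hypothetical example, as are the names `ConfigurationError`, `get_env_int` and `MYAPP_NUM_THREADS`; your own guidelines may well prescribe something different.

```python
import os
from typing import Optional


class ConfigurationError(RuntimeError):
    """Raised when a required setting is missing or malformed."""


def get_env_int(name: str, default: Optional[int] = None) -> int:
    """Read an integer setting from the environment.

    Hypothetical guideline: environment variables are only read through
    helpers like this one, never via os.environ scattered through the code,
    so every setting has one entry point and one error message format.
    """
    raw = os.environ.get(name)
    if raw is None:
        if default is None:
            raise ConfigurationError(f"environment variable {name} is not set")
        return default
    try:
        return int(raw)
    except ValueError as exc:
        # Keep the original parsing error attached for debugging.
        raise ConfigurationError(
            f"environment variable {name}={raw!r} is not an integer"
        ) from exc
```

A call such as `n_threads = get_env_int("MYAPP_NUM_THREADS", default=1)` then behaves the same everywhere in the application, which is exactly what a written guideline is meant to achieve across generations of developers.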
Code Level Documentation
The rest of the documentation is not written separately but is generated from comments in the code. Obviously, these comments have to be well structured to allow proper documentation to be generated. At the same time, the comments should be helpful in the context of the code itself. If done right, this documentation fills the gap between the architecture document and the individual lines of code.
In the code level documentation, you typically find the syntax and semantics of the classes, structures, and methods in your code. Often, it’s sufficient to have this documentation only in the code itself. If you are developing a library, it makes sense to convert these code comments into an HTML-based clickable overview of the classes or main routines.
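As an illustration of comments that serve both purposes, here is a small Python function documented with a NumPy-style docstring, a widely used convention that documentation generators can parse into sections (Parameters, Returns, Raises, Examples). The function itself is a made-up example; what matters is that the syntax (argument types) and semantics (preconditions, errors, a worked example) are stated right next to the code.

```python
from typing import Sequence


def trapezoid(y: Sequence[float], dx: float = 1.0) -> float:
    """Integrate equally spaced samples with the composite trapezoidal rule.

    Parameters
    ----------
    y : sequence of float
        Function values at equally spaced points; at least two are required.
    dx : float, optional
        Spacing between consecutive points. Must be positive. Default is 1.0.

    Returns
    -------
    float
        Approximation of the integral over the sampled interval.

    Raises
    ------
    ValueError
        If fewer than two samples are given or ``dx`` is not positive.

    Examples
    --------
    >>> trapezoid([0.0, 1.0, 2.0], dx=1.0)
    2.0
    """
    if len(y) < 2:
        raise ValueError("need at least two samples")
    if dx <= 0:
        raise ValueError("dx must be positive")
    # End points get weight 1/2, interior points weight 1.
    interior = sum(y[1:-1])
    return dx * (0.5 * (y[0] + y[-1]) + interior)
```

The example in the docstring doubles as a small test, since Python’s `doctest` module can execute it, which helps keep the code-level documentation from drifting away from the code.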

Some useful tools for documenting your code

Automating as much as possible of the software development process is always a good idea. That certainly holds for writing and maintaining the documentation of your code. The lower-level parts of the documentation in particular lend themselves well to automation.

The best-known tool in this respect is probably Doxygen. It generates nice documentation from the code itself and from comments in the code. You first have to familiarize yourself with how Doxygen interprets the comments and what tricks you can use to make it generate readable documentation. But once you know that and apply it consistently in your code, you will not only have nice documentation but also properly annotated code. Doxygen originally targeted C-like languages, but these days it also supports other common languages such as Python, and even Fortran. I’m not quite sure how well its Fortran support works, but Fordocu would be a good alternative.
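To give an impression of what Doxygen-friendly comments look like, here is a sketch in Python. Doxygen picks up comment blocks that start with `##` placed directly before a definition, and its special commands such as `@param` and `@return` can be used inside them. The module name `thermal` and the function are hypothetical and only serve as an example.

```python
## @package thermal
#  Hypothetical module documented with Doxygen-style comment blocks.


## Steady-state temperature of a rod with fixed end temperatures.
#
#  Without internal heat sources, the temperature varies linearly
#  between the two ends.
#  @param t_left   Temperature at x = 0.
#  @param t_right  Temperature at x = length.
#  @param length   Length of the rod; must be positive.
#  @param x        Position along the rod, 0 <= x <= length.
#  @return         Temperature at position x.
def steady_temperature(t_left, t_right, length, x):
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0.0 <= x <= length:
        raise ValueError("x must lie within the rod")
    return t_left + (t_right - t_left) * x / length
```

The same comments remain perfectly readable for someone browsing the source, which is the double benefit mentioned above: generated documentation and annotated code from a single effort.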

These days, Sphinx is gaining traction across the board. It is the go-to documentation generator for Python. In theory, it can also handle other languages, but that is quite a tedious endeavor and doesn’t help to keep your code clean and pristine. In another blog post, we show how Sphinx can be integrated into the software development process with CMake.
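For a Python project, getting started with Sphinx mostly comes down to a small `conf.py`. The sketch below assumes a hypothetical package called `thermal` located one directory above the `docs` folder; the project and author names are placeholders. The three extensions listed ship with Sphinx: `autodoc` pulls documentation from docstrings, `napoleon` understands NumPy- and Google-style docstrings, and `mathjax` renders LaTeX math, which is handy when model equations appear in the documentation.

```python
# conf.py: minimal Sphinx configuration sketch (names are placeholders)
import os
import sys

# Make the package importable so that autodoc can read its docstrings;
# this assumes the sources live one directory above the docs folder.
sys.path.insert(0, os.path.abspath(".."))

project = "thermal"
author = "Your Team"

extensions = [
    "sphinx.ext.autodoc",   # pull documentation from docstrings
    "sphinx.ext.napoleon",  # parse NumPy- and Google-style docstrings
    "sphinx.ext.mathjax",   # render LaTeX math in the HTML output
]

html_theme = "alabaster"  # the theme Sphinx uses by default
```

A page in the documentation then only needs an `.. automodule:: thermal` directive with the `:members:` option, and `sphinx-build -b html docs docs/_build/html` produces the clickable HTML overview mentioned earlier.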

For sequence diagrams, Typora is a great tool. You describe the diagram in Markdown and then generate nice-looking pictures in various formats. Because the input is plain Markdown, the diagrams are relatively easy to maintain.

Conclusion

Documentation will probably remain a contentious subject. But with the right approach and tooling, the effort will be acceptable to most. And if you have ever had to work on a major codebase written by someone else, you will understand that no code should be without proper documentation.

Scientific Software Engineering