Documentation – A Necessary Evil
Ask just about any programmer what task they enjoy the least and more than likely you’ll hear documentation. There are even some programming styles (see agile programming) that actually adhere to a minimal documentation approach, putting the emphasis on working software. While cutting back on the documentation might be a good approach for getting an application out the door, it probably won’t help someone trying to maintain or modify it down the road.
Good documentation is essential when it comes to things like reusable components or library modules. If you’ve ever tried to use someone else’s code without documentation you know how hard it can be if you don’t know what it is you’re working with. In many cases you could potentially spend more time trying to read and understand the source code than you would in writing it yourself.
Programmer documentation is an entirely different animal from user documentation. When a programmer evaluates different libraries to help solve a problem, the documentation can often make the difference in choosing one over another. Good programmer documentation not only describes each module and parameter; it also provides solid examples on how to use each function.
There are a number of open source tools available to help build the basic programmer-level documentation in an automated fashion. In general, these tools will analyze source code, looking for functions and methods in order to extract the names and calling parameters for a first cut at documentation. These types of tools are especially useful when you’re developing a large application and you have many functions that will potentially change through the development process.
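As a rough illustration of that first-cut extraction step, here is a minimal sketch using Python's standard `ast` module. The sample source, the `list_functions` helper, and its output shape are all hypothetical and chosen only for illustration; real documentation tools do considerably more.

```python
import ast

# Hypothetical source code to analyze; any Python source text would do.
SAMPLE_SOURCE = '''
def connect(host, port=5432):
    """Open a connection to the server."""

class Cache:
    def get(self, key, default=None):
        pass
'''


def list_functions(source):
    """Return (qualified name, parameter names, docstring summary) tuples."""
    results = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Only plain positional parameters are listed in this sketch;
                # *args, **kwargs and keyword-only parameters are ignored.
                params = [arg.arg for arg in child.args.args]
                doc = ast.get_docstring(child) or ""
                summary = doc.splitlines()[0] if doc else ""
                results.append((prefix + child.name, params, summary))
            elif isinstance(child, ast.ClassDef):
                # Descend into classes so that methods are found too.
                visit(child, prefix + child.name + ".")

    visit(ast.parse(source), "")
    return results


if __name__ == "__main__":
    for name, params, summary in list_functions(SAMPLE_SOURCE):
        print(f"{name}({', '.join(params)})  {summary}")
```

Because the names and parameters come straight from the parsed source, rerunning a script like this after every change keeps the listing in step with the code, which is the main appeal of the automated approach.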
Most of the automated documentation tools rely on some type of coding standard to extract comments from the source code, such as:
/**
 * ... text ...
 */
Or
/*!
 * ... text ...
 */
Other conventions include embedding keywords in comments to document specific things such as:
- \struct to document a C-struct.
- \union to document a union.
- \enum to document an enumeration type.
- \fn to document a function.
- \var to document a variable or typedef or enum value.
- \def to document a #define.
- \typedef to document a type definition.
- \file to document a file.
- \namespace to document a namespace.
- \package to document a Java package.
- \interface to document an IDL interface.
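To make the conventions above concrete, the following is a minimal sketch of how a tool might pull the special comment blocks and their leading keyword out of C-like source. The regular expressions, the `extract_doc_blocks` helper, and the sample source are assumptions made for illustration; this is not how any particular tool is implemented.

```python
import re

# Hypothetical C source using the comment conventions described above.
C_SOURCE = r'''
/**
 * \fn int add(int a, int b)
 * Adds two integers.
 */
int add(int a, int b) { return a + b; }

/*!
 * \struct point
 * A 2-D point.
 */
struct point { int x; int y; };

/* An ordinary comment that should be ignored. */
'''

# Matches /** ... */ and /*! ... */ blocks, but not plain /* ... */ comments.
BLOCK_RE = re.compile(r"/\*[*!](.*?)\*/", re.DOTALL)
# Matches a leading keyword such as \fn or \struct followed by its target.
COMMAND_RE = re.compile(r"^\\(\w+)\s+(.*)$")


def extract_doc_blocks(source):
    """Return one dict per documentation block found in the source text."""
    blocks = []
    for match in BLOCK_RE.finditer(source):
        entry = {"command": None, "target": None, "text": []}
        for raw in match.group(1).splitlines():
            line = raw.strip()
            if line.startswith("*"):
                line = line[1:].strip()  # drop the decorative leading asterisk
            if not line:
                continue
            command = COMMAND_RE.match(line)
            if command and entry["command"] is None:
                entry["command"] = command.group(1)
                entry["target"] = command.group(2)
            else:
                entry["text"].append(line)
        blocks.append(entry)
    return blocks


if __name__ == "__main__":
    for block in extract_doc_blocks(C_SOURCE):
        print(block)
```

The opening marker is what separates documentation from ordinary comments here, which is why the plain `/* ... */` comment in the sample is skipped.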
Some languages, such as Python, have a built-in construct for embedding documentation that looks something like this:
"""@package docstring
Documentation for this module.
More details.
"""
Development Methodologies
Traditional software development methodologies were typically overburdened with documentation. If you’ve ever been required to meet such standards as the Capability Maturity Model Integration (CMMI), you’ll know what extensive documentation requirements really mean. The truth is that most open source projects don’t adhere to any traditional development models and often fall short when it comes to documentation.
Software design using an object-oriented approach typically relies heavily on modeling to build visual diagrams as a basis for coding and documentation. The Unified Modeling Language (UML) presents a way to use graphical notations and symbols to design a complex software application. Automated tools for building both documentation and code from UML diagrams are available from multiple vendors.
Agile development has four basic tenets, and they are designed to emphasize:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
The definition of comprehensive documentation is open to interpretation, but it should not be taken as a license to ignore documentation altogether. The Agile methodology has gained popularity, especially in the highly dynamic world of web development, but may not lend itself well to large-scale projects with a multitude of programmers.
Literate programming is a “philosophy of computer programming based on the premise that a computer program should be written similar to literature, with human readability as a primary goal.” This approach puts a high premium on both readable code and embedded documentation.
Other development methodologies, including Test-Driven Development (TDD) and the whole Extreme Programming movement, tend to emphasize process and coding over documentation and design. They don’t necessarily exclude documentation, but it is rarely their first priority.
Adopting tools that fit in with your design philosophy will help bridge the gap between minimalistic text and first-class documentation. In the following sections we’ll look at a few of the more popular open source documentation tools available today.
Doxygen
According to the Doxygen site:
“It can help you in three ways:
- It can generate an on-line documentation browser (in HTML) and/or an off-line reference manual (in Latex) from a set of documented source files. There is also support for generating output in RTF (MS-Word), PostScript, hyperlinked PDF, compressed HTML, and Unix man pages. The documentation is extracted directly from the sources, which makes it much easier to keep the documentation consistent with the source code.
- You can configure doxygen to extract the code structure from undocumented source files. This is very useful to quickly find your way in large source distributions. You can also visualize the relations between the various elements by means of include dependency graphs, inheritance diagrams, and collaboration diagrams, all of which are generated automatically.
- You can even 'abuse' doxygen for creating normal documentation (as I did for this manual)..."
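The configuration mentioned in the quote lives in a plain-text file, conventionally named Doxyfile. As a hedged sketch, the Python helper below writes out a handful of options that correspond to the features described: HTML and LaTeX output, extraction from undocumented sources, and generated graphs. The `build_doxyfile` function and the project and directory names are hypothetical; the option names are standard doxygen settings, but the right values depend on your project.

```python
def build_doxyfile(project_name, input_dir, output_dir="docs"):
    """Return the text of a minimal Doxyfile (illustrative values only)."""
    settings = {
        "PROJECT_NAME": f'"{project_name}"',
        "INPUT": input_dir,
        "RECURSIVE": "YES",
        "OUTPUT_DIRECTORY": output_dir,
        "GENERATE_HTML": "YES",        # on-line documentation browser
        "GENERATE_LATEX": "YES",       # off-line reference manual
        "EXTRACT_ALL": "YES",          # include entities that have no comments
        "HAVE_DOT": "YES",             # graphs require Graphviz to be installed
        "INCLUDE_GRAPH": "YES",        # include dependency graphs
        "CLASS_GRAPH": "YES",          # inheritance diagrams
        "COLLABORATION_GRAPH": "YES",  # collaboration diagrams
    }
    width = max(len(key) for key in settings)
    lines = [f"{key.ljust(width)} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "demo" and "src" are placeholder names for this sketch.
    print(build_doxyfile("demo", "src"), end="")
```

Saving the output as Doxyfile and running `doxygen Doxyfile` in the same directory would then build the documentation; `doxygen -g` generates a fully commented template if you would rather start from the complete option list.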
Some of the more well-known projects using doxygen include Asterisk, AbiWord, D-BUS, KDevelop, MediaWiki, MySQL, Samba, and Subversion. Apple also uses doxygen with their Xcode tool. Trolltech provides a tool called doxygen2qthelp to convert doxygen-created documentation into a form that works with the Qt help system.
Javadoc
Javadoc has probably been around longer than any of the other open source tools and enjoys a good following in the Java community. There’s a good article on the testearly blog discussing the benefits of Doxygen vs. Javadoc.
Javadoc produces its output through what it calls doclets: small programs that plug into the tool and determine the content and format of the generated documentation. The standard doclet, which is used by default, generates HTML pages. Javadoc assumes a number of standard notational conventions, such as /** ... */ comment blocks and tags like @param and @return, in order to extract comments and other program specifics.
PyDoc
Pydoc is somewhat unique in that it ships with Python as part of the standard library. It will create documentation based on the actual code and present it either as pages of text on a console or as HTML served to a Web browser or saved to a file. It is also available interactively through the built-in help() function, which displays the docstrings of any imported module.
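A short sketch of what that looks like in practice follows. The `area` function is a made-up example; `inspect.getdoc`, `pydoc.render_doc`, and `pydoc.plain` are standard library calls, though the exact layout of the rendered page can vary between Python versions.

```python
import inspect
import pydoc


def area(width, height):
    """Return the area of a rectangle.

    Both arguments are expected to be numbers in the same unit.
    """
    return width * height


# The raw docstring is stored on the object itself.
raw_doc = area.__doc__

# inspect.getdoc() returns the same text with its indentation normalized.
clean_doc = inspect.getdoc(area)

# pydoc.render_doc() builds the text page that help(area) would display;
# pydoc.plain() strips the overstrike characters pydoc uses for bold text.
page = pydoc.plain(pydoc.render_doc(area))

if __name__ == "__main__":
    print(clean_doc.splitlines()[0])
    print(page)
```

From the command line, `python -m pydoc some_module` prints the same kind of page for a whole module, and `python -m pydoc -w some_module` writes it out as an HTML file instead.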
A good example of PyDoc in action is the documentation for the language and more specifically the standard Python library. Providing good documentation and examples for library functions is crucial to helping a programmer put them to good use.
Final Notes
Documentation is one area that often gets cited as lacking in open source projects. This doesn’t have to be the case, especially with the availability of automated tools such as the ones mentioned in this article. It’s up to the programmers to make sure that the old adage of “the job isn’t finished until the paperwork (or documentation) is done” doesn’t go unfulfilled.


