... or TDDD and DEP8

context

As a nice byproduct of the huge "rolling" discussion we had back in April/May, various people have brainstormed about applying Test-Driven Development (TDD) techniques to Debian development. Here is a brief summary of some opinions on the matter:

  • a seminal conversation on identi.ca
  • a couple of blog posts by Lars, discussing the general problem of how to use TDD in distribution development and — to a lesser extent — its instantiation to Debian
  • a separate post by Daniel stressing the specific TDD advantage of increasing the confidence of maintainers in making far-reaching changes

... and hey, they've also coined the cool TDDD acronym, which I hereby take the liberty of re-targeting to Test-Driven Development in Debian. With a cool acronym, we are already halfway to actually having the process up and running *g*.

more testing

I believe Debian needs more testing and I've been advocating that for quite a while — e.g. at DebConf10, as one of the main goals we should pursue in the near future. Of course, advocating alone is not enough to make things happen in Debian, and that is probably why this goal has (thus far) been less successful than others we have put forward, such as welcoming non-packaging contributors as Debian Project members. Still, there are important reasons for increasing testing in Debian.

quality assurance

Quality Assurance has always been, and still is, one of the distinctive traits of Debian. I often say that Debian has a widespread culture of technical excellence, visible in several places: the Debian Policy process, lintian, piuparts, periodic full archive rebuilds, the EDOS/Mancoosi QA tools, the cultural fact that maintainers tend to know a lot about the software they're packaging rather than only about packaging, the "we release when it's ready" mantra, etc.

But caring about quality is not a boolean; it is something that should be continuously cherished, refining quality requirements over time. By simply maintaining the status quo in its QA tools and processes, Debian won't remain for long a distribution that can credibly claim to care about package quality. Others will catch up, and are in fact already doing so.

In particular, we surely have room for improvement in our quality tools and processes for:

  • Build-time package testing. Several packages run their build-time test suites during package build. This aspect is somewhat supported by the Debian Policy; debhelper (starting from version 7) also makes a best-effort attempt to run build-time test suites. At the same time we could probably campaign more to encourage maintainers to look for build-time test suites which do not follow common Makefile target naming, or which do not rely on make at all (see the debian/rules sketch right after this list).

    TodoList.add("investigate and possibly file bug report against policy to encourage maintainers to run build-time test suites")
    TodoList.add("investigate and possibly file bug report against lintian to spot the presence of [non-invoked] build-time test suites")

  • Runtime package testing (AKA "as-installed package testing"). Some kinds of test suites are difficult to run at build time (e.g. complex applications that offer no easy way to re-configure their filesystem paths on the fly) and some others are plainly impossible to run without having the packages to be tested properly set up (e.g. running services that are accessible only through the network) or without an isolated, throw-away testbed (e.g. bootloaders, kernels, etc.). TTBOMK this kind of testing is currently in use neither in Debian nor, indirectly, in any of its derivatives. autopkgtest (AKA DEP8) is a step in this direction; I will get back to it later on in this post.

  • Integration testing. When applied in the context of distributions, integration testing is about testing package combinations, task (as in tasksel) combinations, basic system functionalities, automated installations, etc. This is the aspect that Lars has discussed the most (and the one he seems most interested in), so I don't feel like adding much.

    Still, it's interesting to note that on this front Debian has an important potential to exploit. The large base of its developers and the fact that the average Debian user is technically savvy mean that we can potentially collect a lot of test cases. It's "just" (with well-deserved double quotes) a matter of deciding on a standard interface for test case submissions and developing a — most likely virtualized — test runner. At that point a continuous integration service that periodically runs tests and publishes the results can be set up independently from the rest of the Debian infrastructure and offered to interested eyes, such as the Release Team's. Any taker?

  • Automated code analysis, and in particular automated bug finding, is something which is becoming more and more common and — finally, after decades of research — viable. Successful proprietary tools and businesses abound (a nice and very fun read on the subject is the story of Coverity), as do FOSS solutions. For instance, the Linux kernel routinely uses Coccinelle to automatically find bugs and even produce the corresponding patches.

    Add to the above the fact that Debian is one of the largest software collections in existence (and quite probably the largest collection of Free Software in existence) and that, for the most part, it is where the packaging of that software originates. That makes Debian a very peculiar platform for large-scale automated code analyses. Not that we can aim at fixing all the bugs we are going to find, or even hope to tame all the false positives we will encounter. But we can offer an important service to the whole ecosystem of Free Software and use it to pinpoint classes of bugs that are important from a distribution point of view, such as nasty security bugs (as long as they can be automatically found…).

    DACA is a very good step in the right direction, even though it is still in its infancy.
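
As a concrete example of the build-time testing item above, here is a minimal sketch of a debian/rules for a hypothetical package whose upstream test suite is driven by a run-tests.sh script (i.e. not hooked to any common Makefile target), relying on the override mechanism available in debhelper since version 7:

    #!/usr/bin/make -f
    # dh_auto_test automatically runs common targets such as "make test"
    # or "make check"; anything else needs an explicit override.
    %:
    	dh $@

    # hypothetical upstream test driver, not wired to any Makefile target
    override_dh_auto_test:
    	./run-tests.sh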

reducing inertia

Inertia is a recurring topic among Debian lovers (and haters). It is often argued how difficult it is to make changes in Debian, both small and large, due to several (alleged) hindrances such as the size of the archive, the number of ports, the number of maintainers that should agree before proceeding with a cross-package change, etc. It's undeniable that the more code/ports/diversity you have, the more difficult it is to apply "global" changes. But at least as far as archive size is concerned, I believe it's for the most part just FUD: simply debunking the self-inflicted culture about how "dangerous" doing NMUs is might go — and has already gone, imho — a long way toward fighting inertia.

Adding per-package and integration tests will take us another long way toward reducing inertia when it comes to performing archive-wide changes. Indeed, if a package you are not entirely familiar with has extensive test suites, and if they still pass after your changes, you can be more confident in those changes. The barrier to contribution, possibly via NMU, gets lowered as a result. And if your change turns out to be bad and yet is not spotted by the test suites, you can NMU (or otherwise contribute) again to extend the test suite and make life easier for future contributors to that package. It smells a lot like a useful virtuous cycle to me.

autopkgtest / DEP8 — how you can help

Of all the above, the topic that intrigues me the most is as-installed package testing. Work on that front was started a few years ago by Ian Jackson while he was working for Canonical. The status quo is embodied by the autopkgtest package. At present, the package contains various tools and the following two specs:

  1. README.package-tests provides a standard format to declare per-package tests, using the new debian/tests/control file. Tests come as executable files which will be run — by the adt-run tool — in a testbed where the package(s) to be tested are already installed. (A minimal example follows this list.)

This part of the specs has been reified as DEP8, which I'm (supposedly) co-driving with Iustin Pop and Ian (for well-deserved credits).

  2. README.virtualisation-server describes the interface between the test runner and the testbed. A nice separation is provided between the runner and the testbed, enabling different testbed environments with varying degrees of isolation: you can have a "null" testbed which runs tests on your real machine (needless to say, this is highly discouraged, but it is provided by the adt-virt-null tool), a chroot testbed (adt-virt-chroot), or a XEN/LVM-based testbed (adt-virt-xenlvm).
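
To give an idea of what the package-tests spec looks like in practice, here is a minimal, hypothetical example (field names as per the current draft, which may still evolve; the frobnicator package is made up). debian/tests/control declares a single test named "smoke":

    Tests: smoke

The matching executable, debian/tests/smoke, is run by adt-run inside the testbed with the package under test already installed; it passes if it exits 0 and, per the spec, writes nothing to stderr:

    #!/bin/sh
    set -e
    frobnicator --version
    echo hello | frobnicator

On the adt-run command line, the virtualisation server to be used is specified after a "---" separator, i.e. one of the adt-virt-* tools just mentioned.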

The specs allow for several runtime testing scenarios and look quite flexible. The tools, on the other hand, suffer a bit from bitrot, which is unsurprising given that they haven't been used much for several years. At the very minimum, the following Python development tasks are in need of some love:

  • The usage of python-apt needs to be ported to the recent API, as several of the methods and attributes it relies upon are now gone (see the sketch right after this list).
  • Porting from dchroot to schroot is needed for the adt-virt-chroot backend.
  • A kvm backend for the test runner would be nice.
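
To give a flavor of the first task: the needed porting is of the kind sketched below. This is only an illustration, assuming the old pre-0.8 CamelCase python-apt API on the "before" side and a hypothetical "hello" package lookup:

    import apt_pkg

    apt_pkg.init()

    # before the API transition this used to be: apt_pkg.GetCache()
    cache = apt_pkg.Cache()

    # ... and this used to be: cache["hello"].CurrentVer
    print cache["hello"].current_ver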

If you are both interested in TDDD and grok Python, the above and many other tasks might whet your appetite. If that is the case, don't hesitate to contact me: I'll be happy to provide some guidance.


Note: this post is the basis for the TDDD BoF that I will co-host with Tom Marble at DebConf11. If you plan to come, we will appreciate your thoughts on this matter as well as your help in getting the autopkgtest toolchain up and running again.