... or TDDD and DEP8


As a nice byproduct of the huge "rolling" discussion we had back in April/May, various people have brainstormed about applying Test-Driven Development (TDD) techniques to Debian development. Here is a brief summary of some opinions on the matter:

  • a seminal conversation on identi.ca
  • a couple of blog posts by Lars, discussing the general problem of how to use TDD in distribution development and — to a lesser extent — its instantiation to Debian
  • a separate post by Daniel stressing the specific TDD advantage of increasing maintainers' confidence in making far-reaching changes

... and hey, they've also coined the cool TDDD acronym, which I hereby take the liberty of re-targeting to Test-Driven Development in Debian. Having a cool acronym, we are already half-way to actually having the process up and running *g*.

more testing

I believe Debian needs more testing and I've been advocating that for quite a while — e.g. at DebConf10, as one of the main goals we should pursue in the near future. Of course advocating alone is not enough in Debian to make things happen, and that is probably why this goal has (thus far) been less successful than others we have put forward, such as welcoming non-packaging contributors as Debian Project members. There are important reasons for increasing testing in Debian.

quality assurance

Quality Assurance has always been, and still is, one of the distinctive traits of Debian. I often say that Debian has a widespread culture of technical excellence, and that is visible in several places: the Debian Policy process, lintian, piuparts, periodic full archive rebuilds, the EDOS/Mancoosi QA tools, the cultural fact that maintainers tend to know a lot about the software they're packaging rather than only about packaging, the "we release when it's ready" mantra, etc.

But caring about quality is not a boolean; it is something that should be continuously cherished, with quality requirements refined over time. By simply maintaining the status quo in its QA tools and processes, Debian won't long remain a distribution that can claim to care about package quality. Others will catch up; in fact, they are already doing so.

In particular, we surely have room for improvements in our quality tools and processes for:

  • Build-time package testing. Several packages run their upstream test suites during package build. This practice is somewhat supported by the Debian Policy; we also make a best-effort attempt to run build-time test suites via debhelper (starting from version 7). At the same time, we could probably campaign more to encourage maintainers to look for build-time test suites which do not follow common Makefile target naming or, for that matter, which don't rely on make at all.

    TodoList.add("investigate and possibly file bug report against policy to encourage maintainers to run build-time test suites")
    TodoList.add("investigate and possibly file bug report against lintian to spot the presence of [non-invoked] build-time test suites")

  • Runtime package testing (AKA "as-installed package testing"). Some kinds of test suites are difficult to run at build time (e.g. complex applications that don't offer an easy way to reconfigure their filesystem paths on the fly), and some others are plainly impossible to run without having the packages under test properly set up (e.g. running services that are accessible only through the network) or without an isolated, throw-away testbed (e.g. bootloaders, kernels, etc.). TTBOMK this kind of testing is currently used neither in Debian nor, indirectly, in any of its derivatives. autopkgtest (AKA DEP8) is a step in this direction; I will get back to it later on in this post.

  • Integration testing. When applied in the context of distributions, integration testing is about testing package combinations, task (as in tasksel) combinations, basic system functionalities, automated installations, etc. This is the aspect that Lars has discussed the most (and in which he seems to be interested the most) so I don't feel like adding much.

    Still, it's interesting to note that on this front Debian has important potential to exploit. Its large developer base and the fact that the average Debian user is technically savvy mean that we can potentially collect a lot of test cases. It's "just" (with well-deserved double quotes) a matter of settling on a standard interface for test case submissions and developing a — most likely virtualized — test runner. At that point a continuous integration service that periodically runs tests and publishes the results can be set up independently from the rest of the Debian infrastructure and offered to interested eyes, such as the Release Team's. Any taker?

  • Automated code analysis, and in particular automated bug finding, is becoming more and more common and — finally, after decades of research — viable. There are plenty of successful proprietary tools and businesses in this space (a nice and very fun read on the subject is the story of Coverity), as well as FOSS solutions. For instance, the Linux kernel routinely uses Coccinelle to automatically find bugs and even produce the corresponding patches.

    Add to the above the fact that Debian is one of the largest software collections in existence (and quite probably the largest collection of Free software in existence) and is, for the most part, the upstream of that software's packaging. That turns Debian into a very peculiar platform for large-scale automated code analyses. Not that we can aim at fixing all the bugs we are going to find, or even hope to tame all the false positives we are going to encounter. But we can offer an important service to the whole ecosystem of Free software and use it to pinpoint important classes of bugs from a distribution point of view, such as nasty security bugs (as long as they can be found automatically…).

    DACA is a very good step in the right direction, even though it is still in its infancy.
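The "standard interface" idea from the integration testing point above can be sketched in a few lines. Here is a minimal, hypothetical test runner (the directory layout and naming conventions are invented for illustration): every executable file in a directory is a test case, and a test passes iff it exits with status 0 and prints nothing on stderr:

```python
import os
import subprocess

def run_tests(testdir):
    """Run every executable file in testdir as a test case.

    Pass criterion (assumed here, DEP8-style): exit status 0 and
    empty stderr. Returns a dict mapping test name to True/False.
    """
    results = {}
    for name in sorted(os.listdir(testdir)):
        path = os.path.join(testdir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # skip non-executables (data files, etc.)
        proc = subprocess.Popen([path],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        _, err = proc.communicate()
        results[name] = (proc.returncode == 0 and err == b"")
    return results
```

A real submission interface would of course also need per-test metadata (dependencies, required isolation level, etc.), which is the kind of thing the autopkgtest specs already start to address.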

reducing inertia

Inertia is a recurring topic among Debian lovers (and haters). It is often argued that it is difficult to make changes in Debian, both small and large, due to several (alleged) hindrances such as the size of the archive, the number of ports, the number of maintainers who should agree before proceeding with a cross-package change, etc. It's undeniable that the more code/ports/diversity you have, the more difficult it is to apply "global" changes. But at least as far as archive size is concerned, I believe that for the most part it's just FUD: simply debunking the self-inflicted culture about how "dangerous" doing NMUs is might go — and has already gone, imho — a long way toward fighting inertia.

Adding per-package and integration tests will take us another long way in reducing the inertia of archive-wide changes. Indeed, if a package you are not entirely familiar with has extensive test suites, and if they still pass after your changes, you can be more confident in your changes. The barrier to contribution, possibly via NMU, gets lowered as a result. And if your change turns out to be bad but is still not spotted by the test suites, then you can NMU (or otherwise contribute) again to extend the test suite and make life easier for future contributors to that package. It smells a lot like a useful virtuous cycle to me.

autopkgtest / DEP8 — how you can help

Of all the above, the topic that intrigues me the most is as-installed package testing. Work on that front was started a few years ago by Ian Jackson while he was working for Canonical. The status quo is embodied by the autopkgtest package. At present, the package contains various tools and the following two specs:

  1. README.package-tests provides a standard format to declare per-package tests using the new debian/tests/control file. Tests come as executable files which will be run — by the adt-run tool — in a testbed where the package(s) to be tested are already installed.

This part of the specs has been reified as DEP8 which I'm (supposedly) co-driving with Iustin Pop and Ian (for well-deserved credits).

  2. README.virtualisation-server describes the interface between the test runner and the testbed. A nice separation is provided between the runner and the testbed, enabling different testbed environments with varying degrees of isolation: you can have a "null" testbed which runs tests on your real machine (needless to say, this is highly discouraged, but it is provided by the adt-virt-null tool), a chroot testbed (adt-chroot), or a XEN/LVM-based testbed (adt-virt-xenlvm).
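To give a feel for the first spec, here is what a hypothetical package's test declaration might look like (the test name "smoke" and the invoked command are invented; README.package-tests remains the authoritative reference for field names and semantics):

```
# debian/tests/control (hypothetical example)
Tests: smoke

# debian/tests/smoke -- an executable script, run by adt-run in a
# testbed where the package is already installed; it passes iff it
# exits 0 without printing anything on stderr.
#!/bin/sh
set -e
hello --version
```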

The specs allow for several runtime testing scenarios and look quite flexible. The tools, on the other hand, suffer a bit from bitrot, which is unsurprising given that they haven't been used much for several years. At the very minimum, the following Python development tasks are in need of some love:

  • The usage of python-apt needs to be ported to the recent API, as several of the methods and attributes it uses are now gone.
  • Porting from dchroot to schroot is needed for the adt-virt-chroot backend.
  • A kvm backend for the test runner would be nice.

If you are both interested in TDDD and grok Python, the above and many other tasks might whet your appetite. If that is the case, don't hesitate to contact me; I'll be happy to provide some guidance.

Note: this post is the basis for the TDDD BoF that I will co-host with Tom Marble at DebConf11. If you plan to come, we will appreciate your thoughts on this matter as well as your help in getting the autopkgtest toolchain up and running again.


I wrote that mail just yesterday, where I explain more about my emerging tools for TDDD, in the FreedomBox context.

http://liw.fi/systest/ is the tool I have, modelled on (and based on) the Python unittest library.

Comment by Lars Wirzenius Thu 21 Jul 2011 06:37:07 PM CEST


> I wrote that mail just yesterday, where I explain more about my emerging tools for TDDD, in the FreedomBox context.
>
> http://liw.fi/systest/ is the tool I have, modelled on (and based on) the Python unittest library.

Thanks for the pointers to the mail and to the tools. The systest utility looks interesting and could be a basis for both integration and system testing. However, I don't feel particularly good about committing to a specific language as the interface we are going to propose for collecting tests from developers and users. (I'm not sure whether you're proposing that, but I wanted to emphasize this point.)

systest could be a very nice utility for writing tests, possibly the most common one, but I would prefer defining a strictly process-level interface for deciding whether a test has passed. In fact, we could imagine reusing the DEP8 interface (i.e. a test is successful if it exits with status 0 and doesn't print anything on stderr) as a basis for integration/system testing.

As for the abstraction between the test runner and the actual virtualization layer, you might want to have a look at the second part of the autopkgtest specs I've mentioned, README.virtualisation-server. It provides an abstraction protocol between the test runner and the testbed backend. Maybe you could build on that abstraction?

Final comment: why are neither systest nor vmbuilder in Debian? :-)

Comment by zack Thu 21 Jul 2011 09:35:45 PM CEST

DEP8 isn’t written well at all, it’s very confusing. I wouldn’t mind adding an entry for e.g. mksh, but not with a spec like that, it needs a large amount of work first. Sorry.

Comment by mirabilos Thu 21 Jul 2011 10:34:57 PM CEST

I don't find autopkgtest to be a useful solution for the problems I'm interested in, so I don't intend to deal with it. I find it excessively complicated for my taste, and one of the things that I really don't want is for every test to be a separate script. I've written tests like that before, and it's quite unnecessary pain. A smaller number of modules is much easier to write and easier to maintain. Having everything in the same language will make things even easier to maintain.

vmdebootstrap and systest are not in Debian because I don't think it's useful to push stuff into Debian until it's ready for use by others than the developers. vmdebootstrap might be ready, systest has not been used for real, so it clearly isn't. However, I'm not going to spend time packaging vmdebootstrap at this time, though I'd be happy to see someone else do that.

Comment by Lars Wirzenius Thu 21 Jul 2011 10:36:30 PM CEST

please do an a-g source mksh and check (hah!) out the files check.t and check.pl; they contain the test suite, and the format of check.t is pretty much self-explanatory. Does it have potential?

It allows for requiring specific output on stdout or stderr (default is none), requiring exit code 0 (default), ≠ 0, or some specified one, allows categories (even by OS, which is interesting for the Hurd people), time limits, and a few other things.

Comment by mirabilos Thu 21 Jul 2011 10:38:53 PM CEST
However, if you are going to be using a process level interface, use TAP or Robert Collins's subunit, since they're well known, well supported standard interfaces for that kind of thing.
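For illustration, emitting TAP from any harness takes only a few lines; a minimal producer sketch (the function name is invented):

```python
def emit_tap(results):
    """Render (name, passed) pairs as TAP: a plan line "1..N"
    followed by one "ok"/"not ok" line per test."""
    lines = ["1..%d" % len(results)]
    for i, (name, passed) in enumerate(results, 1):
        status = "ok" if passed else "not ok"
        lines.append("%s %d - %s" % (status, i, name))
    return "\n".join(lines)

print(emit_tap([("exit status", True), ("stderr empty", False)]))
```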
Comment by Lars Wirzenius Thu 21 Jul 2011 10:39:34 PM CEST

> DEP8 isn’t written well at all, it’s very confusing. I wouldn’t mind adding an entry for e.g. mksh, but not with a spec like that, it needs a large amount of work first. Sorry.

DEP8 is not written at all ATM, hence the above is sort of expected. You might have guessed that by the fact that DEP8 is still in "DRAFT" state and quite heavily so.

To stress this even further: DEP8 is not to be adopted yet, it needs work. But before doing that, we need to put the toolchain back into shape, so that we can at least test the testing infrastructure (and sorry for the word trick).

Comment by zack Thu 21 Jul 2011 10:57:09 PM CEST
Please also have a look at other distros doing test-driven development. For example, meego.com is using the OBS build system for distro development and is tying in testing as well, with systems like OTS. Surely there's something Debian can reuse even if OBS isn't for Debian; for the meego tools, they use OBS to build Debian packages.
Comment by mikko.rapeli Thu 21 Jul 2011 11:24:52 PM CEST