moar, and moar, and moar debsources stats

A while ago I've announced the availability of several stats about Debian source code on http://sources.debian.net. Since then the statistical basis of those stats has increased a lot, and now includes all Debian historical releases, from hamm (July 1998) onward. This allows to appreciate macro-level evolution trends in Free Software, over a period of more than 15 years, through the eyes of a distro that sits at the nice intersection of the eldest, largest, and most reputed distros.

To get there I've added support for sticky suites to the plumbing layer of debsources, and then injected historical releases from http://archive.debian.org. The injection process took about a week (without any sort of parallelism, pretty slow disks, and computing sha256 checksums, ctags, and sloccount on all source files) and has been an "interesting" experience.

When you go back decades in technology time, bit rot is just around the corner, and I've found my share while injecting archive.d.o into sources.d.n. In both cases the respective maintainers (Guillem and Ganneff, kudos) have been positive about and helpful in improving the situation, despite the low impact of the bugs I've found on the average user. That's quite important for the long-term preservation of digital information in general, and for the perennity of access to Free Software in the specific case of Debian.

While we are it, I'm now maintaining a list of bugs affecting sources.d.n but belonging to other packages, in case you fancy helping out but are not a Python hacker. Interestingly enough, quite a bit of those bugs are related to the fact that tools debsources uses (e.g. ctags, sloccount) are also starting to show their age.

You might wander why buzz, rex, and bo are still missing from sources.d.n. That's in fact for similar reasons. Before hamm Debian didn't have complete archive coverage in terms of Sources indexes and .dsc files. Given that debsources rely on both to extract source packages, it first needs to grow an additional abstraction layer that can cope with their absence. It's SMOP, and planned.

And now let's have fun with ctags bombs.

Yours truly,
Stefano “Indiana” Zacchiroli
(credits: KiBi, #debian-ftp)