Eclectic paper on the Ultimate Debian Database

A few months ago I co-authored with Lucas a paper on UDD, which has just been presented at this year's IEEE Mining Software Repositories conference, continuing my recent tradition of eclectic papers.

The paper is titled "The Ultimate Debian Database: Consolidating Bazaar Metadata for Quality Assurance and Data Mining" and is available for download from my publications page.

For Debian people already familiar with UDD there is probably not much to learn from it, as the main target of the paper is the community of scientists doing data mining on software repositories. For them, UDD offers a valuable entry point to Debian "facts", as the data sources reflected in the database are easy to join together and to some extent already validated by other UDD users (e.g. QA people). Nevertheless, the first two sections of the paper are probably of broader interest. There we give our point of view on the so-called Debian Data Hell: why it exists, how it relates to the nature of Debian and similar distros, etc.

I've already noted in the past how that is also related to the culture of freedom that we value in Debian, not only in our software but also in our infrastructure and procedures. We just need to get rid of a bit of inertia, and total world domination will then be just around the corner :-)

I'm happy to conclude by quoting the acknowledgments section of the paper:

Acknowledgments

The authors would like to thank all UDD contributors, and in particular: Christian von Essen and Marc Brockschmidt (student and co-mentor in the Google Summer of Code which witnessed the first UDD implementation), Olivier Berger for his support and FLOSSmole contacts, Andreas Tille who contributed several gatherers, the Debian community at large, the "German cabal" and Debian System Administrators for their UDD hosting and support.