blog/archives/2010/05

document your team - redux

2010-05-31T07:57:51Z

wiki.d.o/Teams to the rescue

About 3 years ago (too bad I've actually missed the 3-year anniversary by a few days!), Raphael set up http://wiki.debian.org/Teams.

In my recent encounters and contacts with people interested in contributing to Debian, I've found that page to be invaluable. In particular, people find it very useful for understanding the macro-structures of Debian and for working out where they can start to contribute. Approaching a team is most likely "less scary" than contacting a larger forum, and that page offers a good index of teams.

Of course, the usefulness of wiki.d.o/Teams is directly proportional to how complete it is and to how current the individual team pages are. So, in case you didn't know about the page, or you just remembered that years ago you set up a page there and then forgot about it, this is probably a good moment to add or update your team information there.

To eat my own dog food, I've recently set up Teams/DPL, which is probably not terribly useful, but it contains interesting stuff such as where the daily activity bits are stored. (Yes: I'm just a one-person team, but given that the index is useful to document Debian "parts" in general, I believe we should allow for a slight semantic abuse of the name.)

Incidentally, that shows another way in which wiki team pages can come to the rescue: improving the documentation of our processes. For instance, we have grown accustomed to the fact that d-d-a is an authoritative information source. Well, of course it is, but when you need to dig out a 4-year-old d-d-a post by some core team because it is the only documentation of some still current (is it?) procedure, let's be fair and acknowledge that a mailing list archive is not that handy. So, summarizing relevant d-d-a posts in team pages (with references to the originals) is another way, accessible to everybody, to help keep docs current. I've been doing that recently for a handful of teams, and people seem to appreciate it (of course it's your responsibility to check that they do appreciate it).

Debian-based scientific computing at EDF

2010-05-27T16:13:25Z

why Debian for scientific computing: a case study

Yesterday I was invited to visit the EDF R&D center at Clamart, near Paris. They wanted to discuss their Debian usage and present some of the cool stuff they're doing. The most interesting component is an in-house Debian-based distribution called "calibre", which was presented at RMLL 2008.

Even though it is now growing desktop profiles (currently deployed on about 1,200 desktops and counting), calibre was mainly developed for clusters dedicated to scientific computing. Current cluster deployments at EDF are not that big, but still comprise hundreds of machines for about 40 teraFLOPS, with their largest cluster in the Top 500. The main goal of calibre was to quickly bring a complete cluster from bare metal to production state. The goal has been quite successfully achieved: using Debian and FAI they get a cluster of 200 machines ready for production in about an hour and a half, installing more than 3,000 packages on each machine (as the cluster will be used for heterogeneous purposes, rather than for a handful of specific applications).

What I found most interesting about the visit were their reasons for choosing Debian over other (commercial) distros for their scientific computing purposes:

They use a wide range of open source scientific software (some developed in house): according to them, Debian is the mainstream distribution with the largest offering of such software, with the additional benefit that the corresponding Debian maintainers are experts in the software they package, so that they can trust them. They have kudos for Debian Science, which I'm happy to proxy.
They need to rebuild packages to trigger specific optimizations for their clusters. On one hand, that defeats the typical management argument of "commercial support" that other distros offer, as rebuilding packages voids support guarantees.
On the other hand, the focus on quality that we have in Debian really helps them: we fight FTBFSs to death, and people who need to rebuild our packages really appreciate that.

EDF is generally keen on contributing back to Debian (even though the team behind calibre is still small), and I've been happy to walk them through how they can contribute.

The last interesting piece of feedback I have to share is that they feel a bit alone in what they're doing (which is unsurprising, given that their communication on the matter has been rather limited thus far ...). Still, there is probably room for synergies that can be better exploited among users with similar needs. So, are you a cluster / scientific computing user of Debian? Then let me know, and I'll be happy to put you in touch with EDF and other users with similar interests.

debian maintainers list

2010-05-19T07:09:29Z

on DMs and the packages they maintain

The notion of Debian Maintainer (DM) has been with Debian for about 3 years. As of now, Debian has approximately 120 DMs, who maintain a significant part of the packages in the archive.

I'm hereby happy to evilly spoil that Enrico Zini has just set up an index of Debian Maintainers and of the packages they maintain. The index is linked from the people section of our website as a symbolic step in giving more credit to DMs.

Now, if anyone would like to set up a nice graph for the statistics page that tracks the evolution of the number of DMs (and DDs?), that would be useful.
Any idea on how to do that retroactively too?
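
One possible angle on the retroactive part, as a rough sketch only: if a dated list of DM additions and removals can be extracted from some historical record (which record to use is exactly the open question), the series to plot is just a running total. The function name and the (date, delta) event format below are made up for illustration.

```python
from collections import Counter
from itertools import accumulate


def dm_counts(events):
    """Turn dated membership changes into a running total.

    `events` is an iterable of (date, delta) pairs, where delta is +1
    for a DM being added and -1 for a DM being removed (a hypothetical
    format, not an existing data source). Returns a list of
    (date, total) pairs, one per distinct date, in chronological order.
    """
    per_day = Counter()
    for day, delta in events:
        per_day[day] += delta          # merge changes made on the same day
    days = sorted(per_day)
    totals = accumulate(per_day[d] for d in days)
    return list(zip(days, totals))
```

The resulting pairs can be fed to any plotting tool; the same function would work for DDs given an equivalent event list.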

UDD - consolidating bazaar metadata for QA and data mining

2010-05-10T20:40:39Z

Eclectic paper on the Ultimate Debian Database

A few months ago I co-authored with Lucas a paper on UDD, which has just been presented at this year's IEEE Mining Software Repositories conference, continuing my recent tradition of eclectic papers.

The paper is titled The Ultimate Debian Database: Consolidating Bazaar Metadata for Quality Assurance and Data Mining and is available for download from my publications page.

For Debian people already familiar with UDD there is probably not much to learn from it, as the main target of the paper is the community of scientists doing data mining on software repositories. For them, UDD offers a valuable entry point to Debian "facts", as the data sources reflected in the database are easily joinable together and to some extent already validated by other UDD users (e.g. QA people). Nevertheless, the first two sections of the paper are probably of broader interest. There we have given our point of view on the so-called Debian Data Hell: why it exists, how it's related to the nature of Debian and similar distros, etc.
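
To give a flavor of what "easily joinable" means in practice, here is a toy sketch. It is not taken from the paper: the two tables, their columns and the rows are simplified stand-ins invented for the example, and an in-memory SQLite database stands in for the real one. The point is only that once package metadata and bug data live side by side, a question spanning both becomes a single join.

```python
import sqlite3

RC_SEVERITIES = ("critical", "grave", "serious")


def demo_db():
    """Build a tiny in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sources (source TEXT PRIMARY KEY, maintainer TEXT);
        CREATE TABLE bugs (id INTEGER PRIMARY KEY, source TEXT, severity TEXT);
        INSERT INTO sources VALUES
            ('pkg-a', 'Alice'), ('pkg-b', 'Bob'), ('pkg-c', 'Alice');
        INSERT INTO bugs VALUES
            (1, 'pkg-a', 'serious'), (2, 'pkg-a', 'minor'),
            (3, 'pkg-b', 'grave'),   (4, 'pkg-c', 'critical');
    """)
    return conn


def rc_bugs_per_maintainer(conn):
    """Count release-critical bugs per maintainer by joining the two tables."""
    placeholders = ", ".join("?" for _ in RC_SEVERITIES)
    query = f"""
        SELECT s.maintainer, COUNT(b.id) AS rc_bugs
        FROM sources AS s
        JOIN bugs AS b ON b.source = s.source
        WHERE b.severity IN ({placeholders})
        GROUP BY s.maintainer
        ORDER BY rc_bugs DESC, s.maintainer
    """
    return conn.execute(query, RC_SEVERITIES).fetchall()
```

Calling rc_bugs_per_maintainer(demo_db()) returns one (maintainer, count) row per maintainer with at least one release-critical bug in the toy data.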

I've already noted in the past how that is also related to the culture of freedom that in Debian we value not only in our software, but also in our infrastructure and procedures. We should just get rid of a bit of inertia, and total world domination will then be just around the corner.

I'm happy to conclude by quoting the acknowledgments section of the paper:

Acknowledgments

The authors would like to thank all UDD contributors, and in particular: Christian von Essen and Marc Brockschmidt (student and co-mentor in the Google Summer of Code which witnessed the first UDD implementation), Olivier Berger for his support and FLOSSmole contacts, Andreas Tille who contributed several gatherers, the Debian community at large, the "German cabal" and Debian System Administrators for their UDD hosting and support.

tickler file for maildir

2013-02-15T18:25:42Z

snooze your INBOX

A few days ago Chris made me realize that my GTD setup was still missing a piece: a proper tickler file implementation.

As I wanted an IMAP-free implementation, I've rolled my own: rotate-tickler. Nothing too fancy: a shell script triggered by cron, which moves mails around a set of DELAYED.{1,..} maildirs. Still, it is careful about clashes (which shouldn't happen according to the maildir specs, but better be paranoid with mails), and properly cleans up the "seen" flag for the final (re)delivery in INBOX.
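
For readers who want the gist without reading the script, the rotation idea can be sketched as follows. This is a Python re-imagining under my own assumptions, not the actual rotate-tickler code: the function names are made up, DELAYED.N is assumed to mean "N more runs to go", and mails keep their new/ or cur/ subdirectory while moving one slot closer to INBOX per run.

```python
import os
from pathlib import Path


def strip_seen(name):
    """Drop the 'S' (seen) flag from a maildir file name, keeping other flags."""
    base, sep, flags = name.partition(":2,")
    return base + sep + flags.replace("S", "") if sep else name


def move_mail(src, dst_dir, unsee=False):
    """Move one mail without ever overwriting; return False on a name clash."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / (strip_seen(src.name) if unsee else src.name)
    try:
        os.link(src, dst)      # fails if dst already exists: no clobbering
    except FileExistsError:
        return False           # leave the mail where it is
    os.unlink(src)
    return True


def rotate(root, slots):
    """Shift DELAYED.1 into INBOX, DELAYED.2 into DELAYED.1, and so on.

    `root` is a pathlib.Path holding the maildirs. Slots are processed
    from 1 upwards so that each mail moves exactly one step per run.
    Returns the mails that were skipped because of a clash.
    """
    skipped = []
    for n in range(1, slots + 1):
        final = n == 1
        dst_box = root / ("INBOX" if final else f"DELAYED.{n - 1}")
        for sub in ("new", "cur"):
            src_dir = root / f"DELAYED.{n}" / sub
            if not src_dir.is_dir():
                continue
            for mail in sorted(src_dir.iterdir()):
                # Only the final hop into INBOX resets the seen flag.
                if not move_mail(mail, dst_box / sub, unsee=final):
                    skipped.append(mail)
    return skipped
```

Run once per period from cron, e.g. rotate(Path.home() / "Maildir", 7) for a week of daily slots (the path and slot count are placeholders). The link-then-unlink pair requires all maildirs to live on the same filesystem.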

Give it a try and feel free to git format-patch-me your improvements. (BTW, does it make any sense today to publish any piece of software, no matter how small, without shelving it into some DVCS?)