zack's home page: blog/archives/2010/05 (http://upsilon.cc/~zack/blog/archives/2010/05/), ikiwiki, 2013-02-15T18:25:42Z
document your team - redux (http://upsilon.cc/~zack/blog/posts/2010/05/document_your_team_-_redux/), 2010-05-31T07:57:51Z, 2010-05-31T07:55:12Z
<h1><a href="http://wiki.debian.org/Teams">wiki.d.o/Teams</a> to
the rescue</h1>
<p><a href=
"http://lists.debian.org/debian-devel-announce/2007/05/msg00008.html">
About 3 years ago</a> <small>(too bad I've actually missed the
3-year anniversary by a few days!)</small>, <a href=
"http://www.ouaza.com/wp">Raphael</a> set up <a href=
"http://wiki.debian.org/Teams">http://wiki.debian.org/Teams</a>.</p>
<p>In my recent encounters and contacts with people interested in
contributing to Debian, I've found that page invaluable. In
particular, people find it very useful for
<strong>understanding the macro-structures of Debian</strong> and
for understanding <strong>where they can start</strong> to
contribute. Approaching a team is most likely "less scary" than
contacting a larger forum, and that page offers a handy index of
teams.</p>
<p>Of course, the usefulness of <code>wiki.d.o/Teams</code> is
directly proportional to how complete it is and to how current the
individual team pages are. So, if you didn't know about the page, or
if you have just remembered that years ago you set up a page there
and then forgot about it, this is probably a good
moment to <strong>add or update your team information</strong>
there.</p>
<p>To eat my own dog food, I've recently set up <a href=
"http://wiki.debian.org/Teams/DPL">Teams/DPL</a>, which is probably
not terribly useful, but it contains interesting stuff such as
where the daily activity bits are stored. <small>(Yes: I'm just a
one-person team, but given that the index is useful to document Debian
"parts" in general, I believe we should allow for a slight
semantic abuse of the name.)</small></p>
<p>Incidentally, that shows another way in which wiki team pages
can come to the rescue: improving the <strong>documentation of our
processes</strong>. For instance, we have grown accustomed to the
fact that <a href=
"http://lists.debian.org/debian-devel-announce"><code>d-d-a</code></a>
is an authoritative information source. Well, of course it is, but
when you need to dig out a 4-year-old <code>d-d-a</code> post by
some core team because it is the only documentation of some still
current (is it?) procedure, let's be fair and acknowledge that a
mailing list archive is not <em>that handy</em>. So,
<strong>summarizing relevant <code>d-d-a</code> posts in team
pages</strong> (with references to the originals) is another way,
accessible to everybody, to help keep docs current. I've been
doing that recently for a handful of teams, and people seem to
appreciate it <small>(of course it's your responsibility to check that
they <em>do</em> appreciate it)</small>.</p>
Debian-based scientific computing at EDF (http://upsilon.cc/~zack/blog/posts/2010/05/Debian-based_scientific_computing_at_EDF/), 2010-05-27T16:13:25Z, 2010-05-27T08:51:16Z
<h1>why Debian for scientific computing: a case study</h1>
<p>Yesterday I was invited to visit the <strong><a href=
"http://en.wikipedia.org/wiki/%C3%89lectricit%C3%A9_de_France">EDF</a>
<a href=
"http://research.edf.com/research-and-innovation-44204.html">R&D</a>
center</strong> at Clamart, near Paris. They wanted to discuss
their Debian usage and present some of the cool stuff they're
doing. The most interesting component is an in-house Debian-based
distribution called "<strong>calibre</strong>", which has been
<a href=
"http://2008.rmll.info/IMG/pdf/EDF-Informatique-Scientifique.pdf">presented
at RMLL 2008</a>.</p>
<p>Even though it is now growing desktop profiles (currently
deployed on about 1,200 desktops and counting), calibre was mainly
developed for clusters dedicated to <strong>scientific
computing</strong>. Current cluster deployments at EDF are not
<em>that</em> big, but still comprise hundreds of machines for
about 40 teraFLOPS, with their largest cluster in the <a href=
"http://www.top500.org/">Top 500</a>. The main goal of calibre was
to quickly bring a complete cluster from bare metal to
production state. That goal has been achieved quite successfully:
using Debian and <a href=
"http://www.informatik.uni-koeln.de/fai/">FAI</a> they get a
cluster of 200 machines ready for production in about an hour and a
half, installing more than 3,000 packages on each machine (as the
cluster will be used for heterogeneous purposes, rather than for a
handful of specific applications).</p>
<p>What I found most interesting about the visit are the
<strong>reasons for choosing Debian</strong> over other
(commercial) distros for their scientific computing purposes:</p>
<ul>
<li>
<p>They use a wide range of open source <strong>scientific
software</strong> <small>(some developed <a href=
"http://research.edf.com/research-and-the-scientific-community/softwares/softwares-44329.html">
in house</a>)</small>: according to them, Debian is the
mainstream distribution with the <strong>largest offering</strong>
of such software, with the additional benefit that the corresponding
Debian maintainers are experts in the software they package, so
that they can trust them. They have kudos for <a href=
"http://wiki.debian.org/Teams/DebianScience">Debian Science</a>,
which I'm happy to proxy.</p>
</li>
<li>
<p>They need to <strong>rebuild packages</strong> to trigger
specific optimizations for their clusters. On one hand, that
defeats the typical management argument of "commercial support"
that other distros offer, as rebuilding packages voids support
guarantees.</p>
</li>
<li>
<p>On the other hand, the focus on <strong>quality</strong> that we
have in Debian really helps them: we fight
<strong><a href=
"https://en.wikipedia.org/wiki/FTBFS">FTBFS</a>s</strong> to death,
and people who need to rebuild our packages really appreciate
that.</p>
</li>
</ul>
<p>EDF is generally keen on contributing back to Debian (even
though the team behind calibre is still small), and I've been happy
to walk them through how they can contribute.</p>
<p>The last interesting bit of feedback I have to share is that they
feel a bit alone in what they're doing <small>(which is unsurprising,
given that their communication on the matter has been rather
limited thus far ...)</small>. Still, there is probably room for
synergies that can be better exploited among users with similar
needs. So, <strong>are you a cluster / scientific computing user of
Debian?</strong> Then <a href="mailto:leader@debian.org">let me
know</a>, and I'll be happy to put you in touch with EDF and other
users with similar interests.</p>
debian maintainers list (http://upsilon.cc/~zack/blog/posts/2010/05/debian_maintainers_list/), 2010-05-19T07:09:29Z, 2010-05-18T21:13:00Z
<h1>on DMs and the packages they maintain</h1>
<p>The notion of <a href=
"http://wiki.debian.org/DebianMaintainer">Debian Maintainer
(DM)</a> has been with Debian for <a href=
"http://www.debian.org/vote/2007/vote_003">about 3 years</a>. As of
now, Debian has approximately 120 DMs, who maintain a significant
part of the packages in the archive.</p>
<p>I'm hereby happy to evilly spoil that <a href=
"http://www.enricozini.org/blog/">Enrico Zini</a> has just set up
an <a href="https://nm.debian.org/dm_list.html"><strong>index of
Debian Maintainers</strong></a> and of the packages they maintain.
The index is linked from the <a href=
"http://www.debian.org/devel/">people section</a> of our website as
a symbolic step in giving more credit to DMs.</p>
<p><small>Now, if anyone would like to set up a nice graph for the
<a href="http://wiki.debian.org/Statistics">statistics page</a>
that tracks the evolution of DMs (and DDs?), that would be
useful.<br />
Any ideas on how to do that retroactively too?</small></p>
UDD - consolidating bazaar metadata for QA and data mining (http://upsilon.cc/~zack/blog/posts/2010/05/UDD_-_consolidating_bazaar_metadata_for_QA_and_data_mining/), 2010-05-10T20:40:39Z, 2010-05-10T20:40:39Z
<h1>Eclectic paper on the Ultimate Debian Database</h1>
<p>A few months ago, I co-authored with <a href=
"http://www.lucas-nussbaum.net/blog">Lucas</a> a <strong>paper on
<a href="http://udd.debian.org">UDD</a></strong>, which has just
been presented at this year's IEEE <a href=
"http://msr.uwaterloo.ca/msr2010/index.html">Mining Software
Repositories</a> conference, continuing my recent tradition of
<a href=
"http://upsilon.cc/~zack/blog/posts/2009/11/Enforcing_type-safe_linking_using_package_dependencies/">
eclectic</a> <a href=
"http://upsilon.cc/~zack/blog/posts/2010/01/Preserving_privacy_with_Google_Docs/">papers</a>.</p>
<p>The paper is titled <em>The Ultimate Debian Database:
Consolidating Bazaar Metadata for Quality Assurance and Data
Mining</em> and is available for <a href=
"http://upsilon.cc/~zack/research/publications/msr2010-udd.pdf"><strong>download</strong></a>
from my <a href=
"http://upsilon.cc/~zack/research/publications/">publications</a> page.</p>
<p>For Debian people already familiar with UDD there is probably
not much to learn from it, as the main target of the paper is the
community of scientists doing <strong>data mining on software
repositories</strong>. For them, UDD offers a valuable entry point
to Debian "facts", as the data sources reflected in the database can
easily be joined together and are to some extent already validated by
other UDD users (e.g. QA people). Nevertheless, the <strong>first
two sections</strong> of the paper are probably of broader
interest. There we give our point of view on the so-called
<strong>Debian Data Hell</strong>: why it exists, how it's related
to the nature of Debian and similar distros, etc.</p>
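<p>To give a rough idea of what "joinable" data sources buy you,
here is a minimal sketch. It does not use the real UDD schema: the
two toy tables (<code>sources</code> and <code>bugs</code>), their
columns, and the sample rows are all made up for illustration, and
an in-memory SQLite database stands in for the real thing. The point
is only that once package metadata and bug data live in one database,
a question such as "how many release-critical bugs per maintainer?"
becomes a single join.</p>

```python
import sqlite3

# Toy, in-memory stand-in: table names, columns and rows are
# hypothetical and are NOT the actual UDD schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (source TEXT PRIMARY KEY, maintainer TEXT);
CREATE TABLE bugs (id INTEGER PRIMARY KEY, source TEXT, severity TEXT);
INSERT INTO sources VALUES
  ('foo', 'Alice <alice@example.org>'),
  ('bar', 'Bob <bob@example.org>'),
  ('baz', 'Alice <alice@example.org>');
INSERT INTO bugs VALUES
  (1, 'foo', 'serious'),
  (2, 'foo', 'normal'),
  (3, 'bar', 'grave'),
  (4, 'baz', 'critical');
""")


def rc_bugs_per_maintainer(conn):
    """Count release-critical bugs per maintainer with one join."""
    query = """
        SELECT s.maintainer, COUNT(*)
        FROM sources AS s
        JOIN bugs AS b ON b.source = s.source
        WHERE b.severity IN ('critical', 'grave', 'serious')
        GROUP BY s.maintainer
        ORDER BY s.maintainer
    """
    return conn.execute(query).fetchall()


for maintainer, count in rc_bugs_per_maintainer(conn):
    print(maintainer, count)
```

<p>Without a consolidated database, the same question would mean
scraping two separate services and matching package names by hand.</p>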
<p>I've already <a href=
"http://upsilon.cc/~zack/blog/posts/2010/01/kuhn_on_debian_ubuntu_and_the_culture_of_freedom/">
noted in the past</a> how that is also related to the
<strong>culture of freedom</strong> that in Debian we value not
only in our software, but also in our infrastructure and
procedures. We should just get rid of a bit of inertia, and total
world domination will then be just around the corner <img src=
"http://upsilon.cc/~zack/smileys/smile.png" alt=":-)" /></p>
<p>I'm happy to conclude by quoting the acknowledgments section of the
paper:</p>
<h2>Acknowledgments</h2>
<blockquote>
<p>The authors would like to thank all UDD contributors, and in
particular: Christian von Essen and Marc Brockschmidt (student and
co-mentor in the Google Summer of Code which witnessed the first
UDD implementation), Olivier Berger for his support and FLOSSmole
contacts, Andreas Tille who contributed several gatherers, the
Debian community at large, the "German cabal" and Debian System
Administrators for their UDD hosting and support.</p>
</blockquote>
tickler file for maildir (http://upsilon.cc/~zack/blog/posts/2010/05/tickler_file_for_maildir/), 2013-02-15T18:25:42Z, 2010-05-10T08:38:08Z
<h1>snooze your INBOX</h1>
<p>A few days ago Chris <a href=
"http://chris-lamb.co.uk/2010/05/04/rotating-email-your-inbox-using-imapfilter/">
made me realize</a> that my <a href=
"http://en.wikipedia.org/wiki/Getting_Things_Done">GTD</a> setup
was still missing a piece: a proper <strong><a href=
"https://en.wikipedia.org/wiki/tickler%20file">tickler
file</a></strong> implementation.</p>
<p>As I wanted an IMAP-free implementation, I've rolled my own:
<a href=
"http://git.upsilon.cc/?p=utils/rotate-tickler.git"><strong>rotate-tickler</strong></a>.
Nothing too fancy: a shell script triggered by cron, which moves
mails around a set of DELAYED.{1,..} <strong>maildir</strong>-s.
Still, it is careful about clashes (which shouldn't happen
according to the maildir specs, but better be paranoid with mails),
and properly cleans up the "seen" flag for the final (re)delivery
in INBOX.</p>
<p>Give it a try and feel free to <code>git format-patch</code>-me
your improvements. <small>(BTW, does it make any sense today to
publish any piece of software, no matter how small, without
shelving it into some <acronym title=
"Distributed Version Control System">DVCS</acronym>?)</small></p>