Hi (again), I'm Zack, and this is my blog. Have a look at the most recent posts below, or browse the tag cloud here on the right.
Archives are available as well.
You can leave comments on my posts by following the relevant link associated with each post. Alternatively, you can mail me comments; note that, unless you request otherwise, I will add mailed comments to the comment feeds.
Debsources: Live and Historical Views on Macro-Level Software Evolution
The paper entitled Debsources: Live and Historical Views on Macro-Level Software Evolution, which I've co-authored with Matthieu Caneill, has been accepted at ESEM 2014: the 8th International Symposium on Empirical Software Engineering and Measurement.
In the paper we describe Debsources as a software platform for monitoring the evolution of Free Software through the lens of Debian, and use the main Debsources instance (http://sources.debian.net) to replicate and extend an earlier study on macro-level software evolution.
Now we "just" have to integrate all the nice charts and data we
have extracted for the paper into Debsources' stats page...
moar, and moar, and moar debsources stats
A while ago I announced the availability of several stats about Debian source code on http://sources.debian.net. Since then the statistical basis of those stats has grown a lot, and now includes all Debian historical releases, from hamm (July 1998) onward. This makes it possible to appreciate macro-level evolution trends in Free Software over a period of more than 15 years, through the eyes of a distro that sits at the nice intersection of the oldest, largest, and most reputed distros.
To get there I've added support for sticky suites to the plumbing layer of debsources, and then injected historical releases from http://archive.debian.org. The injection process took about a week (with no parallelism at all, pretty slow disks, and sha256 checksums, ctags, and sloccount computed on all source files) and has been an "interesting" experience.
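To give an idea of what the per-file checksum step mentioned above involves, here is a minimal sketch in Python. It is not the actual debsources code: the function names `sha256_of_file` and `checksum_tree` are made up for illustration, and the real updater does much more (ctags, sloccount, DB injection).

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so that huge files never sit in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Yield (relative_path, sha256) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files: only plain files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                yield os.path.relpath(full, root), sha256_of_file(full)
```

Run sequentially over every unpacked source package of every release, a loop like this is I/O bound, which is consistent with slow disks hurting so much.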
When you go back decades in technology time, bit
rot is just around the corner, and I've found my
sources.d.n. In both cases the respective maintainers
(Guillem and Ganneff, kudos) have been positive about and helpful
in improving the situation, despite the low impact of the bugs I've
found on the average user. That's quite important for the
long-term preservation of digital information in
general, and for the perennity of access to Free Software in the
specific case of Debian.
While we are at it, I'm now maintaining a list of bugs affecting sources.d.n but belonging to other packages, in case you fancy helping out but are not a Python hacker. Interestingly enough, quite a few of those bugs are related to the fact that tools debsources uses (e.g. ctags, sloccount) are also starting to show their age.
You might wonder why buzz, rex, and bo are still missing from sources.d.n. That's in fact for similar reasons. Before hamm, Debian didn't have complete archive coverage in terms of Sources indexes and .dsc files. Given that debsources relies on both to extract source packages, it first needs to grow an additional abstraction layer that can cope with their absence. It's SMOP, and planned.
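To make the dependency on Sources indexes concrete, here is a rough Python sketch of how a .dsc file can be located from a Sources stanza. Sources indexes use the usual Debian control-file syntax (blank-line separated stanzas, "Field: value" lines, continuation lines starting with whitespace). The helper names and the sample stanza below are invented for illustration; this is not how debsources itself is written.

```python
# Invented sample stanza, for illustration only (checksums are fake).
SAMPLE = """\
Package: hello
Version: 2.9-1
Directory: pool/main/h/hello
Files:
 0123456789abcdef0123456789abcdef 1234 hello_2.9-1.dsc
 fedcba9876543210fedcba9876543210 5678 hello_2.9.orig.tar.gz
"""


def parse_stanzas(text):
    """Split control-file style text into a list of {field: value} dicts."""
    stanzas, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field:
            # Continuation line: append to the previous field's value.
            current[last_field] += "\n" + line.strip()
        else:
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def dsc_path(stanza):
    """Return the archive-relative path of the .dsc file, or None."""
    directory = stanza.get("Directory")
    if not directory:
        return None
    for entry in stanza.get("Files", "").splitlines():
        parts = entry.split()
        if parts and parts[-1].endswith(".dsc"):
            return directory + "/" + parts[-1]
    return None
```

With `SAMPLE`, `dsc_path(parse_stanzas(SAMPLE)[0])` gives the path of the .dsc to unpack. When the index or the .dsc is missing there is nothing to start from, which is the case the extra abstraction layer would have to handle.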
And now let's have fun with ctags bombs.
Stefano “Indiana” Zacchiroli
Debian: watch your stats!
Over the past few weeks, Matthieu Caneill and I have worked quite a bit on Debsources. As we have now deployed most of the new features on http://sources.debian.net, it's time for another "What's new with Debsources?" blog post. Here is what's new:
Debsources now knows about Debian suites, i.e. which package is in which "release" (stable, testing, unstable, ...). This knowledge is already useful for some of the other features below and will be used more in the future.
since last summer Debsources has been running sloccount on all unpacked source packages, together with ctags and du, but the resulting information wasn't exposed on the Web. This is now fixed. Each package now has an infobox (example) which shows: disk usage, archive area, suites, and sloccount with per-language breakdown. The new infobox also subsumes the old puny list of package links.
Debsources now gathers and plots accurate Debian sources statistics, both overall and per-suite, in both snapshot and historical trends flavors.
(Yeah, I know, the charts are not particularly good looking ATM, but that's easy to change without impacting the rest. So if you're a matplotlib artist and willing to help, please step forward!)
many changes have been going on also at the plumbing layer to make the service less resource hungry and more maintainable, in view of a migration to the official Debian infrastructure --- which I've in the meantime started discussing with DSA. Some highlights:
Debsources now has a rather comprehensive test suite, built using Nose. Most notably, we do test full update runs down to source unpacking (of a small subset of a Debian mirror), DB injection, and plugin execution --- which is quite neat.
the updater is now much faster (about 2x) and might require, in pathological cases, 10x less memory than before. Memory usage now caps at around 300MB, even when injecting ctags for large packages such as linux, chromium, and libreoffice.
the DB schema went through several refactoring cycles, and now uses a separate file table to index all known source file paths. In the past, path information was duplicated across the checksums and ctags tables, not only wasting DB space, but also making the presence of file information conditional on the enablement of at least one of the two corresponding plugins. This is now fixed --- and migrating the full DB has been quite "fun". Unfortunately, we've also added quite a few large-ish indexes, resulting in no significant overall change in DB size (currently at ~50GB), but at least in much faster queries.
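For the curious, the idea behind the separate file table can be sketched as follows. Mind the assumptions: the table and column names are invented for this example, and it uses an in-memory SQLite database only so that it runs anywhere, whereas the real service runs on PostgreSQL with a richer schema.

```python
import sqlite3

# Hypothetical, simplified schema: paths live in one place only, and the
# per-plugin tables point at them by id.
SCHEMA = """
CREATE TABLE files (
    id      INTEGER PRIMARY KEY,
    package TEXT NOT NULL,
    path    TEXT NOT NULL,
    UNIQUE (package, path)
);
CREATE TABLE checksums (
    file_id INTEGER NOT NULL REFERENCES files(id),
    sha256  TEXT NOT NULL
);
CREATE TABLE ctags (
    file_id INTEGER NOT NULL REFERENCES files(id),
    tag     TEXT NOT NULL,
    line    INTEGER NOT NULL
);
CREATE INDEX ctags_tag_idx ON ctags (tag);
"""


def add_file(conn, package, path):
    """Record a path once and return its id, whether or not any plugin ran."""
    conn.execute(
        "INSERT OR IGNORE INTO files (package, path) VALUES (?, ?)",
        (package, path),
    )
    row = conn.execute(
        "SELECT id FROM files WHERE package = ? AND path = ?",
        (package, path),
    ).fetchone()
    return row[0]


def paths_defining(conn, tag):
    """Return the paths of all files that define the given ctags symbol."""
    rows = conn.execute(
        "SELECT f.path FROM ctags c JOIN files f ON f.id = c.file_id "
        "WHERE c.tag = ? ORDER BY f.path",
        (tag,),
    )
    return [r[0] for r in rows]
```

The point is that a row in `files` exists even when both the checksums and ctags plugins are disabled, and that each path string is stored once instead of once per plugin table.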
The next step on this front will be the addition of path-based searches, using the excellent Postgres trigram indexes.
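If you have never met trigram matching, here is a toy Python approximation of the idea: split each string into 3-character pieces and compare the sets. It only mimics the general approach of Postgres' pg_trgm extension (lowercasing, alphanumeric words, blank padding); it is not its exact implementation, all function names are made up, and the 0.3 threshold is just an assumption for the example.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, padding each word with blanks."""
    result = set()
    # Words are runs of alphanumerics; '/', '.', '_' etc. act as separators.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search_paths(paths, query, threshold=0.3):
    """Return the paths similar enough to query, best matches first."""
    scored = [(similarity(p, query), p) for p in paths]
    return [p for score, p in sorted(scored, reverse=True) if score >= threshold]
```

For instance, `search_paths(["debian/rules", "src/main.c"], "rules")` keeps only `debian/rules`. A real index precomputes the trigrams of every stored path, so the database does not need to scan all rows on each search.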
Want more? Sure, we'll be happy to! But it'll happen faster if you help. Speaking of which: we've got Debsources into the new contributors game (see announcement) and we're looking forward to mentoring new contributors.
skyrocketing how-can-i-help popcon count
how-can-i-help by Lucas Nussbaum is one of the best things that happened in the area of attracting contributions to Debian in quite a while. It can be used both as a standalone tool to list opportunities for contributing to Debian which are related to your installed packages, and as an APT hook (which is also the default configuration) that at each upgrade will inform you of new contribution opportunities.
how-can-i-help is great for newbies who are looking for ways to give back to Debian which are a good match for their skills: among other things, how-can-i-help shows bugs tagged "gift" related to packages you use.
how-can-i-help is also great for experienced developers, as it allows them to find out, in a timely manner, that packages they use are in dire need of help: RC bugs, pending removals, adoptions needed, requests for sponsor, etc. (As highly unscientific evidence: I've noticed a rather quick turnover of RFA/O/ITA bugs on packages installed on my machine. I suspect how-can-i-help is somehow responsible for that, due to the fact that it increases awareness of ongoing package issues directly with the people using them.)
So, if you haven't yet, please install how-can-i-help RIGHT NOW.
I daresay that we should aim at installing how-can-i-help by default on all Debian machines, but that might be an ambitious initial goal. In the meantime I'll settle for making how-can-i-help's popcon count skyrocket. As of today, it looks like this:
which is definitely too low for my taste. Please spread the word about how-can-i-help. And let's see what we can collectively do to that graph.
how-can-i-help is just a tiny teeny helper, but I'm convinced it can do wonders in liberating dormant contributions to the Debian Project.
A few days ago I took part in the radio show Caterpillar on Radio 2, to talk about the recent publication of the guidelines implementing article 68 of the Codice per l'Amministrazione Digitale (the Italian Digital Administration Code), and about the obligation that article establishes to prefer free software in the Italian public administration (PA). (Here you can find a post in Italian on the same topic.)
Chatting with Dr. Cirri and Dr. Zambotti was, like last time, pleasant, fun, and, I believe, informative for the general audience of Radio 2. It was all capped by a nice surprise from the hosts: at the end of my segment they invited listeners to call in live and report success stories of free software adoption in the Italian PA.
For future reference, I'm making available here the podcast of the 20 January 2014 episode (third part). My segment starts at about minute 9. Enjoy!
The reason is that drivecast seems to have removed, as of January 2014, the ability to register radio stations and generate podcast feeds from them. If you know of alternative services that I can use to restore the service, please let me know.
Over the next few weeks I'll be on the road, attending a few Free Software events and giving talks. In particular:
during the upcoming week-end (18-19 January 2014) there will be a mini-DebConf in Paris. I'll be there to give a talk about Debsources (see schedule), participate in the Debian France plenary meeting, and meet Debian friends from all over Europe.
two weeks later (1-2 February 2014) I'll be at FOSDEM 2014 to give a retrospective talk about legal issues that Debian has faced over the past few years, participate in a panel about Free Software governance, … and meet Free Software friends from all over the world!
See you "there"?
org-mutt with org-mode >= 8
Thanks to Don I just remembered that I haven't yet announced org-mutt support for org-mode >= 8. Let's catch up!
For a few weeks I've been aware of the fact that my mutt/org-mode glue, AKA org-mutt, was no longer working with org-mode >= 8, due to the ditching of org-remember in favor of org-capture. Allegedly, org-capture should have been backward compatible, but it clearly is not.
I've just updated the canonical org-mutt blog post, so that the documentation in there is up to date again. If you're using org-mutt, I suggest referring to the Git repository as the canonical location for future updates, if any.
Thanks Don, thanks Mako!
all your ctag (and checksum) are belong to us
A few months after the initial announcement, here is some news about the sources.d.n service. I've been late in blogging this, but most of it has been implemented by Matthieu Caneill and me during DebConf13, which has been a great DebConf, totally exceeding my expectations (and they were already fairly high!).
First, you might have noticed some user-visible changes:
on the same topic, when browsing through a package and using regex search, you'll now search by default within that package, which lets you focus your searches more easily than before. (You can easily override this by editing the search box and removing the
for the data geeks (or the wannabe host), there are now disk usage stats (note that they don't include the database size, though, see below for that)
the website also got a significant facelift, as part of which we have moved the detailed explanations of what the service is about out of your way. You now immediately get to the various browsing options.
On the other hand, under the hood:
to implement ctags and sha256 searches we needed a serious DBMS, so we switched from SQLite to PostgreSQL.
Again, for the data geek: storing ctags/sha256 for all of sources.d.n content with decent indexes takes about 37 GB, for about 160 million rows in the ctags table and 20 million rows in the checksums one. (Currently filenames are duplicated between the two tables so, probably, the DB disk size might be reduced some.)
together with the switch to a serious DBMS, the update logic has been completely rewritten in Python (from Bash...), and should now be entirely transactional.
... and given it was going to be Python anyhow, better to enjoy what it has to offer, no? So there is now a plugin mechanism that makes it easier to add extra data extractors, triggering them at each package update. Currently there are plugins for sha256sum, ctags, and sloccount (even though the latter is not yet exposed via the web interface). An added benefit of this is that if you want to deploy debsources elsewhere, you can easily disable the most time consuming extractors: running ctags and sha256sum on the fabulous 3 chromium/libreoffice/linux is not for the faint of disks...
we now receive push updates from the Debian mirror network, so that you'll get updates on sources.d.n as soon as a package hits Debian mirrors (+ processing time, which is about 15-20 minutes on the average update run). Many thanks to Simon Paillard and Adam Lackorzynski for their help in setting this up.
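As an illustration of the plugin idea described above, here is a toy Python sketch: extractors register themselves under a name, and the updater runs only the enabled ones on each package update. Everything in it (the `plugin` decorator, `on_package_update`, the in-memory `files` mapping, the `linecount` stand-in for sloccount) is invented for this example and is not the real debsources plugin API.

```python
import hashlib

# Registry of available extractors, keyed by plugin name.
_PLUGINS = {}


def plugin(name):
    """Decorator registering an extractor function under the given name."""
    def register(func):
        _PLUGINS[name] = func
        return func
    return register


@plugin("sha256sum")
def sha256_extractor(package, files):
    """Map each path to the SHA-256 of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


@plugin("linecount")
def linecount_extractor(package, files):
    """Crude stand-in for sloccount: count newline characters per file."""
    return {path: data.count(b"\n") for path, data in files.items()}


def on_package_update(package, files, enabled):
    """Run every enabled (and known) extractor on an updated package."""
    return {
        name: _PLUGINS[name](package, files)
        for name in enabled
        if name in _PLUGINS
    }
```

A deployment with weak disks could then pass, say, `enabled=["linecount"]` and skip the expensive extractors entirely, which is the benefit mentioned in the list above.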
As usual, your bug reports (and patches!) are more than welcome, just check BUGS before reporting to avoid duplicates.