At FOSDEM, John Sullivan delivered an interesting talk titled "Is copyleft being framed?" to examine claims about the alleged decline of GPL-d software. (Slides are available.) The crux of the talk is the analysis he performed on the Debian archive to determine how much of the software we distribute is covered by the GPL, LGPL, or AGPL ("GPL-d" for short in the remainder).

John's talk steps into an interesting and long-running debate (a recent summary of which is available in this ITWire article). The most interesting part is the discrepancy between John's results and Blackduck's, which are often used to argue that the popularity of the GPL is declining. That might be the case. Or not. The more analyses we do to find out, the better.

The assumption underlying John's work is that Debian is a representative sample of the Free Software out there, which I think is reasonable. I find the analysis presented in the talk completely satisfactory from a purely scientific point of view. The same cannot be said about Blackduck's results: both their methods and data are secret, making it impossible to reproduce their experiments. Highly unscientific.

Still, John's results are surprising: as much as 87 percent of Lenny's packages and 93 percent of Squeeze's are GPL-d. That seems a lot. Puzzled by that, John discussed the issue with me before his talk, in search of pitfalls in his methods or data. Finding none, I pointed him to the almighty DktrKranz for some extra review, who found nothing either. To stay on the safe side, even during his talk John called for independent reviews of his results. What could be wrong?

The tool used to gather the data is license-count from the debian-policy package. Input data are the debian/copyright files of all Debian source packages. If license-count is not buggy, our debian/copyright files might be. One thing that occurred to me only a few days ago is the habit of declaring a different license for the Debian packaging (the files under debian/) than for the software being packaged. That's a bad habit, because it might cause unwanted license mixtures via patches that live under debian/, but I've seen several occurrences of it in the Debian archive. For name and (self-)shame: I've also been guilty of it in the past, when I was young™.

Is that reason enough to skew results and overestimate GPL-d software? I don't think so, I hope not, but ultimately… I don't know. It'd be nice to rule out the possibility entirely. So if anyone is willing to do some sampling of affected debian/copyright files and propose patches for license-count to exclude those "false positives", please shout. (As a bonus point: that would also help to make sounder decisions for the typical use case of license-count, i.e. deciding when a license should be added to /usr/share/common-licenses.)

Other independent reviews of the results are equally welcome.

Note: the above, as well as John's analysis, would be a trivial exercise if DEP-5 were already widely deployed in the Debian archive.
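To give an idea of why DEP-5 would make this easy, here is a minimal sketch of how one might flag packages whose debian/* stanza declares a different license than the catch-all * stanza. It assumes a machine-readable copyright file; the function names and the sample file are hypothetical, the parser handles only the simple "Field: value" layout with indented continuation lines, and it is not part of license-count.

```python
def parse_stanzas(text):
    """Split a machine-readable copyright file into stanzas of {field: value}."""
    stanzas, current, field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, field = {}, None
        elif line[0] in " \t":
            # An indented line continues the previous field.
            if field is not None:
                current[field] += "\n" + line.strip()
        elif ":" in line:
            field, _, value = line.partition(":")
            field = field.strip()
            current[field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def license_for(stanzas, pattern):
    """Return the short license name of the first Files stanza listing `pattern`."""
    for stanza in stanzas:
        if pattern in stanza.get("Files", "").split():
            return stanza.get("License", "").split("\n")[0].strip()
    return None


def packaging_license_differs(text):
    """True if debian/* is declared under a different license than *."""
    stanzas = parse_stanzas(text)
    upstream = license_for(stanzas, "*")
    packaging = license_for(stanzas, "debian/*")
    return None not in (upstream, packaging) and upstream != packaging


# Hypothetical sample file, for illustration only.
SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

Files: *
Copyright: 2010 Upstream Author
License: Expat

Files: debian/*
Copyright: 2011 Package Maintainer
License: GPL-3+
"""

print(packaging_license_differs(SAMPLE))
```

Packages flagged this way would be the candidates to sample by hand before deciding whether they really are "false positives".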

Update: add link to John's slides
Update 19/02/2012: Russ Allbery, author of license-count, posted a way more likely cause of data skew in John's analysis: double counting among the different types of copyleft licenses
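To illustrate how double counting can inflate a percentage, here is a small sketch on made-up data. It assumes the skew comes from adding up per-license package counts, so that a package mentioning both the GPL and the LGPL is counted twice; the package names, license sets, and function names are all hypothetical, and this is not how license-count itself is implemented.

```python
# Hypothetical toy data: package name -> licenses mentioned in debian/copyright.
PACKAGES = {
    "alpha": {"GPL", "LGPL"},
    "beta": {"GPL"},
    "gamma": {"LGPL", "AGPL"},
    "delta": {"BSD"},
}
COPYLEFT = {"GPL", "LGPL", "AGPL"}


def naive_copyleft_share(packages):
    """Add up per-license hits: a package counts once per matching license."""
    hits = sum(len(licenses & COPYLEFT) for licenses in packages.values())
    return hits / len(packages)


def deduplicated_copyleft_share(packages):
    """Count each package at most once, however many copyleft licenses it lists."""
    hits = sum(1 for licenses in packages.values() if licenses & COPYLEFT)
    return hits / len(packages)


print(naive_copyleft_share(PACKAGES))         # 1.25
print(deduplicated_copyleft_share(PACKAGES))  # 0.75
```

On this toy input the naive sum even exceeds 100 percent, while counting each package once gives three out of four.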

Calling it a bad habit to have a different license for the packaging than the one upstream chooses for their software seems a bit ignorant to me. The trouble is that the GPL is an exclusive license and doesn't play well with differently licensed material. Sharing packaging work between different packages, on the other hand, is something that should be encouraged in many ways; the GPL, though, limits that to GPLed packages (or to packages under GPL-compatible licenses, which legally turns them into GPLed software in the end).

I already stumbled into the issue once, when I asked for permission to take some GPLed packaging into a package of mine and was denied permission to relicense it under a different DFSG-free/OSI-approved/FSF-approved license. So actually I would call it a bad habit to license packaging under copyleft instead of a permissive license, because it causes more trouble than real gain.

Choosing to go by upstream's license might be especially troublesome when packaging contrib or non-free software, so it is bad advice.

Enjoy, Rhonda

Comment by rhonda Sat 18 Feb 2012 08:55:04 PM CET

Choosing to go by upstream's license might be especially troublesome when packaging contrib or non-free software, so it is bad advice.

This is a pretty compelling argument.

(I find the "sharing packaging" argument a bit less compelling, if that goes hand in hand with copy/pasting packaging stuff across different packages. I think we should rather encourage factoring out common packaging "code" as much as possible, because that also makes it easier to deploy archive-wide changes and reduces inertia in Debian. But all this is a bit tangential, back to the non-free argument…)

I've been slightly imprecise. In fact I don't have any issue with packaging in general being released under a different license, because most of it does not get linked or otherwise combined with upstream code. What worries me is that a different packaging license might induce unexpected license mixtures unless the maintainer is very careful. For one thing, debian/patches/ should not be under a different license than the upstream software; doing so is dangerous in terms of license incompatibilities and in most cases would also make it very hard, if not impossible, to push the patch upstream. But that's not the only case, as there might be other Debian-specific code generated during build that ends up being loaded by upstream code.

Very careful maintainers will surely get this right in debian/copyright and will always avoid incompatible license mixtures. But to be honest, encouraging it as a practice seems like asking for trouble.

Comment by zack Sun 19 Feb 2012 12:27:39 AM CET

Hi all,

I just realised that, since 0.58, some dh-make templates for Debian copyright files have lost their recommendation not to use a license with stronger terms than upstream for the Debian packaging, and I reopened #598411 accordingly.

Comment by Charles Plessy Sun 19 Feb 2012 02:22:08 AM CET