tags/fosdem | zack's home page | http://upsilon.cc/~zack/tags/fosdem/ | ikiwiki | 2017-02-12T14:03:29Z
Opening the Software Heritage archive | http://upsilon.cc/~zack/blog/posts/2017/02/Opening_the_Software_Heritage_archive/ | 2017-02-12T14:03:29Z | 2017-02-12T14:03:29Z
<h1>... one API (and one FOSDEM) at a time</h1>
<p><small>[ originally <a href=
"https://www.softwareheritage.org/2017/02/04/archive-api/">posted</a>
on the <a href="https://www.softwareheritage.org/blog/">Software
Heritage blog</a>, reposted here with minor adaptations
]</small></p>
<p><strong>Last Saturday at FOSDEM we opened up the <a href=
"https://archive.softwareheritage.org/api/">public API</a> of
<a href="https://www.softwareheritage.org/">Software Heritage</a>,
making it possible to browse its <a href=
"https://www.softwareheritage.org/archive/">archive</a> programmatically.</strong></p>
<p>We posted this while I was keynoting with <a href=
"https://www.softwareheritage.org/people/">Roberto</a> at <a href=
"https://fosdem.org/2017/">FOSDEM 2017</a>, to discuss the role
Software Heritage plays in <a href=
"https://fosdem.org/2017/schedule/event/software_heritage/">preserving
the Free Software commons</a>. To accompany the talk we released
our first public API, which makes it possible to navigate the entire
content of the Software Heritage archive as a graph of connected
development objects (blobs, directories, commits, releases,
etc.).</p>
<p>Over the past months we have been busy working on getting source
code (with full development history) into the archive, to minimize
the risk that important bits of Free/Open Source Software that are
publicly available today disappear forever from the net, for
whatever reason: crashes, black hat hacking, business decisions,
you name it. As a result, our archive is already one of the largest
collections of source code in existence, spanning a <a href=
"https://help.github.com/articles/about-archiving-content-and-data-on-github/">
GitHub mirror</a>, injections of important Free Software
collections such as Debian and GNU, and an ongoing import of all
Google Code and Gitorious repositories.</p>
<p>Up to now, however, the archive was deposit-only. There was no
way for the public to access its content. While there is a lot of
value in archival <em>per se</em>, our mission is to Collect,
Preserve, and <strong>Share all the material we collect</strong>
with everybody. Plus, we totally get that a deposit-only library is
much less exciting than a store-and-retrieve one! Last Saturday we
took a first important step towards providing full access to the
content of our archive: <strong>we released <a href=
"https://archive.softwareheritage.org/api/1/">version 1 of our
public API</a>, which makes it possible to navigate the Software Heritage
archive</strong> programmatically.</p>
<p>You can have a look at the <a href=
"https://archive.softwareheritage.org/api/">API documentation</a>
for full details about how it works. But to briefly recap:
conceptually, our archive is a giant <a href=
"https://en.wikipedia.org/wiki/Merkle_tree">Merkle DAG</a>
connecting together all development-related objects we encounter
while crawling public VCS repositories, source code releases, and
GNU/Linux distribution packages. Examples of the objects we store
are: file contents, directories, commits, releases; as well as
their metadata, such as: log messages, author information,
permission bits, etc.</p>
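<p>To make the Merkle DAG idea concrete, here is a minimal sketch that
derives object IDs from object content. It assumes Git-style SHA-1
hashing as a stand-in; the identifier scheme actually used by the
archive is described in the API documentation. The helper names
(<code>object_id</code>, <code>blob_id</code>, <code>tree_id</code>)
are made up for this illustration, and directories are limited to
regular files.</p>

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    """Hash a '<kind> <size>' header, a NUL byte, and the payload with SHA-1."""
    header = f"{kind} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content: bytes) -> str:
    """ID of a file content (a leaf of the DAG)."""
    return object_id("blob", content)


def tree_id(entries: dict[str, str]) -> str:
    """ID of a directory; entries maps file name -> blob ID (hex).

    Simplification for this sketch: every entry is a regular file
    (mode 100644), so sub-directories are not handled.
    """
    payload = b""
    for name in sorted(entries, key=str.encode):
        payload += b"100644 " + name.encode() + b"\x00" + bytes.fromhex(entries[name])
    return object_id("tree", payload)


# A directory ID depends on the IDs of what it contains, so changing
# one file changes the ID of every object that (transitively) points to it.
old_tree = tree_id({"README": blob_id(b"hello world\n")})
new_tree = tree_id({"README": blob_id(b"hello archive\n")})
print(old_tree != new_tree)
```

<p>Because an ID is derived from content, identical files and
directories found in different repositories collapse into a single
node of the graph.</p>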
<p>The API we have just released allows pointwise navigation of this
huge graph. Using the API you can <strong>look up individual objects
by their IDs</strong>, retrieve their metadata, and <strong>jump
from one object to another following links</strong>: e.g., from
a commit to the corresponding directory or parent commits, from a
release to the annotated commit, etc. Additionally, you can
retrieve crawling-related information, such as the software origins
we track (usually as VCS clone/checkout URLs), and the <strong>full
list of visits we have done on any known software origin</strong>.
This lets you find out, for instance, when we took snapshots of a Git
repository you care about and, for each visit, <strong>where each
branch of the repo was pointing at that time</strong>.</p>
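<p>The navigation model can be sketched with a toy in-memory store.
Everything below is hypothetical: the IDs are shortened placeholders,
the link names (<code>target</code>, <code>directory</code>,
<code>parents</code>, <code>entries</code>) are chosen for readability,
and nothing here calls the real API. It only shows what "look up by ID,
then follow links" means.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    kind: str  # "content", "directory", "revision" or "release"
    links: dict[str, list[str]] = field(default_factory=dict)  # link name -> target IDs


# Hypothetical toy archive; IDs are placeholders, not real hashes.
STORE = {
    "rel1": Obj("release", {"target": ["rev2"]}),
    "rev2": Obj("revision", {"directory": ["dir2"], "parents": ["rev1"]}),
    "rev1": Obj("revision", {"directory": ["dir1"], "parents": []}),
    "dir2": Obj("directory", {"entries": ["blob1", "blob2"]}),
    "dir1": Obj("directory", {"entries": ["blob1"]}),
    "blob1": Obj("content"),
    "blob2": Obj("content"),
}


def lookup(obj_id: str) -> Obj:
    """Pointwise lookup of a single object by its ID."""
    return STORE[obj_id]


def follow(obj_id: str, link: str) -> list[str]:
    """Return the IDs reachable from obj_id through the named link."""
    return list(lookup(obj_id).links.get(link, []))


def history(rev_id: str) -> list[str]:
    """Walk first parents from a revision back to the root."""
    chain = [rev_id]
    while follow(chain[-1], "parents"):
        chain.append(follow(chain[-1], "parents")[0])
    return chain


release_target = follow("rel1", "target")[0]
print(history(release_target))
print(follow(release_target, "directory"))
```

<p>Each step is one lookup, which is why exploring a large history
this way takes many requests.</p>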
<p>Our resources for offering the API as a public service are still
quite limited. This is the reason why you will encounter a couple
of limitations. First, no download of the actual content of files
we have stored is possible yet --- you can retrieve all
content-related metadata (e.g., checksums, detected file types and
languages, etc.), but not the actual content as a byte sequence.
Second, some pretty severe rate limits apply: API access is
entirely anonymous and users are identified by their IP address, so
each "user" will be able to do a little bit more than 100
requests/hour. This is to keep our infrastructure sane while we
grow in capacity and focus our attention on developing other
archive features.</p>
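<p>A client that wants to stay within such a limit can space out its
requests. The sketch below is one possible client-side throttle, not
something the API provides. It assumes a conservative budget of exactly
100 requests per hour, and the <code>Throttle</code> class and its
parameters are invented for this example.</p>

```python
import time


class Throttle:
    """Space calls evenly so that at most max_per_hour happen per hour."""

    def __init__(self, max_per_hour=100, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour  # seconds between two requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


throttle = Throttle()
print(throttle.min_interval)  # seconds to leave between consecutive requests
```

<p>Call <code>throttle.wait()</code> right before each request. The
clock and sleep functions are injectable so the behavior can be checked
without actually waiting.</p>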
<p>If you're interested in having rate limits lifted for a specific
use case or experiment, please <a href=
"https://www.softwareheritage.org/contact/">contact us</a> and we
will see what we can do to help.</p>
<p>If you'd like to <strong>help increase our resource
pool</strong>, have a look at our <a href=
"https://www.softwareheritage.org/support/sponsors/program/">sponsorship
program</a>!</p>
forthcoming talks | http://upsilon.cc/~zack/blog/posts/2014/01/forthcoming_talks/ | 2014-01-14T10:12:24Z | 2014-01-14T10:12:24Z
<p>Over the next few weeks I'll be on the road, attending a few
Free Software events and giving talks. In particular:</p>
<ul>
<li>
<p>during the upcoming week-end (<em>18-19 January 2014</em>) there
will be a <a href=
"https://wiki.debconf.org/wiki/Miniconf-Paris/2014"><strong>mini-DebConf
in Paris</strong></a>. I'll be there to give a talk about <a href=
"http://sources.debian.net">Debsources</a> (see <a href=
"http://france.debian.net/events/minidebconf2014/">schedule</a>),
participate in the Debian France plenary meeting, and meet Debian
friends from all over Europe.</p>
</li>
<li>
<p>two weeks later (<em>1-2 February 2014</em>) I'll be at <a href=
"https://fosdem.org/2014/"><strong>FOSDEM 2014</strong></a> to give
a retrospective <a href=
"https://fosdem.org/2014/schedule/event/radical_community_angle/">talk</a>
about legal issues that Debian has faced over the past few years,
participate in a <a href=
"https://fosdem.org/2014/schedule/event/governance_round_table/">panel</a>
about Free Software governance, … and meet Free Software friends
from all over the world!</p>
</li>
<li>
<p>the day after FOSDEM ends (<em>3 February 2014</em>) I'll be in
Antwerp for the opening <a href=
"http://sqm2014.sig.eu/?page=keynote">keynote</a> of the <a href=
"http://sqm2014.sig.eu/"><strong>SQM workshop</strong></a> to
discuss some of my research work at <a href=
"http://www.irill.org">IRILL</a> and current challenges in the
field.</p>
</li>
</ul>
<p>See you "there"?</p>
GPL-d Debian software skew (?) | http://upsilon.cc/~zack/blog/posts/2012/02/gpl_d_debian_software_skew/ | 2012-02-19T21:22:06Z | 2012-02-18T16:33:05Z
<p>At <a href="http://fosdem.org/2012/">FOSDEM</a>, John Sullivan
delivered an interesting talk titled <a href=
"http://fosdem.org/2012/schedule/event/is_copyleft_being_framed">Is
copyleft being framed?</a> to check claims about the alleged decline
of GPL-d software. (<a href=
"http://info9.net/wiki/fosdem/LegalIssuesDevRoom/Speakers/sullivan_slides.pdf">Slides</a>
are available.) The crux of the talk is the analysis he performed
on the Debian archive to discover the amount of software we
distribute that is covered by GPL, LGPL, or AGPL ("GPL-d" for short
in the remainder).</p>
<p>John's talk steps into an interesting and long-running debate (a
recent summary of which is available in this <a href=
"http://www.itwire.com/business-it-news/open-source/52838-gpl-use-in-debian-on-the-rise-study">
ITWire article</a>). The most interesting part is the discrepancy
between John's results and <a href=
"http://www.blackducksoftware.com/">Blackduck</a>'s, which are
often used to <a href=
"http://blogs.the451group.com/opensource/2011/12/15/on-the-continuing-decline-of-the-gpl/">
argue that the popularity of the GPL license is declining</a>. That
might be the case. Or not. The more analyses we do to find out,
the better.</p>
<p>The underlying assumption of John's work is that Debian is a
representative sample of the Free Software out there, which I think
is a reasonable assumption. I find the analysis presented in the
talk completely satisfactory from a purely scientific point of
view. The same cannot be said about Blackduck's results: both their
methods and data are secret, making it impossible to reproduce
their experiments. Highly <em>un</em>scientific.</p>
<p>Still, John's results are surprising: as much as 87 percent of
Lenny's packages and 93 percent of Squeeze's are GPL-d. That seems
<em>a lot</em>. Puzzled by that, John discussed the issue with me
before his talk, in search of pitfalls in his methods or
data. Finding none, I pointed him to the almighty <a href=
"http://dktrkranz.wordpress.com/">DktrKranz</a> for some extra
review, but he found nothing either. To stay on the safe side, even
during his talk John called for independent reviews of his results.
<strong>What could be wrong?</strong></p>
<p>The tool used to gather the data is <a href=
"http://anonscm.debian.org/gitweb/?p=dbnpolicy/policy.git;a=blob;f=tools/license-count;hb=HEAD">
license-count</a> from the <code>debian-policy</code> package.
Input data are the <code>debian/copyright</code> files of all
Debian source packages. If <code>license-count</code> is not
buggy, our <code>debian/copyright</code> files might be. One thing
that occurred to me only a few days ago is the <strong>habit of
declaring a different license for Debian packaging</strong> (the
files under <code>debian/</code>) than for the software being packaged
itself. That's a bad habit, because it might cause unwanted license
mixtures via patches that live under <code>debian/</code>, but I've
seen several occurrences of it in the Debian archive. To name and
(self-)shame: I've also been guilty of it in the past, <em>when I
was young™</em>.</p>
<p><strong>Is that reason enough to skew results and overestimate
GPL-d software?</strong> I don't think so, I hope not, but
ultimately… I don't know. It'd be nice to rule out the possibility
entirely. So if anyone is willing to do some sampling of affected
<code>debian/copyright</code> files and propose patches for
<code>license-count</code> to exclude those "false positives",
please shout. (As a bonus point: that would also help make sounder
decisions for the typical use case of
<code>license-count</code>, i.e. deciding when a license should be
added to <code>/usr/share/common-licenses</code>.)</p>
<p>Other independent reviews of the results are equally
welcome.</p>
<p>Note: the above, as well as John's analysis, would be a trivial
exercise if <a href="http://dep.debian.net/deps/dep5/">DEP-5</a>
were already widely deployed in the Debian archive.</p>
<hr />
<p><strong>Update</strong>: added link to John's slides<br />
<strong>Update 19/02/2012</strong>: Russ Allbery, author of
<code>license-count</code>, <a href=
"http://www.eyrie.org/~eagle/journal/2012-02/002.html">posted</a> a
much more likely cause of data skew in John's analysis: double
counting among the different types of copyleft licenses.</p>
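<p>How double counting can inflate such a figure is easy to see on toy
data. The sketch below is an illustration only: the package names and
license sets are made up, and the two functions are not the logic of
<code>license-count</code>. It contrasts summing per-license counts
with counting each package once.</p>

```python
# Made-up sample: package name -> set of licenses declared for it.
PACKAGES = {
    "alpha": {"GPL", "LGPL"},
    "beta": {"GPL"},
    "gamma": {"LGPL", "AGPL"},
    "delta": {"BSD"},
    "epsilon": {"GPL", "MIT"},
}
COPYLEFT = {"GPL", "LGPL", "AGPL"}


def summed_count(packages: dict[str, set[str]]) -> int:
    """Add up per-license counts; a package is counted once per copyleft license."""
    return sum(
        sum(1 for licenses in packages.values() if lic in licenses)
        for lic in COPYLEFT
    )


def distinct_count(packages: dict[str, set[str]]) -> int:
    """Count each package at most once, however many copyleft licenses it declares."""
    return sum(1 for licenses in packages.values() if licenses & COPYLEFT)


print(summed_count(PACKAGES), "vs", distinct_count(PACKAGES), "of", len(PACKAGES))
```

<p>On this sample the summed figure exceeds the number of packages,
while the distinct count cannot. A package declaring several copyleft
licenses is what makes the two diverge.</p>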
fosdem 2012 | http://upsilon.cc/~zack/blog/posts/2012/02/fosdem_2012/ | 2012-02-03T13:30:19Z | 2012-02-03T13:30:19Z
<p>In less than 2 hours I'll leave for the Paris Nord station to
catch a train headed to Bruxelles Midi. Plan for the week-end:
attend and enjoy <a href="http://fosdem.org/2012/">FOSDEM
2012</a>!</p>
<p>I haven't submitted any talk for this year's FOSDEM edition, but
I've been invited (and gladly accepted) to join the <a href=
"http://fosdem.org/2012/schedule/event/contributor_communities">round
table on working with contributor communities</a> on Sunday. I'm
positive it will be a nice occasion to share ideas on how to
structure local user groups around the world.</p>
<p>Besides that, I plan to attend several talks in the <a href=
"http://fosdem.org/2012/schedule/track/crossdistribution_devroom">cross-distribution</a>
and <a href=
"http://fosdem.org/2012/schedule/track/legal_issues_devroom">legal
issues</a> devrooms, hang around the Debian booth, and
discuss <em>many</em> topics with people and friends from all over
the Free Software multiverse.</p>
<p>Too bad I'm still recovering from a recent minor health issue; I
won't be able to get the most out of today's <a href=
"http://fosdem.org/2012/beerevent">beer event</a>. But I'll attend
nonetheless. See you there?</p>
in the news - Debian and the FreedomBox | http://upsilon.cc/~zack/blog/posts/2011/02/in_the_news_-_Debian_and_the_FreedomBox/ | 2011-02-17T21:05:25Z | 2011-02-16T17:52:40Z
<p>Fellow geeks who attended <a href=
"http://www.fosdem.org">FOSDEM</a> this year probably remember Eben
Moglen's <a href=
"http://www.youtube.com/fosdemtalks#p/u/16/-BSLBvwyUEs">announcement</a>
about the creation of a foundation to support the development of
<a href="http://wiki.debian.org/FreedomBox">FreedomBox</a>-es.</p>
<p>The <a href="http://www.freedomboxfoundation.org/">FreedomBox
foundation</a> and its goals have been featured in a <a href=
"http://www.nytimes.com/2011/02/16/nyregion/16about.html">New York
Times article</a> that appeared yesterday, together with a nice Debian
mention which points to our wiki. Debian is also prominently
present on the (<a href=
"http://packages.debian.org/sid/ikiwiki">ikiwiki</a>-powered)
website of the foundation.</p>
<p>Yet another reason why I'm proud of being part of <a href=
"http://www.debian.org">Debian</a>.</p>
<hr />
<p><small>(thanks to Faidon for the heads up)</small></p>
<hr />
<ul>
<li><strong>update</strong>: <a href=
"http://freedomboxfoundation.org/news/2011-02-17-inthepress/">this
blog post</a> by the FreedomBox foundation contains more "in the
press" pointers (to both the foundation and Debian), including an
article in the <a href=
"http://blogs.wsj.com/digits/2011/02/16/freedom-box-needs-a-good-user-interface/">
Wall Street Journal</a></li>
<li><strong>update</strong>: the article on the NYT website is indeed
only intermittently accessible without paying. For future reference, I've made
available an HTML-only <a href=
"http://upsilon.cc/~zack/blog/posts/2011/02/in_the_news_-_Debian_and_the_FreedomBox/Eben_Moglen_Is_Reshaping_Internet_With_a_Freedom_Box_-_NYTimes.com.html">
static version of the article</a> here (may NYT forgive me).</li>
</ul>
ZOMG a Debian release | http://upsilon.cc/~zack/blog/posts/2011/02/ZOMG_a_Debian_release/ | 2013-02-15T18:25:42Z | 2011-02-08T09:44:58Z
<h1>mythbustering a Debian release</h1>
<p>This is not news anymore, but in case you don't know yet:
<a href="http://www.debian.org/News/2011/20110205a"><strong>Debian
6.0 "Squeeze" was released</strong></a> this past week-end. If
you haven't yet downloaded Squeeze, stop reading this blog post
right here and jump to:</p>
<div class="center"><big><a href=
"http://deb.li/squeeze">http://deb.li/squeeze</a></big></div>
<p>to choose your ISO; or check the <a href=
"http://www.debian.org/releases/stable/">release notes</a> for
upgrade instructions from Debian 5.0 "Lenny".</p>
<p>Done?<br />
Cool!<br /></p>
<p>I hope you are now enjoying Squeeze as much as I've enjoyed
being part of its development cycle which:</p>
<ul>
<li>has lasted 24 months since the <a href=
"http://www.debian.org/releases/lenny/">Lenny release</a></li>
<li>has been worked on by a <strong>volunteer Project</strong> of
about 900 members and thousands of other volunteer
contributors</li>
<li>has <a href=
"http://blog.schmehl.info/2011/01/19#bugs-closed-for-squeeze">closed
150'000 bugs</a></li>
<li>has increased user freedom by delivering a <a href=
"http://upsilon.cc/~zack/blog/posts/2010/12/squeeze_your_non-free_firmware_away/"><strong>
Free Linux kernel</strong></a></li>
<li>has added 2 <strong>non-Linux ports</strong> (<a href=
"http://www.debian.org/ports/kfreebsd-gnu/">kfreebsd 32/64
bits</a>) to the already large family of <a href=
"http://www.debian.org/ports/">Debian ports</a></li>
<li>will continue the tradition of <strong>archive-wide long term
security support</strong> (lasting about 3-3.5 years given current
<a href=
"http://raphaelhertzog.com/2011/02/06/debian-6-0-is-out-wheezy-kicks-off/">
Debian release cadence</a>)</li>
<li>maintains, stubbornly, the tradition of a rock-solid
<strong>Debian-quality</strong> system, made of packages which have
been "tortured" by testing utils like <a href=
"http://piuparts.debian.org/">piuparts</a>, <a href=
"http://edos.debian.net/">edos-debcheck</a>, and frequent archive
rebuilds <small>(after all, what is Free Software for if you cannot
recompile your programs?)</small></li>
<li>has added 10'000 new (binary) packages</li>
<li>has provided an official <a href=
"http://backports.debian.org">backport service</a></li>
<li>... etc., you get the idea <img src="http://upsilon.cc/~zack/smileys/smile.png"
alt=":-)" /></li>
</ul>
<p>I'm still shaken by the events, given the release happened in a
sort of split context: the teams working on the final phases of the
release (<a href="http://wiki.debian.org/Teams/ReleaseTeam">release
team</a>, <a href=
"http://wiki.debian.org/Teams/FTPMaster">ftp-masters</a>, <a href=
"http://wiki.debian.org/Teams/Webmaster">webmasters</a>, <a href=
"http://wiki.debian.org/Teams/DebianCd">cd</a>, <a href=
"http://wiki.debian.org/Teams/DSA">DSA</a>, etc.) were "at home"
hacking frantically on it, while many EU-based Debian people
(including yours truly) were at <a href=
"http://www.fosdem.org/2011/">FOSDEM</a> representing Debian to the
community with a booth, talks, and answers to the recurrent question
«so, have you released yet?».</p>
<p>At FOSDEM, I was personally flooded with congratulatory
messages that are not really for me, but rather for the Debian
community at large. So: <strong>congratulations folks</strong>!
People out there (be they Debian users, users of some derivative, or
Free Software enthusiasts in general) seem to really love what we
have achieved with Squeeze!</p>
<p>The people who need to be thanked for this result are far too
many, so I won't try to name names. Nonetheless, I've a few
<strong>personal kudos</strong> to deliver to:</p>
<ul>
<li>
<p>The <strong>release team</strong> for the fantastic coordination
and communication job over the past few months. They have also
contributed to <strong>mythbustering #1</strong>: <em>Debian cannot
fix a release date</em> (a bit) <em>in advance</em>.</p>
</li>
<li>
<p>All the people who have worked on <strong>fixing RC
bugs</strong> by sending patches, reviewing and testing them,
preparing <strong>NMUs</strong>, etc. I'll never give up my belief
that <a href="http://upsilon.cc/~zack/hacking/debian/rcbw/">releasing is a shared
responsibility</a> and that we cannot scale without realizing that
and changing our culture accordingly. All these people have
contributed to moving towards <strong>mythbustering #2</strong>:
<em>NMUs are bad</em>.</p>
</li>
<li>
<p>The <a href=
"http://wiki.debian.org/Teams/Publicity"><strong>publicity
team</strong></a> which (with release live blogging via <a href=
"http://identi.ca/debian">@debian</a>, blog posts, and press
releases) has contributed to <strong>mythbustering #3</strong>:
<em>Debian isn't able to communicate about the "cool" stuff they
are doing</em>.</p>
</li>
<li>
<p>The <strong>webmaster team</strong> which has done an <a href=
"http://www.debian.org/News/2011/20110205b">incredible job</a> at
<strong>mythbustering #4</strong>: <em>Debian <a href=
"http://www.debian.org">web presence</a> sucks</em>.</p>
</li>
</ul>
<p>I'm overwhelmed by happiness about all that and I'll cherish it
forever as a <em>souvenir</em> of what a community of volunteers,
driven by <a href="http://www.debian.org/social_contract">common
ideals</a>, can achieve.</p>
<p>Now let's <a href=
"http://wiki.debian.org/ReleasePartySqueeze">party</a> and then
roll up our sleeves for Wheezy, which is already <a href=
"http://lists.debian.org/debian-devel-announce/2011/02/msg00003.html">
open for development</a>.</p>
<hr />
<p><strong>PS</strong> I <a href=
"http://git.upsilon.cc/?p=talks/20110206-fosdem.git;a=tree;h=refs/heads/pdf;hb=pdf">
talked</a> again at FOSDEM about the relevance of Debian in the
Free Software ecosystem. I've the impression the message is getting
through: check out the very nice article <a href=
"http://www.networkworld.com/community/why-debian-matters-more-than-ever">
<em>Why Debian matters more than ever</em></a> by Zonker.</p>