pages tagged debsources | zack's home page | http://upsilon.cc/~zack/tags/debsources/ | ikiwiki | 2014-11-17T10:06:32Z
Debsources Participation in FOSS Outreach Program | http://upsilon.cc/~zack/blog/posts/2014/11/Debsources_Participation_in_FOSS_Outreach_Program/ | 2014-11-17T10:06:32Z | 2014-11-16T12:45:49Z
<h1>Jingjie Jiang selected as OPW intern for Debsources</h1>
<p>I'm glad to announce that <a href=
"http://sophiejjj.wordpress.com/">Jingjie Jiang</a>, AKA sophiejjj,
has been selected as an intern to work on <a href=
"http://sources.debian.net">Debsources</a> as part of the <a href=
"https://opw.gnome.org/">FOSS Outreach Program</a> (formerly known
as Outreach Program for Women, or OPW). I'll co-mentor her work
together with <a href="https://matthieu.io/">Matthieu
Caneill</a>.</p>
<p>I've just added <a href=
"http://sophiejjj.wordpress.com/">sophiejjj's blog</a> to Planet
Debian, so you will soon hear about her work in the Debian
blogosphere.</p>
<p>I've been impressed by the interest that the <a href=
"https://wiki.debian.org/OutreachProgramForWomen#debsources_improvements">
Debsources proposal</a> in this round of OPW has spawned. Together
with Matthieu I have interacted with more than a dozen OPW
applicants. Many of them have contributed useful patches during the
application period, and those patches have been in production at
<a href="http://sources.debian.net">http://sources.debian.net</a>
for quite a while now (see the commit log for details). A special
mention goes to Akshita Jha, who has shown a lot of determination
in tackling both simple and complex issues affecting Debsources. I
hope there will be other chances to work with her in the
future.</p>
<p>The OPW internship will begin on December 9th: fasten your seat belts
for a boost in Debsources development!</p>
debsources bugs and easy hacks | http://upsilon.cc/~zack/blog/posts/2014/09/debsources_bugs_and_easy_hacks/ | 2014-09-11T17:31:13Z | 2014-09-11T17:31:13Z
<h1>debsources debbugs oh</h1>
<p>My <a href=
"http://upsilon.cc/~zack/blog/posts/2014/08/debsources_hacking/">ongoing quest</a>
for lowering the barrier for contributing to <a href=
"http://sources.debian.net">Debsources</a> continues.<br />
In this chapter:</p>
<ul>
<li>
<p>I've migrated <strong>bug reports</strong> from the previous
ad-hoc text file in the Git repo to the <a href=
"https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=qa.debian.org;tag=debsources">
<strong>Debian BTS</strong></a>, under the umbrella of the
<code>qa.debian.org</code> pseudo-package.<br />
From now on this is the recommended (and <a href=
"http://sources.debian.net/about/">documented</a>) way of reporting
bugs against <a href=
"http://sources.debian.net">http://sources.debian.net</a>.</p>
<p>Look ma, it also has one of those newfangled short URLs:
<a href="http://deb.li/debsrcbugs">http://deb.li/debsrcbugs</a>!</p>
</li>
<li>
<p>While at it, I've also properly tagged the current <a href=
"https://bugs.debian.org/cgi-bin/pkgreport.cgi?package=qa.debian.org;include=subject:debsources;users=debian-qa@lists.debian.org;tag=gift">
<strong>easy hacks</strong></a> on Debsources using the <a href=
"https://wiki.debian.org/qa.debian.org/GiftTag">gift tag</a>. There
are definitely opportunities for new contributors there, and there
might be more if you submit your own Debsources pet peeves to the
BTS.</p>
<p>Again, mandatory mnemonic/short URL: <a href=
"http://deb.li/debsrceasy">http://deb.li/debsrceasy</a>.</p>
</li>
</ul>
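<p>As a minimal sketch of how the BTS report link above is put together: the function name <code>bts_report_url</code> is made up for illustration, and only the <code>pkg</code> and <code>tag</code> parameters visible in that link are used. Note that the BTS links in this post separate parameters with <code>;</code>.</p>

```python
from urllib.parse import quote


def bts_report_url(package, tag,
                   base="https://bugs.debian.org/cgi-bin/pkgreport.cgi"):
    """Build a pkgreport.cgi URL listing bugs of `package` that carry `tag`.

    Hypothetical helper: it only reproduces the shape of the link used in
    this post, it is not part of Debsources or of the BTS itself.
    """
    params = [("pkg", package), ("tag", tag)]
    # Parameters are joined with ';' as in the links above.
    query = ";".join(f"{key}={quote(value, safe='')}" for key, value in params)
    return f"{base}?{query}"


print(bts_report_url("qa.debian.org", "debsources"))
```

<p>With the two values from this post, the result is the same address that the Debian BTS link above points to.</p>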
<p>What's your excuse for not contributing to Debsources,
again?</p>
debsources hacking | http://upsilon.cc/~zack/blog/posts/2014/08/debsources_hacking/ | 2014-08-31T20:02:45Z | 2014-08-31T20:02:45Z
<h1>Debsources now has a HACKING file</h1>
<p>Here at <a href="http://debconf14.debconf.org/">DebConf14</a> I
have given a few talks. The second one was a technical talk
about recent and future developments on <a href=
"http://sources.debian.net">Debsources</a>. Both the talk <a href=
"https://upsilon.cc/~zack/talks/2014/20140826-dc14-debsources.pdf">slides</a>
and <a href=
"http://meetings-archive.debian.net/pub/debian-meetings/2014/debconf14/webm/Debsources_powering_sourcesdebiannet.webm">
video</a> are available.</p>
<p>After the talk, various DebConf participants have approached me
and started hacking on Debsources, which is awesome! As a result of
their work, new shiny features will probably be announced shortly.
Stay tuned.</p>
<p>When discussing with new contributors (hi Luciano, Raphael!),
though, it quickly became clear that getting started with
Debsources hacking wasn't particularly easy. In particular, doing a
local deployment for testing purposes might be intimidating, due to
the need for a (partial) source mirror and whatnot. To fix
that, I have now written a <strong><a href=
"http://anonscm.debian.org/cgit/qa/debsources.git/tree/HACKING">HACKING</a>
file for Debsources</strong>, which you can find at top-level in
the Git repo.</p>
<p>Happy Debsources hacking!</p>
debsources paper at ESEM2014 | http://upsilon.cc/~zack/blog/posts/2014/06/debsources_paper_at_ESEM2014/ | 2014-06-06T11:22:34Z | 2014-06-06T11:22:34Z
<h1>Debsources: Live and Historical Views on Macro-Level Software
Evolution</h1>
<p>The paper entitled <a href=
"http://upsilon.cc/~zack/research/publications/debsources-esem-2014.pdf"><em>Debsources:
Live and Historical Views on Macro-Level Software
Evolution</em></a>, which I've co-authored with Matthieu Caneill,
has been accepted at <a href=
"http://softeng.polito.it/ESEIW2014/ESEM/">ESEM 2014</a>: the 8th
International Symposium on Empirical Software Engineering and
Measurement.</p>
<p>In the paper we describe Debsources as a software platform
for monitoring the evolution of Free Software through the lens of
Debian, and use the main Debsources instance (<a href=
"http://sources.debian.net">http://sources.debian.net</a>) to
replicate and extend a <a href=
"http://link.springer.com/article/10.1007/s10664-008-9100-x">former
study</a> on macro-level software evolution.</p>
<p>Now we "just" have to integrate all the nice charts and data we
have extracted for the paper into Debsources' <a href=
"http://sources.debian.net/stats/">stats page</a>...
<code>/o\</code></p>
historical overview of debian source code | http://upsilon.cc/~zack/blog/posts/2014/04/historical_overview_of_debian_source_code/ | 2014-06-06T11:29:49Z | 2014-04-06T11:19:12Z
<h1>moar, and moar, and moar debsources stats</h1>
<p>A while ago I <a href=
"http://upsilon.cc/~zack/blog/posts/2014/02/moar_stats_for_sources.debian.net/">announced</a>
the availability of <a href=
"http://sources.debian.net/stats/">several stats</a> about Debian
source code on <a href=
"http://sources.debian.net">http://sources.debian.net</a>. Since
then the statistical basis of those stats has increased a lot, and
now includes <strong>all Debian historical releases</strong>, from
<a href="https://www.debian.org/releases/hamm/">hamm</a> (July
1998) onward. This makes it possible to appreciate macro-level
evolution trends in Free Software, over a period of more than 15 years,
through the eyes of a distro that sits at the nice intersection of
the oldest, largest, and most reputable distros.</p>
<p>To get there I've added support for <strong>sticky
suites</strong> to the plumbing layer of <a href=
"http://anonscm.debian.org/gitweb/?p=qa/debsources.git">debsources</a>,
and then injected historical releases from <a href=
"http://archive.debian.org">http://archive.debian.org</a>. The
injection process took about a week (without any sort of
parallelism, pretty slow disks, and computing sha256 checksums,
ctags, and sloccount on all source files) and has been an
"interesting" experience.</p>
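<p>To give an idea of the per-file work involved, here is a minimal sketch of the checksum part only: it walks an unpacked source tree and computes a sha256 for every regular file. The function names are made up for illustration and this is not the actual Debsources code; ctags and sloccount are left out entirely.</p>

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex sha256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Yield (relative_path, sha256) for every regular file under `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files: only regular files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                yield os.path.relpath(full, root), sha256_of_file(full)
```

<p>Everything here runs sequentially, one file after the other, which is in line with the "without any sort of parallelism" remark above.</p>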
<p>When you go back decades in technology time, <strong>bit
rot</strong> is just around the corner, and I've found <a href=
"https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=740883">my</a>
<a href=
"https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=741012">share</a>
while injecting <code>archive.d.o</code> into
<code>sources.d.n</code>. In both cases the respective maintainers
(Guillem and Ganneff, kudos) have been positive about and helpful
in improving the situation, despite the low impact of the bugs I've
found on the average user. That's quite important for the
<strong>long-term preservation</strong> of digital information in
general, and for the perennity of access to Free Software in the
specific case of Debian.</p>
<p>While we are at it, I'm now maintaining a list of <a href=
"https://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=debsources;users=zack@debian.org">
bugs affecting <code>sources.d.n</code></a> but belonging to other
packages, in case you fancy helping out but are not a Python
hacker. Interestingly enough, quite a few of those bugs are related
to the fact that tools debsources uses (e.g. ctags, sloccount) are
also starting to show their age.</p>
<p>You might wonder why <a href=
"https://www.debian.org/releases/buzz/">buzz</a>, <a href=
"https://www.debian.org/releases/rex/">rex</a>, and <a href=
"https://www.debian.org/releases/bo/">bo</a> are still missing from
<code>sources.d.n</code>. That's in fact for similar reasons.
Before hamm Debian didn't have complete archive coverage in terms
of <code>Sources</code> indexes and <code>.dsc</code> files. Given
that debsources relies on both to extract source packages, it first
needs to grow an additional abstraction layer that can cope with
their absence. It's SMOP, and planned.</p>
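<p>For readers unfamiliar with <code>Sources</code> indexes, here is a minimal sketch of reading one stanza and locating the <code>.dsc</code> file it lists. It assumes the usual "Field: value" control-file layout with indented continuation lines; the sample stanza (package name, checksums, sizes) is entirely made up, and the helper names are hypothetical rather than taken from Debsources.</p>

```python
def parse_stanza(text):
    """Parse one control-file style stanza into a dict of field -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Indented line: continuation of the previous field.
            fields[current] += "\n" + line.strip()
        else:
            current, _, value = line.partition(":")
            fields[current] = value.strip()
    return fields


def dsc_path(stanza):
    """Return the mirror-relative path of the .dsc listed under Files."""
    for entry in stanza.get("Files", "").splitlines():
        parts = entry.split()
        if parts and parts[-1].endswith(".dsc"):
            return stanza["Directory"] + "/" + parts[-1]
    return None


# Made-up sample stanza, for illustration only.
SAMPLE = """\
Package: hello
Version: 1.0-1
Directory: pool/main/h/hello
Files:
 00000000000000000000000000000000 100 hello_1.0-1.dsc
 00000000000000000000000000000000 200 hello_1.0.orig.tar.gz
"""

stanza = parse_stanza(SAMPLE)
print(stanza["Package"], stanza["Version"], dsc_path(stanza))
```

<p>When either the index or the <code>.dsc</code> is missing, as described above for the pre-hamm releases, this lookup has nothing to work with, hence the need for an extra abstraction layer.</p>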
<p>And now let's have fun with <a href=
"https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=742605">ctags
bombs</a>.</p>
<p>Yours truly,<br />
Stefano “Indiana” Zacchiroli<br />
<small>(credits: KiBi, <code>#debian-ftp</code>)</small></p>
moar stats for sources.debian.net | http://upsilon.cc/~zack/blog/posts/2014/02/moar_stats_for_sources.debian.net/ | 2014-02-27T19:31:05Z | 2014-02-27T19:22:00Z
<h1>Debian: watch your stats!</h1>
<p>Over the past few weeks, Matthieu Caneill and I have worked
quite a bit on <a href=
"http://anonscm.debian.org/gitweb/?p=qa/debsources.git"><strong>Debsources</strong></a>.
As we have now deployed most of the new features on <a href=
"http://sources.debian.net">http://sources.debian.net</a>, it's
time for another <em>"What's new with Debsources?"</em> blog post.
Here is what's new:</p>
<ul>
<li>
<p>Debsources now knows about Debian <strong>suites</strong>, i.e.
which package is in which "release" (stable, testing, unstable,
...). This knowledge is already useful for some of the other
features below and will be used more in the future.</p>
</li>
<li>
<p><a href=
"http://upsilon.cc/~zack/blog/posts/2013/09/sources.debian.net_-_advanced_search_and_other_news/">
since last summer</a> Debsources has been running
<strong>sloccount</strong> on all unpacked source packages,
together with ctags and du, but the resulting information wasn't
exposed on the Web. This is now fixed. Each package now has an
<strong>infobox</strong> (<a href=
"http://sources.debian.net/src/linux/3.2.54-2">example</a>) which
shows: disk usage, archive area, suites, and sloccount with
per-language breakdown. The new infobox also subsumes the old puny
list of package links.</p>
<p>You can easily embed the infobox in other webapps if you need to
(<a href=
"http://sources.debian.net/embed/pkginfo/linux/3.2.54-2/">example</a>).
Check the <a href="http://sources.debian.net/doc/url/">URL scheme
doc</a> for more info.</p>
</li>
<li>
<p>Debsources now gathers and plots accurate Debian sources <a href=
"http://sources.debian.net/stats/"><strong>statistics</strong></a>,
both overall and per-suite, in both snapshot and <strong>historical
trends</strong> flavors.</p>
<p>(Yeah, I know, the charts are not particularly good looking ATM,
but that's easy to change without impacting the rest. So if you're
a <a href="http://matplotlib.org/">matplotlib</a> artist and
willing to help, please step forward!)</p>
</li>
<li>
<p>many changes have also been going on at the
<strong>plumbing</strong> layer to make the service less resource
hungry and more maintainable, in view of a migration to the
official Debian infrastructure --- which I've in the meantime
started discussing with DSA. Some highlights:</p>
<ul>
<li>
<p>Debsources now has a rather comprehensive <strong>test
suite</strong>, built using <a href=
"https://nose.readthedocs.org/en/latest/">Nose</a>. Most notably,
we do test full update runs down to source unpacking (of a small
subset of a Debian mirror), DB injection, and plugin execution ---
which is quite neat.</p>
</li>
<li>
<p>the updater is now much faster (about 2x) and might require, in
pathological cases, 10x <em>less</em> memory than before. Memory
usage now caps at around 300MB, even when injecting ctags for large
packages such as linux, chromium, and libreoffice.</p>
</li>
<li>
<p>the DB schema went through several refactoring cycles, and now
uses a separate <strong>file table</strong> to index all known
source file paths. In the past, path information was duplicated
across the checksums and ctags tables, not only wasting DB space,
but also making the presence of file information conditional on the
enablement of at least one of the two corresponding plugins. This
is now fixed --- and migrating the full DB has been quite "fun".
Unfortunately, we've also added quite a few large-ish indexes,
resulting in no significant overall changes in DB size (currently
at ~50GB), but at least in much faster queries <img src=
"http://upsilon.cc/~zack/smileys/smile.png" alt=":-)" /></p>
<p>The next step on this front will be the addition of path-based
searches, using the excellent Postgres <a href=
"http://www.postgresql.org/docs/9.1/static/pgtrgm.html">trigram
indexes</a>.</p>
</li>
</ul>
</li>
</ul>
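<p>The file table idea from the last item can be sketched as follows. All table and column names are assumptions made for illustration, not the real Debsources schema, and SQLite is used only to keep the example self-contained (the service itself runs on PostgreSQL).</p>

```python
import sqlite3

# Hypothetical schema: paths live once in `files`; the plugin tables
# reference them by id instead of each storing its own copy of the path.
SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    package TEXT NOT NULL,
    path TEXT NOT NULL,
    UNIQUE (package, path)
);
CREATE TABLE checksums (
    file_id INTEGER NOT NULL REFERENCES files(id),
    sha256 TEXT NOT NULL
);
CREATE TABLE ctags (
    file_id INTEGER NOT NULL REFERENCES files(id),
    tag TEXT NOT NULL,
    line INTEGER NOT NULL
);
"""


def add_file(conn, package, path):
    """Record a path once and return its id, whichever plugins are enabled."""
    conn.execute(
        "INSERT OR IGNORE INTO files (package, path) VALUES (?, ?)",
        (package, path),
    )
    row = conn.execute(
        "SELECT id FROM files WHERE package = ? AND path = ?",
        (package, path),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
file_id = add_file(conn, "hello_1.0-1", "src/hello.c")
conn.execute(
    "INSERT INTO ctags (file_id, tag, line) VALUES (?, ?, ?)",
    (file_id, "main", 10),
)
paths = conn.execute(
    "SELECT f.path FROM ctags c JOIN files f ON f.id = c.file_id "
    "WHERE c.tag = ?",
    ("main",),
).fetchall()
print(paths)
```

<p>In this layout a file is known to the database even if neither the checksums nor the ctags plugin is enabled, which is the point made above.</p>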
<p>Want more? Sure, we'll be happy to! But it'll happen faster if
you <a href=
"http://anonscm.debian.org/gitweb/?p=qa/debsources.git;a=blob;f=BUGS;hb=refs/heads/bugs">
help</a>. Speaking of which: we've got Debsources into the <a href=
"https://wiki.debian.org/DebianFrance/NewContributorGame"><strong>new
contributors game</strong></a> (see <a href=
"https://lists.debian.org/debian-devel-announce/2014/02/msg00009.html">
announcement</a>) and we're looking forward to mentoring new
contributors.</p>
forthcoming talks | http://upsilon.cc/~zack/blog/posts/2014/01/forthcoming_talks/ | 2014-01-14T10:12:24Z | 2014-01-14T10:12:24Z
<p>Over the next few weeks I'll be on the road, attending a few
Free Software events and giving talks. In particular:</p>
<ul>
<li>
<p>during the upcoming week-end (<em>18-19 January 2014</em>) there
will be a <a href=
"https://wiki.debconf.org/wiki/Miniconf-Paris/2014"><strong>mini-DebConf
in Paris</strong></a>. I'll be there to give a talk about <a href=
"http://sources.debian.net">Debsources</a> (see <a href=
"http://france.debian.net/events/minidebconf2014/">schedule</a>),
participate in the Debian France plenary meeting, and meet Debian
friends from all over Europe.</p>
</li>
<li>
<p>two weeks later (<em>1-2 February 2014</em>) I'll be at <a href=
"https://fosdem.org/2014/"><strong>FOSDEM 2014</strong></a> to give
a retrospective <a href=
"https://fosdem.org/2014/schedule/event/radical_community_angle/">talk</a>
about legal issues that Debian has faced over the past few years,
participate in a <a href=
"https://fosdem.org/2014/schedule/event/governance_round_table/">panel</a>
about Free Software governance, … and meet Free Software friends
from all over the world!</p>
</li>
<li>
<p>the day after FOSDEM ends (<em>3 February 2014</em>) I'll be in
Antwerp for the opening <a href=
"http://sqm2014.sig.eu/?page=keynote">keynote</a> of the <a href=
"http://sqm2014.sig.eu/"><strong>SQM workshop</strong></a> to
discuss some of my research work at <a href=
"http://www.irill.org">IRILL</a> and current challenges in the
field</p>
</li>
</ul>
<p>See you "there"?</p>
sources.debian.net - advanced search and other news | http://upsilon.cc/~zack/blog/posts/2013/09/sources.debian.net_-_advanced_search_and_other_news/ | 2013-09-17T16:00:53Z | 2013-09-17T16:00:53Z
<h1>all your ctag (and checksum) are belong to us</h1>
<p>A few months after the <a href=
"http://upsilon.cc/~zack/blog/posts/2013/07/introducing_sources.debian.net/">initial
announcement</a>, here is some news about the <a href=
"http://sources.debian.net">sources.d.n</a> service. I've been late
in blogging this, but most of it has been implemented by Matthieu
Caneill and me during <a href=
"http://debconf13.debconf.org/">DebConf13</a>, which has been a
great DebConf, totally exceeding my expectations (and they were
already fairly high!).</p>
<p>First, you might have noticed some <em>user-visible
changes</em>:</p>
<ul>
<li>
<p>there is now an <a href=
"http://sources.debian.net/advancedsearch/"><strong>advanced
search</strong> page</a>, which complements the already existing
<a href="http://codesearch.debian.net/">regex code search</a> with
the possibility of searching source files by their
<strong>sha256</strong>, or the <strong>ctags</strong> defined
therein</p>
</li>
<li>
<p>on the same topic, when browsing through a package and using
regex search, you'll now search by default within <em>that</em>
package, allowing you to focus your searches more easily than before.
(You can easily override this by editing the search box and
removing the <code>package:</code> predicate.)</p>
</li>
<li>
<p>for the data geeks (or the wannabe host), there are now <a href=
"http://sources.debian.net/about/stats/"><strong>disk usage
stats</strong></a> (note that they don't include the database size,
though, see below for that)</p>
</li>
<li>
<p>the website also got a significant <strong>facelift</strong>, as
part of which we have moved the detailed explanations of what the
service is about out of your way. You now immediately get to the
various browsing options.</p>
</li>
</ul>
<p>On the other hand, <em>under the hood</em>:</p>
<ul>
<li>
<p>to implement ctags and sha256 searches we needed a serious DBMS,
so we switched from SQLite to <strong>PostgreSQL</strong>.</p>
<p>Again, for the data geek: storing ctags/sha256 for all of
sources.d.n content with decent indexes takes about 37 GB, for
about 160 million rows in the ctags table and 20 million rows in
the checksums one. (Currently filenames are duplicated between the
two tables so, probably, the DB disk size might be reduced
some.)</p>
</li>
<li>
<p>together with the switch to a serious DBMS, the update logic
has been completely rewritten in Python (from Bash...), and should
now be entirely transactional.</p>
</li>
<li>
<p>... and given it was going to be Python anyhow, better to enjoy
what it has to offer, no? So there is now a <strong>plugin
mechanism</strong> that makes it easier to add extra data
extractors, triggering them at each package update. Currently there
are plugins for sha256sum, ctags, and sloccount (even though the
latter is not yet exposed via the web interface). An added benefit
of this is that if you want to deploy debsources elsewhere, you can
easily disable the most time consuming extractors: running ctags
<em>and</em> sha256sum on the fabulous 3 chromium/libreoffice/linux
is not for the faint of disks...</p>
</li>
<li>
<p>we now receive <strong>push updates</strong> from the Debian
mirror network, so that you'll get updates on sources.d.n as soon
as a package hits Debian mirrors (+ processing time, which is about
15-20 minutes on the average update run). Many thanks to Simon
Paillard and Adam Lackorzynski for their help in setting this
up.</p>
</li>
<li>
<p>thanks to a <a href=
"https://lwn.net/Articles/557371/">suggestion by kugel</a> we have
adopted <a href="http://www.geany.org/">Geany</a>'s conventions for
filetype detection, and we now take into account both file
extensions and shebang lines (when available)</p>
</li>
</ul>
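<p>The plugin mechanism mentioned above can be pictured with a minimal sketch like the following. It is an illustration under assumptions, not the actual Debsources API: the registry, the decorator, and <code>on_package_update</code> are made-up names, and the <code>size</code> extractor is a stand-in added only to have a second, cheap plugin.</p>

```python
import hashlib
import os

# Hypothetical registry mapping plugin names to extractor functions.
PLUGINS = {}


def plugin(name):
    """Decorator registering an extractor under `name`."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


def _walk_files(pkgdir):
    """Yield the full path of every file below an unpacked package dir."""
    for dirpath, _dirnames, filenames in os.walk(pkgdir):
        for name in filenames:
            yield os.path.join(dirpath, name)


@plugin("sha256sum")
def sha256sum(pkgdir):
    """Map each relative path to the sha256 of its content."""
    result = {}
    for full in _walk_files(pkgdir):
        with open(full, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        result[os.path.relpath(full, pkgdir)] = digest
    return result


@plugin("size")
def size(pkgdir):
    """Total size in bytes of all files (illustrative stand-in plugin)."""
    return sum(os.path.getsize(full) for full in _walk_files(pkgdir))


def on_package_update(pkgdir, enabled):
    """Run only the enabled extractors on a freshly unpacked package."""
    return {name: PLUGINS[name](pkgdir) for name in enabled if name in PLUGINS}
```

<p>A deployment that cannot afford the expensive extractors would simply leave them out of <code>enabled</code>, e.g. <code>on_package_update(pkgdir, ["size"])</code>.</p>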
<p>As usual, your bug reports (and patches!) are more than
welcome, just check <a href=
"http://anonscm.debian.org/gitweb/?p=qa/debsources.git;a=blob;f=BUGS;hb=refs/heads/bugs">
BUGS</a> before reporting to avoid duplicates.<br />
That's all!</p>
introducing sources.debian.net | http://upsilon.cc/~zack/blog/posts/2013/07/introducing_sources.debian.net/ | 2013-07-05T07:38:01Z | 2013-07-02T14:28:14Z
<h1>all Debian source are belong to us</h1>
<p><strong>TL;DR</strong>: go to <strong><a href=
"http://sources.debian.net">http://sources.debian.net</a></strong>
and enjoy.</p>
<hr />
<p><a href=
"http://anonscm.debian.org/gitweb/?p=qa/debsources.git"><strong>Debsources</strong></a>
is a new toy I've been working on at <a href=
"http://www.irill.org">IRILL</a> together with <a href=
"http://matthieu-blog.fr/">Matthieu Caneill</a>. In essence,
debsources is a simple web application that allows you to
<strong>publish an unpacked Debian source mirror on the
Web</strong>.</p>
<p>You can deploy Debsources where you please, but there is a main
instance at <strong><a href=
"http://sources.debian.net">http://sources.debian.net</a></strong>
(<code>sources.d.n</code> for short) that you will probably find
interesting. <code>sources.d.n</code> closely follows the Debian
archive in two ways:</p>
<ol>
<li>it is updated 4 times a day to reflect the content of the
Debian archive</li>
<li>it contains sources coming from official Debian suites: the
usual ones (from oldstable to experimental), <code>*-updates</code>
(ex volatile), <code>*-proposed-updates</code>, and
<code>*-backports</code> (from Wheezy on)</li>
</ol>
<p>Via <code>sources.d.n</code> you can therefore browse the
content of Debian source packages with usual code viewing features
like <strong>syntax highlighting</strong>. More interestingly, you
can <strong>search through</strong> the source code (of unstable
only, though) via integration with <a href=
"http://codesearch.debian.net">http://codesearch.debian.net</a>.
You can also use <code>sources.d.n</code> programmatically to
<a href="http://sources.debian.net/doc/api/">query available
versions</a> or <a href=
"http://sources.debian.net/doc/url/"><strong>link to specific
lines</strong></a>, with the possibility of adding contextual
<strong>pop-up messages</strong> (<a href=
"http://sources.debian.net/src/cowsay/3.03%2Bdfsg1-4/cowsay?hl=22:28&msg=22:Cowsay:See?%20Cowsay%20variables%20are%20declared%20here.#L22">example</a>).</p>
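<p>As a minimal sketch of how such a link can be generated, the function below rebuilds the cowsay example above. The helper name and its arguments are made up, and the reading of <code>msg</code> as "line:title:text" is an assumption inferred from that single example; the <a href="http://sources.debian.net/doc/url/">URL scheme doc</a> remains the reference.</p>

```python
from urllib.parse import quote


def source_link(package, version, path, hl=None, msg=None, line=None,
                base="http://sources.debian.net/src"):
    """Build a link to a source file, optionally with highlight and pop-up.

    hl   -- (first_line, last_line) range to highlight
    msg  -- (line, title, text) for a pop-up message (assumed layout)
    line -- line number to jump to via the #L<n> fragment
    """
    url = "/".join([base, quote(package, safe=""),
                    quote(version, safe=""), quote(path)])
    params = []
    if hl is not None:
        params.append("hl=%d:%d" % hl)
    if msg is not None:
        msg_line, title, text = msg
        # '?' is kept literal in the text, as in the example link.
        params.append("msg=%d:%s:%s" % (msg_line, quote(title, safe=""),
                                        quote(text, safe="?")))
    if params:
        url += "?" + "&".join(params)
    if line is not None:
        url += "#L%d" % line
    return url


print(source_link(
    "cowsay", "3.03+dfsg1-4", "cowsay",
    hl=(22, 28),
    msg=(22, "Cowsay", "See? Cowsay variables are declared here."),
    line=22,
))
```

<p>Note how the <code>+</code> in the version string has to be percent-encoded (<code>%2B</code>) for the link to resolve to the right package version.</p>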
<p>In fact, you might have stumbled upon <code>sources.d.n</code>
already in the past few days, via other popular Debian services
where it has already been integrated. In particular:
<code>codesearch.d.n</code> now defaults to showing results via
<code>sources.d.n</code>, and the <a href=
"http://packages.qa.debian.org/">PTS</a> has grown new "browse
source code" hyperlinks that point to it. If you have ideas for other
Debian services where <code>sources.d.n</code> should be
integrated, please let me know.</p>
<p>I find Debsources and <code>sources.d.n</code> already quite
useful but, as often happens, there is still a lot <a href=
"http://anonscm.debian.org/gitweb/?p=qa/debsources.git;a=blob;f=BUGS;hb=refs/heads/bugs">
<strong>TODO</strong></a>. Obviously, it is all Free Software
(released under GNU AGPLv3). Do not hesitate to report new bugs
and, better, to submit patches for the outstanding ones.</p>
<h2>Acknowledgements</h2>
<ul>
<li><a href="http://matthieu-blog.fr/">Matthieu Caneill</a> is the
main developer of Debsources web front-end;
<code>sources.d.n</code> wouldn't exist without him.</li>
<li>others have already contributed patches to integrate
<code>sources.d.n</code> with other services, in particular:
<ul>
<li>many thanks to Michael Stapelberg (for
<code>codesearch.d.n</code> integration), and</li>
<li>Paul Wise (for PTS integration).</li>
</ul>
</li>
<li>a full list of <a href=
"http://anonscm.debian.org/gitweb/?p=qa/debsources.git;a=blob;f=AUTHORS;hb=HEAD">
contributors</a> is available and eagerly waiting for new
additions</li>
<li><a href="http://www.irill.org">IRILL</a> has kindly provided
sponsoring for Matthieu's initial development work on Debsources,
and offered both the server and hosting facilities that power
<code>sources.d.n</code></li>
</ul>
<p><strong>PS</strong> in case you were wondering: at present
<code>sources.d.n</code> requires <strong>~381 GB</strong> of disk
space to hold all uncompressed source packages, plus ~83 GB for the
local (compressed) source mirror</p>