tags/paper | zack's home page | http://upsilon.cc/~zack/tags/paper/ | ikiwiki | 2010-05-10T20:40:39Z
UDD - consolidating bazaar metadata for QA and data mining | http://upsilon.cc/~zack/blog/posts/2010/05/UDD_-_consolidating_bazaar_metadata_for_QA_and_data_mining/ | 2010-05-10T20:40:39Z | 2010-05-10T20:40:39Z
<h1>Eclectic paper on the Ultimate Debian Database</h1>
<p>A few months ago I co-authored, with <a href=
"http://www.lucas-nussbaum.net/blog">Lucas</a>, a <strong>paper on
<a href="http://udd.debian.org">UDD</a></strong>, which has just
been presented at this year's IEEE <a href=
"http://msr.uwaterloo.ca/msr2010/index.html">Mining Software
Repositories</a> conference, continuing my recent tradition of
<a href=
"http://upsilon.cc/~zack/blog/posts/2009/11/Enforcing_type-safe_linking_using_package_dependencies/">
eclectic</a> <a href=
"http://upsilon.cc/~zack/blog/posts/2010/01/Preserving_privacy_with_Google_Docs/">papers</a>.</p>
<p>The paper is titled <em>The Ultimate Debian Database:
Consolidating Bazaar Metadata for Quality Assurance and Data
Mining</em> and is available for <a href=
"http://upsilon.cc/~zack/research/publications/msr2010-udd.pdf"><strong>download</strong></a>
from my <a href="http://upsilon.cc/~zack/research/publications/">publications</a>
page.</p>
<p>For Debian people already familiar with UDD there is probably
not much to learn from it, as the main target of the paper is the
community of scientists doing <strong>data mining on software
repositories</strong>. For them, UDD offers a valuable entry point
to Debian "facts", as the data sources reflected in the database are
easy to join together and, to some extent, already validated by
other UDD users (e.g. QA people). Nevertheless, the <strong>first
two sections</strong> of the paper are probably of broader
interest. There we give our point of view on the so-called
<strong>Debian Data Hell</strong>: why it exists, how it's related
to the nature of Debian and similar distros, etc.</p>
<p>I've already <a href=
"http://upsilon.cc/~zack/blog/posts/2010/01/kuhn_on_debian_ubuntu_and_the_culture_of_freedom/">
noted in the past</a> how that is also related to the
<strong>culture of freedom</strong> that in Debian we value not
only in our software, but also in our infrastructure and
procedures. We should just get rid of a bit of inertia, and total
world domination will then be just around the corner <img src=
"http://upsilon.cc/~zack/smileys/smile.png" alt=":-)" /></p>
<p>I'm happy to conclude by quoting the acknowledgments section of
the paper:</p>
<h2>Acknowledgments</h2>
<blockquote>
<p>The authors would like to thank all UDD contributors, and in
particular: Christian von Essen and Marc Brockschmidt (student and
co-mentor in the Google Summer of Code which witnessed the first
UDD implementation), Olivier Berger for his support and FLOSSmole
contacts, Andreas Tille who contributed several gatherers, the
Debian community at large, the "German cabal" and Debian System
Administrators for their UDD hosting and support.</p>
</blockquote>
Preserving privacy with Google Docs | http://upsilon.cc/~zack/blog/posts/2010/01/Preserving_privacy_with_Google_Docs/ | 2010-01-15T09:49:50Z | 2010-01-15T09:43:58Z
<h1>Eclectic paper: SEcure GOogle DOCumentS</h1>
<p>Two days after an <a href=
"http://googleblog.blogspot.com/2010/01/new-approach-to-china.html">
important Google announcement</a>, <strong>privacy
awareness</strong> is steadily increasing in the media. The old
mantra that "despotic governments might use your data in unexpected
ways" sounds more real than last week, and <a href=
"http://www.imdb.com/title/tt0405094/">recent movies</a> ring
different bells in our heads.</p>
<p>That event has prodded me to (finally!) blog about <a href=
"http://upsilon.cc/~zack/blog/posts/2009/11/Enforcing_type-safe_linking_using_package_dependencies/">
yet another eclectic paper</a> of <a href=
"http://upsilon.cc/~zack/research/publications/">mine</a>, co-authored with my old
friend <a href=
"http://www.cs.unibo.it/~gdangelo/index-eng.html">Gabriele
D'Angelo</a>, and which I'm going to present at <a href=
"http://www.acm.org/conferences/sac/sac2010/">the forthcoming ACM
SAC conference</a>. The paper is titled <a href=
"http://upsilon.cc/~zack/research/publications/sac10-coclo.pdf"><strong>Content
Cloaking: Preserving Privacy with Google Docs and other Web
Applications</strong></a> and poses (again) a rather simple
question: why should you trust Google to faithfully store your
<strong><a href="http://docs.google.com">Google Docs</a>
data</strong>? What if roles in the recent Google-vs-China issue
were inverted?</p>
<p>The proposed solution (Content Cloaking) simply implements
<strong>transparent encryption and decryption</strong> of the
payload which is sent back and forth between your browser and the
Docs backend. Trying to access your Docs data without a decryption
layer and the needed key will then just show garbage, to both
humans and Google harvesters. Of course you lose something, like
full-text search, which is performed server-side by Google, but at
least you're back in charge: it is you who decide how much of your
privacy to trade for the services offered.</p>
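<p>As a rough illustration of the idea, here is a minimal Python
sketch of cloaking and uncloaking a text payload with a key that never
leaves the client. It is not the code of the Firefox extension
described in the paper: the function names (<code>cloak</code>,
<code>uncloak</code>), the 16-byte nonce and the HMAC-SHA256 keystream
are assumptions made for this example, and the toy cipher only shows
the data flow; it should not be used for real secrets.</p>

```python
import base64
import hashlib
import hmac
import os

NONCE_SIZE = 16  # assumption for this sketch, not taken from the paper


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` bytes from key and nonce (HMAC-SHA256 over a counter)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def cloak(text: str, key: bytes) -> str:
    """Encrypt `text` client-side; the result is what the server would store."""
    nonce = os.urandom(NONCE_SIZE)
    data = text.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    cipher = bytes(a ^ b for a, b in zip(data, stream))
    return base64.b64encode(nonce + cipher).decode("ascii")


def uncloak(blob: str, key: bytes) -> str:
    """Decrypt a cloaked payload; a wrong key yields garbage, not the text."""
    raw = base64.b64decode(blob)
    nonce, cipher = raw[:NONCE_SIZE], raw[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(cipher))
    plain = bytes(a ^ b for a, b in zip(cipher, stream))
    return plain.decode("utf-8", errors="replace")
```

<p>In this sketch the server only ever sees the output of
<code>cloak</code>, so anything it does with the stored text (indexing,
searching, harvesting) operates on ciphertext, which is exactly why
server-side full-text search stops working.</p>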
<p>A <strong>proof-of-concept implementation</strong> is provided
(and of course is free software!) as an extension for the Firefox
browser, but it is now out of date with respect to Firefox mainline and was not
really production ready anyhow (let's say it was
master-thesis-implementation-quality ...). Still we, the authors,
stand behind the idea even if we don't have the energy to maintain
a production-quality implementation.</p>
<p>So, <strong>Dear LazyWeb</strong>, if you are interested in the
topic and you have development cycles to spare, please <a href=
"mailto:zack@upsilon.cc">drop me a mail</a> and I'll be happy to
point you to all needed details to resurrect the implementation (or
create one from scratch, which should be pretty easy and quick if
you're familiar with extension development).</p>
Enforcing type-safe linking using package dependencies | http://upsilon.cc/~zack/blog/posts/2009/11/Enforcing_type-safe_linking_using_package_dependencies/ | 2009-11-28T12:00:16Z | 2009-11-28T12:00:16Z
<h1>Eclectic paper: dh-ocaml</h1>
<p>In my day job as a researcher, I mostly <a href=
"http://upsilon.cc/~zack/research/publications/">publish papers</a> along the lines
of my main research interests (theorem proving, web technologies,
formal methods applied to software engineering, ...). Sometimes,
though, I come up with an <strong>eclectic idea</strong>, not
strictly related to my job, that I feel like cooking up as a paper
to be reviewed by some scientific venue.</p>
<p>It happened some weeks ago with <a href=
"http://packages.debian.org/sid/dh-ocaml"><strong>dh-ocaml</strong></a>,
the package implementing the <strong>new dependency scheme for
OCaml-related packages</strong> in Debian. It took us (as in
<a href="http://wiki.debian.org/Teams/OCamlTaskForce">Debian OCaml
maintainers</a>) several years to get it right and satisfactory for
maintainers, users, release team, etc.</p>
<p>The problem which dh-ocaml addresses is that, unlike C
and other system-level languages, <strong>OCaml breaks ABI
compatibility very often</strong>, because it needs to ensure type
safety across different libraries at link time. Other similar
strongly typed languages, such as Haskell, behave similarly. This
is at odds with the implicit assumption of forward-compatibility
(unless otherwise "stated", e.g. with soname changes) that is
relied upon by versioned dependencies in distributions like
Debian.</p>
<p>This discussion, the analysis of possible solutions, and the
description of the solution we have actually implemented in
dh-ocaml (called <strong>ABI approximation</strong>) turned out to
be interesting for the French functional programming academic
community: the <a href=
"http://upsilon.cc/~zack/research/publications/jfla10-dh-ocaml.pdf"><strong>paper on
dh-ocaml</strong></a> has been accepted at the forthcoming <a href=
"http://jfla.inria.fr/2010/">JFLA 2010</a>.</p>
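<p>To make the mismatch with versioned dependencies more concrete,
here is a small Python sketch of one way an ABI could be approximated:
hash the interfaces a library exposes and put the result in the name of
a virtual package that dependent packages require. This is a
hypothetical illustration, not dh-ocaml's actual algorithm; the helper
names, the use of SHA-256, the 5-character tag and the package name
<code>libfoo-ocaml-dev</code> are all assumptions.</p>

```python
import hashlib
from typing import Mapping


def abi_tag(interfaces: Mapping[str, str]) -> str:
    """Hash (module name, interface text) pairs into a short, stable tag."""
    h = hashlib.sha256()
    for name in sorted(interfaces):  # sorted, so the tag ignores dict order
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(interfaces[name].encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()[:5]


def virtual_package(package: str, interfaces: Mapping[str, str]) -> str:
    """Name of the (hypothetical) virtual package a library build would provide."""
    return f"{package}-{abi_tag(interfaces)}"


# Two releases of the same hypothetical library: the second one changes
# the type of `parse`, which is enough to break link-time compatibility.
release_1 = {"Foo": "val parse : string -> int"}
release_2 = {"Foo": "val parse : string -> int option"}

provided_by_1 = virtual_package("libfoo-ocaml-dev", release_1)
provided_by_2 = virtual_package("libfoo-ocaml-dev", release_2)
```

<p>A package built against the first release would depend on
<code>provided_by_1</code>. The second release provides a different
name, so the dependency stops being satisfiable instead of silently
accepting an incompatible library, which is what a plain
"greater or equal" versioned dependency would do.</p>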
<p>It is not rocket science <img src="http://upsilon.cc/~zack/smileys/smile.png" alt=
":-)" />, but people maintaining programs and libraries written in
languages with concerns similar to OCaml's (e.g. Haskell, hello
<a href="http://www.joachim-breitner.de/blog/">nomeata</a>) might
want to have a look.</p>