Hacking the PTS using SOAP

I've finally found some time, thanks to the Extremadura QA + ftpmaster + i18n meeting, to release the first draft of the SOAP interface to the PTS (Debian's Package Tracking System).

You've probably already got the idea, which is quite simple after all. The PTS, as always, gathers information about (source) packages from various sources and melds it together into web pages. The SOAP interface simply gives you the ability to access the same information from your own programs.

A proof of concept is overdue:

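    $ cat ./test.py
    #!/usr/bin/env python
    # Python 2 code: SOAPpy is a Python 2 library
    import SOAPpy
    url = 'http://packages.qa.debian.org/cgi-bin/soap-alpha.cgi'
    ws = SOAPpy.SOAPProxy(url)
    # version of the ocaml source package currently in unstable
    print ws.versions(source="ocaml")['unstable']
    # name of the second entry in ocaml's list of uploaders
    print ws.uploaders(source="ocaml")[1]['name']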

    $ ./test.py 
    Stefano Zacchiroli
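For the curious, this is roughly what travels over the wire on such a call: a SOAP 1.1 RPC-style request is an XML envelope sent via HTTP POST to the CGI URL. The sketch below builds one using only Python 3's standard library. The un-namespaced method element and the lack of `xsi:type` annotations on parameters are simplifying assumptions; SOAPpy's actual serialization may differ in those details, and `build_rpc_envelope` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)


def build_rpc_envelope(method, **params):
    """Build a minimal SOAP 1.1 RPC-style request envelope (sketch only).

    Assumption: the method element carries no namespace and parameters
    are plain text children; real clients may add both.
    """
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, method)
    # One child element per named parameter, e.g. <source>ocaml</source>
    for name, value in sorted(params.items()):
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


print(build_rpc_envelope("versions", source="ocaml"))
```

The server side only has to parse the method name and its parameters out of `Body`, run the corresponding query, and wrap the result in a response envelope.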

Everything is still at the alpha stage, but it already works. Some links you might find useful:

Please let me know if and how you are using the SOAP interface; it will help guide future development.

How it works

Just a few comments on how it works. You might remember that a while ago I made all PTS pages valid XHTML. On top of that, I've implemented something along the lines of microformats, which simply makes clever use of ingredients already available in XHTML, such as classes and unique identifiers.

With that in place, a "reshuffling" of the information already available on the web pages (which are now, in a sense, "semantically" tagged) can be obtained by evaluating a handful of XPath expressions on the XHTML pages, which are therefore no longer the final product. That's precisely what the CGI implementing the SOAP API currently does. This way there is no need to implement two different access paths to the information collected by the PTS, one for rendering and one for SOAP (and no, reusing the rendering one for SOAP was not an option, given that it was originally written in XSLT).
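A minimal sketch of the idea, assuming a made-up page fragment: the ids (`versions`, `uploaders`) and class names (`suite`, `version`, `name`) below are hypothetical, not the real PTS markup, and the queries use the limited XPath subset of Python's ElementTree rather than whatever engine the actual CGI uses. Each API method becomes a small query against the already-rendered XHTML, so rendering and SOAP share a single source of truth.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical, simplified PTS-like page; ids, classes and values are made up.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ul id="versions">
      <li><span class="suite">unstable</span> <span class="version">1.0-1</span></li>
      <li><span class="suite">testing</span> <span class="version">0.9-2</span></li>
    </ul>
    <ul id="uploaders">
      <li><span class="name">Jane Doe</span></li>
    </ul>
  </body>
</html>"""


def versions(root):
    """Return {suite: version} by querying the id-tagged list."""
    result = {}
    for item in root.iterfind(".//h:ul[@id='versions']/h:li", NS):
        suite = item.find("h:span[@class='suite']", NS).text
        result[suite] = item.find("h:span[@class='version']", NS).text
    return result


def uploaders(root):
    """Return a list of {'name': ...} records, in page order."""
    path = ".//h:ul[@id='uploaders']/h:li/h:span[@class='name']"
    return [{"name": span.text} for span in root.iterfind(path, NS)]


# Hypothetical dispatch table: SOAP method name -> extractor.
METHODS = {"versions": versions, "uploaders": uploaders}


def handle(method, page_source):
    """Parse the rendered page and answer one API call from it."""
    return METHODS[method](ET.fromstring(page_source))


print(handle("versions", PAGE)["unstable"])   # 1.0-1
print(handle("uploaders", PAGE)[0]["name"])   # Jane Doe
```

Note that `[@class='suite']` is an exact string comparison; it only works here because every element carries a single class.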

The only annoyance I've encountered is that XPath is completely unaware of the "CSS-like" semantics of XHTML classes: the class attribute is a space-separated list of class names, to be interpreted as a set. This means that to check whether an element belongs to a given class you need to fiddle with substring matches on the class attribute (which is quite crappy).
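The sketch below illustrates the problem and the usual workaround. A plain substring test on `@class` also matches `versions` when looking for `version`; the common XPath 1.0 idiom pads the whitespace-normalized attribute with spaces and searches for the space-padded class name. ElementTree cannot evaluate that expression, so it is kept as a string (to be run with a full XPath 1.0 engine, e.g. lxml's `xpath()` method) and mirrored in plain Python. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Usual XPath 1.0 workaround for class-as-set matching; needs a full
# XPath 1.0 engine (ElementTree cannot evaluate contains()/concat()).
HAS_CLASS_XPATH = ("//*[contains(concat(' ', normalize-space(@class), ' '), "
                   "' {0} ')]")


def naive_has_class(elem, name):
    """Plain substring test, like contains(@class, 'name')."""
    return name in (elem.get("class") or "")


def has_class(elem, name):
    """Set semantics: split the class attribute on whitespace."""
    return name in (elem.get("class") or "").split()


def padded_has_class(elem, name):
    """Python mirror of the concat/normalize-space XPath idiom."""
    padded = " " + " ".join((elem.get("class") or "").split()) + " "
    return (" " + name + " ") in padded


doc = ET.fromstring('<div><span class="version old"/><span class="versions"/></div>')
spans = list(doc)
print([naive_has_class(s, "version") for s in spans])   # [True, True]
print([has_class(s, "version") for s in spans])         # [True, False]
print(HAS_CLASS_XPATH.format("version"))
```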