(MSc) Thesis topics | Sujets de stage | Argomenti di tesi

Below is a list of topics available to students interested in pursuing a master's thesis (a stage in France, or a tesi di laurea in Italy) with me as supervisor.

If interested, please get in touch with me via email.

01-swh-distributed-object-storage

Title: Who wants to win billions (of source files)?

Context: large-scale research project whose goal is to collect, organize, and preserve over the very long term (centuries) all free software publicly accessible via the Internet.

Description: the goal is to design, implement, and test in production a storage system for small text files (typically source code) capable of storing billions of files, for a total footprint on the order of 100 terabytes, across several geographically distributed storage nodes. Synchronization between nodes will be asynchronous, and each node will have an automatic integrity check capable of repairing corrupted files (self-healing).
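
The integrity check and self-healing described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each file is identified by the SHA-1 of its content and that nodes are plain in-memory dictionaries; the names `StorageNode`, `is_intact`, and `heal` are illustrative and not part of any existing system.

```python
import hashlib
from typing import Dict, List


def object_id(data: bytes) -> str:
    """Identify an object by the SHA-1 of its content (assumed scheme)."""
    return hashlib.sha1(data).hexdigest()


class StorageNode:
    """Toy in-memory node; a real node would persist objects on disk."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        """Store an object under its content-derived identifier."""
        oid = object_id(data)
        self.objects[oid] = data
        return oid

    def is_intact(self, oid: str) -> bool:
        """An object is intact if it exists and still hashes to its id."""
        data = self.objects.get(oid)
        return data is not None and object_id(data) == oid


def heal(node: StorageNode, replicas: List[StorageNode]) -> List[str]:
    """Repair corrupted objects on `node` from the first replica that
    holds an intact copy; return the identifiers that were repaired."""
    repaired = []
    for oid in list(node.objects):
        if node.is_intact(oid):
            continue
        for replica in replicas:
            if replica.is_intact(oid):
                node.objects[oid] = replica.objects[oid]
                repaired.append(oid)
                break
    return repaired
```

Because the identifier is derived from the content, a node can detect corruption locally, without contacting other nodes; replicas are only needed for the repair step.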

Desired skills for the internship:

  • distributed algorithms
  • Python
  • PostgreSQL

Host institution: Inria Paris

Supervisors:

Status: available

02-swh-web-ui

Title: Browsing the largest Git repository in the world

Context: large-scale research project whose goal is to collect, organize, and preserve over the very long term (centuries) all free software publicly accessible via the Internet.

Description: the goal is to design a Web application for exploring the content of a repository of a Git-like version control system. Ideally, the result would resemble a clone of the GitHub Web interface, with a twist: the repository to explore is very probably the largest in the world (500 million commits, 2 billion files, 10 million authors), which raises several architectural and usability challenges.

Desired skills for the internship:

  • Web programming
  • graphical user interfaces
  • Python
  • PostgreSQL

Host institution: Inria Paris

Supervisors:

Status: available

03-swh-forge-crawling

Title: Building the semantic web of free software projects

Context: large-scale research project whose goal is to collect, organize, and preserve over the very long term (centuries) all free software publicly accessible via the Internet.

Description: there are millions of free software projects, hosted on hundreds of different platforms, and often duplicated. To navigate this graph of software projects, it is important to have relevant metadata, and several efforts exist around Semantic Web technologies such as DOAP or schema.org. The goal of this internship is to collect the existing metadata, normalize it, and integrate it into one of the largest collections of free software in the world.
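
As a rough illustration of the normalization step, the sketch below maps a few DOAP and schema.org properties onto a common set of field names. The unified field names (`name`, `homepage`, `license`, `language`) and the `unify` helper are hypothetical choices made for this example, and records are assumed to have already been parsed into flat dictionaries.

```python
from typing import Dict

# Assumed mapping from source vocabulary properties to unified field names.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "doap": {
        "name": "name",
        "homepage": "homepage",
        "license": "license",
        "programming-language": "language",
    },
    "schema.org": {
        "name": "name",
        "url": "homepage",
        "license": "license",
        "programmingLanguage": "language",
    },
}


def unify(vocabulary: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename known properties to the unified names, strip surrounding
    whitespace from values, and drop properties that are not mapped."""
    mapping = FIELD_MAP[vocabulary]
    return {
        mapping[key]: value.strip()
        for key, value in record.items()
        if key in mapping
    }
```

Two descriptions of the same project coming from different platforms then become directly comparable, which is a first step toward detecting duplicates.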

Desired skills for the internship:

  • information retrieval
  • knowledge modeling and representation
  • manipulation of semi-structured data (HTML, XML, etc.)

Host institution: Inria Paris

Supervisors:

Status: available

04-debian-checksums-service

Title: Large-scale repository of binary checksums for integrity checks

Description: implement and deploy a public repository of checksum information for the binary packages of the Debian distribution. Design and implement an API for the service that allows clients to query it for integrity checks and forensic purposes. Inject into the repository a substantial subset of the distribution history, and analyze the resulting data set. The service is meant to be queried by the client developed as a separate topic.
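
To make the kind of query such a service could answer more concrete, here is a minimal in-memory sketch. The key structure (package, version, architecture, file path), the use of SHA-256, and the names `ChecksumRepository`, `inject`, and `check` are all assumptions made for illustration; the actual API design is part of the topic.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Assumed lookup key: (package, version, architecture, file path).
Key = Tuple[str, str, str, str]


class ChecksumRepository:
    """Toy checksum index; a real service would sit on top of a database."""

    def __init__(self) -> None:
        self._index: Dict[Key, str] = {}

    def inject(self, package: str, version: str, arch: str,
               path: str, content: bytes) -> None:
        """Record the SHA-256 of a file shipped by a binary package."""
        self._index[(package, version, arch, path)] = (
            hashlib.sha256(content).hexdigest()
        )

    def lookup(self, package: str, version: str, arch: str,
               path: str) -> Optional[str]:
        """Return the recorded checksum, or None if the file is unknown."""
        return self._index.get((package, version, arch, path))

    def check(self, package: str, version: str, arch: str,
              path: str, sha256_hex: str) -> str:
        """Compare a client-supplied checksum with the recorded one."""
        expected = self.lookup(package, version, arch, path)
        if expected is None:
            return "unknown"
        return "match" if expected == sha256_hex else "mismatch"
```

Distinguishing "unknown" from "mismatch" matters for forensic use: an unknown file may simply come from a version that was never injected, whereas a mismatch points at a modified file.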

Technologies:

Supervisors:

Status: available

05-tails-integrity-client

Title: Design and implementation of an integrity/forensic client for the Tails live distribution

Description: build a software tool (to be integrated in Tails) that is able to check the integrity of (some of) the binaries installed on a Debian PC, accessible from the running Tails instance. As a backend, the client will use either the service developed as a separate topic or http://dedup.debian.net/

Technologies:

Supervisors:

Status: available

06-windows-of-vulnerability

Title: Windows of Vulnerability (WoVs)

Description: design and implement a forensic tool capable of reviewing the upgrade history of a Debian(-like) distribution with respect to the history of publicly known software vulnerabilities (e.g., CVEs, NVD, etc.). The output of the tool should be a series of time intervals, stating to which vulnerabilities the machine might have been exposed in the past, and for how long.
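
The expected output can be sketched as follows, under simplifying assumptions that are not part of the topic statement: each vulnerability lists the exact package versions it affects (sidestepping Debian version comparison), and a window starts at the later of the installation date and the publication date of the vulnerability. The `Install` and `Vulnerability` types and the identifiers used with them are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Install:
    package: str
    version: str
    since: date             # date this version was installed
    until: Optional[date]   # date it was replaced; None if still installed


@dataclass(frozen=True)
class Vulnerability:
    ident: str                         # placeholder identifier
    package: str
    affected_versions: FrozenSet[str]  # assumed: explicit list of versions
    published: date


def windows_of_vulnerability(
    history: List[Install],
    vulns: List[Vulnerability],
    today: date,
) -> List[Tuple[str, date, date]]:
    """Return (vulnerability id, start, end) intervals during which an
    affected version was installed after the vulnerability was published."""
    windows = []
    for vuln in vulns:
        for inst in history:
            if inst.package != vuln.package:
                continue
            if inst.version not in vuln.affected_versions:
                continue
            start = max(inst.since, vuln.published)
            end = inst.until or today
            if start < end:
                windows.append((vuln.ident, start, end))
    return sorted(windows, key=lambda w: (w[1], w[0]))
```

The length of each exposure is then simply `(end - start).days`.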

Technologies:

Supervisors:

Status: taken

07-tails-self-dpi

Title: Self Deep Packet Inspection for Tails

Description: instrument the Tails distribution to be able to perform "self" Deep Packet Inspection, to prevent unintended leaks of private information (e.g., IP address, browser fingerprinting information, etc.). It should be possible to use the instrumentation for both distribution development/testing and real use (provided that a suitable UI, which is outside the scope of this work, can be devised).

Technologies:

More information:

Supervisors:

Status: available

08-etherpad-encrypted

Title: Encrypted Etherpad

Description: implement an encryption scheme that is suitable for real-time collaborative editing (à la Google Docs) and integrate it into Etherpad. The scheme would allow storing the pad content on the server in encrypted form. All of this has to be done without undermining real-time collaboration, and in particular it should use block-by-block encryption.

Technologies:

More information:

Supervisors:

Status: available

09-etherpad-in-a-box

Title: Etherpad in a box

Description: implement an alternative Etherpad UI as a Firefox Add-on (more specifically a Firefox Extension). No extra features with respect to the Web-only version of Etherpad are planned, but minor modifications might be needed. Ideally, the Firefox-based UI should share as much (JavaScript+HTML5) code with Etherpad as possible, and should aim at being built directly from Etherpad sources.

Technologies:

Supervisors:

Status: available

10-rtce-characterization

Title: Characterization of real-time collaborative editing via Etherpad instrumentation

Description: instrument Etherpad to collect a wide range of live data during real-time collaborative editing of textual documents. A few examples of collected data: cursor positions of each user, text and attribute changes, client-server HTTP messages and network segments. The instrumentation should be enough to conduct experiments with real users and characterize usage patterns (yet unknown in the literature) of real-time collaborative editing.

Technologies:

Supervisors:

Status: available

11-functional-simulation

Title: Functional Adaptive Parallel and Distributed Simulation

Description: design and implement a parallel/distributed simulation model based on the Multi-Agent System paradigm using a statically typed, functional programming language (e.g., OCaml).

Technologies:

Supervisors:

Status: available

12-firmware-hw-integrity

Title: Firmware and hardware checksumming for integrity evaluation

Description: design and build a software tool (based on a Linux live distribution) that is able to retrieve as much firmware and hardware information as possible about the devices installed on the PC, and checksum them to: a) verify whether something has changed since the last known run of the tool (to detect signs of tampering), and b) compare the obtained results against a (community-maintained) database of "well-known" firmware/hardware information.
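
The two comparisons, (a) against the previous run and (b) against a database of known values, can be sketched as below. How firmware and hardware information is actually extracted is the core of the topic and is not shown: the sketch assumes, hypothetically, that each device has already been dumped to a byte string keyed by a device name, and the function names are illustrative.

```python
import hashlib
from typing import Dict, Set


def checksum(blob: bytes) -> str:
    """SHA-256 of a firmware or hardware information dump (assumed hash)."""
    return hashlib.sha256(blob).hexdigest()


def snapshot(devices: Dict[str, bytes]) -> Dict[str, str]:
    """Map each device name to the checksum of its dump."""
    return {name: checksum(blob) for name, blob in devices.items()}


def changed_since(previous: Dict[str, str],
                  current: Dict[str, str]) -> Set[str]:
    """(a) Devices added, removed, or whose checksum differs between runs."""
    return {
        name
        for name in previous.keys() | current.keys()
        if previous.get(name) != current.get(name)
    }


def unknown_to_database(current: Dict[str, str],
                        known: Dict[str, Set[str]]) -> Set[str]:
    """(b) Devices whose checksum is not among the known-good values."""
    return {
        name
        for name, digest in current.items()
        if digest not in known.get(name, set())
    }
```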

Technologies:

Supervisors:

Status: available

13-anon-p2p-cloud

Title: Anonymous peer-to-peer (P2P) cloud

Description: design and implementation of an opt-in, distributed, P2P IaaS cloud, in which both providers and users of virtual machines remain anonymous, thanks to the Tor low-latency anonymity network and its support for hidden services.

Technologies:

More information:

Supervisors:

Status: taken

14-security-debian-derivatives

Title: Propagation of security bug fixes among Debian derivatives

Description: study the propagation delay between the arrival of security bug fixes in the Debian distribution and the arrival of the corresponding fixes in Debian derivatives (i.e., GNU/Linux distributions that are based on/periodically merged with Debian). The work will be conducted by processing a large data set of already available diffs, which have been obtained by automatically comparing Debian with its derivatives on a daily basis over the past few years.
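
Once the dates at which each fix landed have been extracted from the diffs (a step not shown here), the delay computation itself is simple. The sketch below assumes hypothetical dictionaries mapping a fix identifier to the date the fix arrived in Debian and in one derivative.

```python
from datetime import date
from statistics import median
from typing import Dict


def propagation_delays(debian_fixes: Dict[str, date],
                       derivative_fixes: Dict[str, date]) -> Dict[str, int]:
    """Days between a fix landing in Debian and in the derivative,
    for every fix present in both; fixes not yet propagated are skipped."""
    return {
        fix_id: (derivative_fixes[fix_id] - fixed_on).days
        for fix_id, fixed_on in debian_fixes.items()
        if fix_id in derivative_fixes
    }


def median_delay(delays: Dict[str, int]) -> float:
    """Median delay in days; raises StatisticsError if `delays` is empty."""
    return median(delays.values())
```

Fixes that never reach the derivative are left out of the delays, so they would need to be counted separately to avoid underestimating how slowly a derivative follows Debian.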

Technologies:

More information:

Supervisors:

Status: available