a tale of three tools: mairix, maildir-utils (mu), nmzmail

I think/fear I'm getting into this Getting Things Done thingie. For weird reasons I'll explain later on, part of the GTD work flow I'm implementing requires quick lookup from Message-IDs to the corresponding mail, no matter in which mailbox (actually Maildir) I've stored it.

Hence, I've looked for a mail indexing tool which is Mutt-compatible, handles Maildirs, and supports Message-ID queries. In Debian (where else should I look? *g*), I found three: mairix, nmzmail, maildir-utils (whose upstream name is actually "mu").

The first one I tried is mairix. Its last upload to Debian was two years ago, it doesn't seem to be particularly buggy, and popcon reports about 300 installations. The integration with Mutt is good: a search creates a sort of virtual Maildir whose files are symlinks to the matching messages, and with a couple of macros you can have Mutt open the result directory right after a query.

The reason why I ditched mairix is that it suffers heavily from NIH syndrome. mairix is a self-contained executable with no external dependencies; while that is good in principle, I find it totally unreasonable nowadays not to use a third-party full-text indexer, given that the FOSS world has several good ones. A good aspect of mairix, missing in its competitors, is the ability to index messages incrementally as they flow in, e.g. via procmail. Still, that is difficult to pair with the habit of moving messages across mailboxes; for that, periodic re-indexing, or better batch index updates, offers a better work flow.

Then I tried nmzmail, which claims as its killer feature "better integration with Mutt" compared to its competitors. Actually, this is false: it has the same level of integration as the others (virtual Maildirs with macros) and it doesn't even offer a ready-to-use set of macros in its documentation! (Yeah, they're easy to write, but given that you claim you're so well integrated with Mutt ...) The reason why I ditched nmzmail is that I didn't particularly like its choice of indexer (an external one, at least): Namazu. The index it created was very big (something like 250 MB for about 400 MB of Maildirs). Also, I had a bad feeling that the indexing was somehow Japanese-specific (the project having support for that language), and I found no way to disable that support, which I obviously do not need.

Finally, I tried maildir-utils and was finally happy. It is implemented on top of SQLite3 (for mail metadata) and Xapian (for full-text indexing). There are some bugs, but Norbert, the very responsive maintainer, has tackled most of them by now, and I've been happy to help with various feedback. Integration with Mutt is provided by the following two macros:

    macro index <F8> "<shell-escape>rm -rf ~/.mu/results; mu-find -o l -l ~/.mu/results " "mu-find"
    macro index <F9> "<change-folder-readonly>~/.mu/results\n" "display mu-find results"

The first one prompts you for the search string; the second jumps to the results by opening the virtual Maildir (which is useful especially when you want to go back to the last query you ran). I update the index every two hours with the following cron entry:

    31  */2 *  *   *     on_ac_power && mu-index -q
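
The cron entry above skips indexing whenever `on_ac_power` returns non-zero. On Debian, `on_ac_power` exits with 1 when the machine runs on battery and with 255 when the power status cannot be determined, so on some machines the entry might never index at all. A minimal sketch of a wrapper that skips the run only in the "definitely on battery" case could look like the following; the function name `reindex_if_powered` and the `POWER_CHECK` and `INDEX_CMD` variables are made up for this example and are not part of mu.

```shell
#!/bin/sh
# Hypothetical wrapper around the cron entry above (not part of mu).
# POWER_CHECK and INDEX_CMD are illustrative variables; their defaults are
# the two commands used in the cron entry.
POWER_CHECK=${POWER_CHECK:-on_ac_power}
INDEX_CMD=${INDEX_CMD:-mu-index -q}

reindex_if_powered() {
    # Run the power check and remember its exit status.
    status=0
    $POWER_CHECK || status=$?
    if [ "$status" -eq 1 ]; then
        return 0    # definitely on battery: skip this run
    fi
    # On mains (0) or status unknown (e.g. 255): update the index.
    $INDEX_CMD
}
```

To use it, one would end the script with a call to `reindex_if_powered` and point the cron entry at the script instead of at `on_ac_power && mu-index -q`.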

Besides a corner-case bug which is close to being fixed, updating the index is very fast, usually a few seconds; the index size is about 150 MB. To conclude, my initial goal (message path lookup via Message-ID) is easily achieved as follows:

    $ mu-find -f P m:20091030112543.GA4230@usha.takhisis.invalid
    /home/zack/Maildir/INBOX/cur/1256902638_0.25702.usha,U=37563,FMD5=7e33429f656f1e6e9d79b29c3f82c57e:2,S
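
Going from such a path to the folder Mutt should open only takes stripping the file name and the `cur`/`new`/`tmp` component. Here is a small sketch of that step, relying on nothing but the standard Maildir layout; the function name `maildir_of` is hypothetical, and it expects an absolute path like the one printed above.

```shell
#!/bin/sh
# maildir_of PATH: print the Maildir that contains message file PATH.
# Relies only on the standard layout <maildir>/{cur,new,tmp}/<file>.
# The function name is made up for this example.
maildir_of() {
    sub=${1%/*}                 # strip the file name -> .../INBOX/cur
    case ${sub##*/} in
        cur|new|tmp)
            printf '%s\n' "${sub%/*}"   # strip cur/new/tmp -> .../INBOX
            ;;
        *)
            echo "not inside a Maildir: $1" >&2
            return 1
            ;;
    esac
}
```

For the path shown above this prints `/home/zack/Maildir/INBOX`, so something like `mutt -f "$(maildir_of "$path")"` would open the right mailbox.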

In addition to that, I've now gained all-maildirs full-text search from within Mutt :-)

As a concluding remark, Enrico pointed me to Not much mail, but it seems to be (by its authors' own admission) at an early stage of development. Also, AFAIU it aims to be a MUA, whereas I'm perfectly fine with Mutt; I just need, from time to time, to integrate it with other components of my daily work flow.

Update: (30/01/2011) I've now tried Notmuch and switched to it without looking back. I've written about it in a separate post, including tips on how to integrate Notmuch with Mutt.

Why folders?
If I might suggest, the problem to me sounds like the use of folders in the first place. I have two: =Archive and =INBOX. Search gives me everything else I need. Filing mailing list messages into folders proves overrated; just let them hit your inbox, and throw them into =Archive.
Comment by Anonymous Sat 31 Oct 2009 07:52:52 PM CET
extremely useful

Well,

thank you so much for blogging on this solution. I implemented it right away on my Debian GNU/Linux box and it has worked so far.

Thanks for such a clear writing style.

Ciao !

Filippo

Comment by Filippo Sat 31 Oct 2009 08:10:09 PM CET
Thanks
Thanks for this roundup. I've been using Mairix for a number of years now, but no longer use mutt as my primary reader. I've been thinking about better options, and having something Xapian-based certainly has an appeal.
Comment by John Goerzen Sat 31 Oct 2009 09:10:39 PM CET
How about a news reader going nuts ? Or is it me ? ;)

Sorry for the triplicate, Stefano ! Ciao, Filippo

Comment by Filippo Sat 31 Oct 2009 09:36:27 PM CET
comment 7
Thanks so much for this write up. Very informative and useful. I had been using nmzmail but was very unhappy with it. I didn't know about mu. I've now switched. Thanks again.
Comment by drgraefy Sat 31 Oct 2009 11:01:23 PM CET
Re: Why folders?

If I might suggest, the problem to me sounds like the use of folders in the first place. I have two: =Archive and =INBOX. Search gives me everything else I need. Filing mailing list messages into folders proves overrated; just let them hit your inbox, and throw them into =Archive.

Actually, only one specific problem comes from the use of folders, namely that you need to repeat your search in several places. Mail indexing addresses that, but not only that: it also solves the speed problem, for instance. Searching through the 5000 mails I have in my "Archive" with ~b takes several seconds; mu is way faster than that.

The reason why I use mailboxes rather than a work flow like yours is that I filter incoming messages into different folders so that, e.g., during my day job I'm not distracted by Debian traffic unless I explicitly jump to one of my Debian-related mailboxes.

Comment by zack Sun 01 Nov 2009 02:24:23 PM CET
Re: thanks / useful

Thank you guys for the feedback.

.. and, Filippo, don't worry about the triplicate comment, I've removed the duplicates.

Cheers.

Comment by zack Sun 01 Nov 2009 02:53:11 PM CET
Experience with Mairix
I have been using Mairix since 2006 and I'm fairly happy with it. It is very fast on my maildir tree of several GB. I like the way it presents results as a maildir full of symlinks which I can use from any mail reader or IMAP client. Mairix does incremental updates, and they are fast too. The only major gripe I have against Mairix is that it does not handle accented characters: as a French speaker, I am not fond of restricting my searches to words with no accents. Maybe I'll give maildir-utils a try, but for now I'm quite comfy with Mairix.
Comment by Jean-Marc Liotier Mon 02 Nov 2009 12:40:58 PM CET
re: Experience with Mairix

The only major gripe I have against Mairix is that it does not handle accented characters

FWIW, Xapian (and hence maildir-utils) properly handles accented characters (I just had fun looking for all occurrences of "perchè", which is the wrongly accented translation of "why" in Italian).

I believe this is an example of why tools like Mairix should be ditched, and I'm not referring to accented character support as such. The problem here is that Mairix has reinvented a wheel called "full text indexer". If you now want accented characters, you need to add support for them in Mairix, whereas maildir-utils (and nmzmail, FWIW) can rely on the support already present in the external full-text indexers they use.

Comment by zack Mon 02 Nov 2009 04:57:22 PM CET
Interesting

Interesting, thanks. I've been looking for something like this. Some features would make it an instant win for me, but it seems like it doesn't have these:

- threading ("here's a message, find me everything in its thread")
- arbitrary headers (e.g., "find me all messages with 'foo' in the X-Label line")

Comment by Jim Mon 02 Nov 2009 06:14:46 PM CET
Same research, I selected mairix

I did similar research recently and ended up using mairix. Someone also suggested swish-e, but it's not really suitable IMO. I was not aware of the other alternatives; nice to discover them. I wish one of them would use inotify to update the index in real time.

Incremental indexing is supported in mairix: I run it from cron every 10 minutes (mairix -F), and once a day I do a run with purge too (mairix -p). The only problem I have with it is that it complains about some headers that it can't parse. Some of the problems are apparently fixed in the upstream git repo, but it's not very active and I wonder how much of a future it has.

I have a small shell script to start mutt in the right folder on the desired message; it simply takes a Message-ID as its only parameter.

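    #!/bin/sh
    # Open mutt on the folder containing the message with the given Message-ID.
    # Strip the optional surrounding angle brackets.
    id=${1#<}
    id=${id%>}
    if [ -z "$id" ]; then
        echo "Usage: $0 <message-id>" >&2
        exit 127
    fi
    # mairix -r prints the paths of matching messages; only the first is used.
    # (Word splitting on the output assumes paths without whitespace.)
    for file in $(mairix -r "m:$id"); do
        # Strip the file name and the cur/new/tmp component.
        mdir=$(dirname "$(dirname "$file")")
        mutt -f ~/"$mdir" -e "push \"/~i $id<ENTER><ENTER>\""
        exit 0
    done
    echo "mairix did not find a message with message-id <$id>..." >&2
    exit 1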

Funnily, I also did this for setting up my own GTD system: I want to keep track of messages that I should get back to without keeping them in my INBOX, so I record the Message-ID in my usual TODO list and can quickly find the message again thanks to the indexing.

Comment by Raphael Hertzog Mon 02 Nov 2009 11:26:13 PM CET
@Raphael

Eh, looks like we're doing pretty much the same thing then :-) I too have developed my own script similar to yours: mutt-open.

It looks a bit more flexible than yours only because it supports both accessing a message by Message-ID and by full path pointing inside a maildir.

As I use mutt-patched, it also shuts down the sidebar, but that's mostly irrelevant.

BTW, in my GTD implementation, I also now have an integration between Mutt and Emacs' Org mode that enables me to put links to Mutt emails in Org notes. I'll blog about that separately, but if you are interested, I've already posted the relevant code to the Org mode mailing list.

Comment by zack Tue 03 Nov 2009 09:12:42 AM CET
containing Maildir for a given search-results message?

Thanks for the research.

Have you come up with a good way to have the mutt-integrated maildir-utils tools display the Maildir containing a given search-result message? I am enjoying my newfound ready access to various search hits, but all too often, I'd like to revisit the thread containing that particular message, and I have to resort to using a separate shell and "mu find" to display the Maildirs for the hits on the same query.

The only thing I can think of is to patch "mu find" to, instead of generating symlinks, create copies of the messages, and rewrite the subject lines with the name of the Maildir (or whatever other metadata users may fancy).

Comment by Rob Mon 01 Mar 2010 08:00:40 PM CET
re: containing Maildir for a given search-results message?

Have you come up with a good way to have the mutt-integrated maildir-utils tools display the Maildir containing a given search-result message? I am enjoying my newfound ready access to various search hits, but all too often, I'd like to revisit the thread containing that particular message, and I have to resort to using a separate shell and "mu find" to display the Maildirs for the hits on the same query. The only thing I can think of is to patch "mu find" to, instead of generating symlinks, create copies of the messages, and rewrite the subject lines with the name of the Maildir (or whatever other metadata users may fancy).

Yes and no. For "normal" search activities I still use the usual mu "result" mailbox, with the threading problem you've mentioned. Note that there is a wishlist bug report that asks for threading support in mu. Most likely it will be coming in future versions, solving the problem properly.

To look up a specific message in its original context I use a script called mutt-open, which I've posted as an attachment to a separate blog post. With it, the lookup works in three steps: first you find the path of the message you're interested in, then you launch Mutt on its containing Maildir, and finally you jump (by sending keys to Mutt) to the message matching the Message-ID you previously found. It works, but it is rather hackish, since you basically do the lookup twice.

Also note that the notion of "original context" is not necessarily well-defined, as you might have a message with the same Message-ID occurring in several Maildirs; in that case my mutt-open script arbitrarily chooses one among all occurrences (the first returned by "mu find", that is).
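
To make the "first occurrence wins" choice a bit less silent, the selection step could be sketched as below. This is not the actual mutt-open code, just an illustration under the assumption that the lookup prints one message path per line; the helper name `first_hit` is hypothetical.

```shell
#!/bin/sh
# first_hit: read message paths (one per line) on stdin, print the first one,
# and warn on stderr when the hits are spread over several Maildirs.
# Hypothetical helper, not part of mu or of the mutt-open script.
first_hit() {
    hits=$(cat)
    [ -n "$hits" ] || return 1      # no hit at all
    # Containing Maildir of each hit: drop the trailing "/{cur,new,tmp}/<file>".
    ndirs=$(printf '%s\n' "$hits" | sed -e 's|/[^/]*/[^/]*$||' | sort -u | wc -l)
    ndirs=$((ndirs))                # normalise any padding added by wc
    if [ "$ndirs" -gt 1 ]; then
        echo "warning: message found in $ndirs Maildirs, using the first hit" >&2
    fi
    printf '%s\n' "$hits" | head -n 1
}
```

It would sit at the end of the lookup pipeline, e.g. `mu-find -f P m:<message-id> | first_hit`, with the printed path then handed to Mutt as before.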

Thanks for your feedback.

Comment by zack Tue 02 Mar 2010 10:38:25 AM CET