I have also wasted days with sort -k. I recommend reading the real GNU documentation; the Debian info page is unfortunately a fake.

http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html#sort-invocation

In the info page, the behaviour of sort -k is better explained. It is such a tragedy that this crucial information is so hard to obtain. For instance, search engines like Google do not do well at giving precedence to the info page.
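
One pitfall with -k that is often cited, and possibly not the one that cost the days mentioned above, is that -k1 means "from field 1 to the end of the line", while -k1,1 stops at the end of field 1. Here is a minimal sketch, with made-up sample data and LC_ALL=C assumed so that the comparison is byte-wise:

```shell
export LC_ALL=C
# Hypothetical sample data: same first field, different numbers in field 2.
data=$(printf 'a 10\na 9\n')

# -k1 runs from field 1 to end of line, so the whole line is the first key.
# The two lines already differ on that key, so -k2n is never consulted.
whole_line=$(printf '%s\n' "$data" | sort -k1 -k2n)

# -k1,1 limits the first key to field 1; the lines tie on it,
# so the numeric key on field 2 decides the order.
field_only=$(printf '%s\n' "$data" | sort -k1,1 -k2n)

printf '%s\n---\n%s\n' "$whole_line" "$field_only"
```

With the first form the lines come out in plain text order (a 10 before a 9); with the second form they come out in numeric order of the second field.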

Anyway, thank you for your blog post, which reminded me of the usefulness of 'join'. I need it at work today. You made my day!

-- Charles Plessy


Hi!

Note that $ sort -u -k1,1 file >x is not the same as $ sort -k1,1 file | uniq >x

The former syntax is pretty dangerous: it collapses all entries with the same sort key(!) == first column into one. I think it just throws away all but the first, or all but the last; I don't remember which. It made no difference when I last used it, since I was searching for files with the same hash that are not (yet) hardlinked to each other.

Of course, GNU sort may not behave this way, but MirBSD sort does.
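
To make the difference concrete, here is a minimal sketch. The sample lines and the temporary file are made up for illustration, and it assumes a sort whose -u keeps one line per distinct key, as POSIX describes. It only counts lines, because which line of each group survives is not something I would rely on.

```shell
export LC_ALL=C
# Hypothetical sample data: two different lines share the first field.
tmp=$(mktemp)
printf 'abc123 fileA\nabc123 fileB\ndef456 fileC\n' > "$tmp"

# -u together with -k1,1 keeps one line per distinct first field.
unique_by_key=$(sort -u -k1,1 "$tmp" | wc -l | tr -d ' ')

# uniq only drops adjacent lines that are identical as a whole.
unique_by_line=$(sort -k1,1 "$tmp" | uniq | wc -l | tr -d ' ')

echo "sort -u -k1,1:      $unique_by_key lines"
echo "sort -k1,1 | uniq:  $unique_by_line lines"
rm -f "$tmp"
```

On this input the first pipeline leaves 2 lines and the second leaves 3, so one of the two abc123 entries is silently lost with -u.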

-- mirabilos


The former syntax is pretty dangerous: it collapses all entries with the same sort key(!) == first column into one.

Yes, I'm aware of that. In fact, in my case it doesn't make any difference, since my sort key is also a unique key. But you're right: it should be pointed out.

-- zack