Keep only last line of a repeated pattern

2014-01-27

An answer to this question on Stack Overflow.

Question

I would like to know if it is possible to delete all the lines matching a selected pattern except the last one. It is not so easy to explain, so I will give an example.

I have a text file with content similar to this:

A sent (1)
A received (1)
B sent (1)
B sent (2)
B sent (3)
B received (1)

I would like to have an alternation between "sent" and "received" messages, where the "sent" line that is kept is the last of the consecutive sent messages with the same letter. So I need an output like:

A sent (1)
A received (1)
B sent (3)
B received (1)

Is there some program that can do something like that? I can use either Ubuntu or Windows, or build a simple C/C++ application, if necessary.

Answer

Here's a simple way:

tac FILE | uniq -w 6 | tac

We:

Reverse-print the file using tac, so that the last line of each group comes first.
Weed out duplicate lines with uniq, basing uniqueness on only the first 6 characters (thereby ignoring the incrementing number in parentheses). uniq keeps only the first line of each set of adjacent duplicates, which is why we reversed the file with tac.
Then reverse-print the file again so it's in the order you want.
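
To check the steps above against the sample from the question, here is a minimal sketch that writes the six input lines to a temporary file and runs the same pipeline on it. The variable names input and result are only illustrative, and the sketch assumes GNU coreutils (as on Ubuntu), since tac and the -w option of uniq are GNU tools and extensions rather than POSIX features.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
# ("input" and "result" are illustrative names, not part of the original answer.)
input=$(mktemp)
cat > "$input" <<'EOF'
A sent (1)
A received (1)
B sent (1)
B sent (2)
B sent (3)
B received (1)
EOF

# 1. tac reverses the line order, so the last line of each group comes first.
# 2. uniq -w 6 compares only the first 6 characters and keeps the first
#    line of each run of adjacent matches.
# 3. The second tac restores the original order.
result=$(tac "$input" | uniq -w 6 | tac)

printf '%s\n' "$result"
rm -f "$input"
```

On the sample this should print the four lines the question asks for. The width of 6 fits this data because every line starts with a one-letter name followed by a space and the first four letters of "sent" or "received"; names of a different length would likely need a different width.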